chipsalliance / rocket-chip

The Chisel sha3 implementation CAN'T run correctly. #787

Closed yidanyiji closed 7 years ago

yidanyiji commented 7 years ago

I ported the Chisel implementation of sha3 to the current Rocket Chip as follows:

1, Clone rocket-chip and the toolchain, and build them as described in README.md.

2, Clone rocc-template.

3, Add the Sha3Accel code to project-template/rocket-chip/src/main/scala/rocket/rocc.scala:

case object WidthP extends Field[Int]
case object Stages extends Field[Int]
case object FastMem extends Field[Boolean]
case object BufferSram extends Field[Boolean]

abstract class SimpleRoCC()(implicit p: Parameters) extends RoCC()(p)
{
  io.interrupt := Bool(false)
  //Set this true to trigger an interrupt on the processor (please refer to supervisor documentation)

  //a simple accelerator doesn't use imem or page tables

  //Old Format
  //io.imem.acquire.valid := Bool(false)
  //io.imem.grant.ready := Bool(false)
  //io.imem.finish.valid := Bool(false)
  //io.iptw.req.valid := Bool(false)
  //io.dptw.req.valid := Bool(false)
  //io.pptw.req.valid := Bool(false)

  //New Format
  io.autl.acquire.valid := Bool(false)
  io.autl.grant.ready := Bool(false)
  for(i <- 0 until p(RoccNPTWPorts)) io.ptw(i).req.valid := Bool(false)
}

class Sha3Accel()(implicit p: Parameters) extends SimpleRoCC()(p) {
  //parameters
  val W = p(WidthP)
  val S = p(Stages)
  //constants
  val r = 2*256
  val c = 25*W - r
  val round_size_words = c/W
  val rounds = 24 //12 + 2l
  val hash_size_words = 256/W
  val bytes_per_word = W/8

  //RoCC Interface defined in testMems.scala
  //cmd
  //resp
  io.resp.valid := Bool(false) //Sha3 never returns values with the resp
  //mem
  //busy

  val ctrl = Module(new CtrlModule(W,S)(p))

  ctrl.io.rocc_req_val   <> io.cmd.valid
  //ctrl.io.rocc_req_rdy   <> io.cmd.ready
  io.cmd.ready <> ctrl.io.rocc_req_rdy
  ctrl.io.rocc_funct     <> io.cmd.bits.inst.funct
  ctrl.io.rocc_rs1       <> io.cmd.bits.rs1
  ctrl.io.rocc_rs2       <> io.cmd.bits.rs2
  ctrl.io.rocc_rd        <> io.cmd.bits.inst.rd
  //ctrl.io.busy           <> io.busy
  io.busy <> ctrl.io.busy

  io.mem.req.valid <> ctrl.io.dmem_req_val
  ctrl.io.dmem_req_rdy   <> io.mem.req.ready
  io.mem.req.bits.tag <> ctrl.io.dmem_req_tag  
  io.mem.req.bits.addr <> ctrl.io.dmem_req_addr
  io.mem.req.bits.cmd <> ctrl.io.dmem_req_cmd
  io.mem.req.bits.typ <> ctrl.io.dmem_req_typ

  ctrl.io.dmem_resp_val  <> io.mem.resp.valid
  ctrl.io.dmem_resp_tag  <> io.mem.resp.bits.tag
  ctrl.io.dmem_resp_data := io.mem.resp.bits.data

  val dpath = Module(new DpathModule(W,S))

  dpath.io.message_in <> ctrl.io.buffer_out
  io.mem.req.bits.data := dpath.io.hash_out(ctrl.io.windex)

  //ctrl.io <> dpath.io
  dpath.io.absorb <> ctrl.io.absorb
  dpath.io.init   <> ctrl.io.init
  dpath.io.write  <> ctrl.io.write
  dpath.io.round  <> ctrl.io.round
  dpath.io.stage  <> ctrl.io.stage
  dpath.io.aindex <> ctrl.io.aindex

}

4, Add the directory project-template/rocket-chip/src/main/scala/sha3, copy the sha3 files (chi.scala, constants.scala, dpath.scala, rhopi.scala, common.scala, ctrl.scala, iota.scala, theta.scala) into it from rocc-template/src/main/scala/, and modify those files to fit the new chisel3 syntax.

5, Add the Sha3Config code to project-template/rocket-chip/src/main/scala/coreplex/Config.scala as follows:

class Sha3Config extends Config{ 
  override val topDefinitions:World.TopDefs = {
    (pname,site,here) => pname match {
      case WidthP => 64
      case Stages => Knob("stages")
      case FastMem => Knob("fast_mem")
      case BufferSram => Dump(Knob("buffer_sram"))
      case RoccMaxTaggedMemXacts => 32
      case BuildRoCC => Seq( 
                          RoccParameters(    
                            opcodes = OpcodeSet.custom0,
                            generator = (p: Parameters) => (Module(new Sha3Accel()(p)))) )
    }
  }

  override val topConstraints:List[ViewSym=>Ex[Boolean]] = List(
    ex => ex(WidthP) === 64,
    ex => ex(Stages) >= 1 && ex(Stages) <= 4 && (ex(Stages)%2 === 0 || ex(Stages) === 1),
    ex => ex(FastMem) === ex(FastMem),
    ex => ex(BufferSram) === ex(BufferSram)
    //ex => ex[Boolean]("multi_vt") === ex[Boolean]("multi_vt")
  )
  override val knobValues:Any=>Any = {
    case "stages" => 1
    case "fast_mem" => true
    case "buffer_sram" => false
    case "multi_vt" => true
  }
}

6, Add the Sha3CPPConfig code to project-template/rocket-chip/src/main/scala/rocketchip/Config.scala:

class Sha3CPPConfig extends Config(new Sha3Config ++ new BaseConfig)

7, Go to project-template/rocket-chip/emulator and run make CONFIG=Sha3CPPConfig to build the emulator, emulator-rocketchip-Sha3CPPConfig.

8, Change the sha3 test code, rocc-template/tests/sha3-rocc.c, as follows:

#include <assert.h>
#include <stdio.h>
#include "sha3.h"

#define STR1(x) #x
#define STR(x) STR1(x)
#define EXTRACT(a, size, offset) (((~(~0 << size) << offset) & a) >> offset)

#define CUSTOMX_OPCODE(x) CUSTOM_##x
#define CUSTOM_0 0b0001011
#define CUSTOM_1 0b0101011
#define CUSTOM_2 0b1011011
#define CUSTOM_3 0b1111011

#define CUSTOMX(X, rd, rs1, rs2, funct) \
  CUSTOMX_OPCODE(X)                   | \
  (rd                   << (7))       | \
  (0x7                  << (7+5))     | \
  (rs1                  << (7+5+3))   | \
  (rs2                  << (7+5+3+5)) | \
  (EXTRACT(funct, 7, 0) << (7+5+3+5+5))

#define CUSTOMX_R_R_R(X, rd, rs1, rs2, funct)           \
  asm volatile ("mv a4, %[_rs1]\n\t"                             \
       "mv a5, %[_rs2]\n\t"                             \
       ".word "STR(CUSTOMX(X, 15, 14, 15, funct))"\n\t" \
       "mv %[_rd], a5"                                  \
       : [_rd] "=r" (rd)                                \
       : [_rs1] "r" (rs1), [_rs2] "r" (rs2)             \
       : "a4", "a5");

int main() {

  do {
    printf("start basic test 1.\n");
    // BASIC TEST 1 - 150 zero bytes

    // Setup some test data
    int i = 0;
    unsigned int ilen = 150;
    unsigned char input[150] = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000";
    unsigned char output[SHA3_256_DIGEST_SIZE];
    unsigned char *_output=output;

    asm volatile ("fence");
    // Invoke the accelerator and check responses

    // setup accelerator with addresses of input and output
    //              opcode rd rs1          rs2          funct   
    //    asm volatile ("custom0 0, %[msg_addr], %[hash_addr], 0" : : [msg_addr]"r"(&input), [hash_addr]"r"(&output));
    CUSTOMX_R_R_R(0,_output,input,output,0);

    // Set length and compute hash
    //              opcode rd rs1      rs2 funct
    //    asm volatile ("custom0 0, %[length], 0, 1" : : [length]"r"(ilen));
    ROCC_INSTRUCTION_(0, _output, ilen, 0, 1);
    printf("SECOND end!\n");

    // Check result
    unsigned char result[SHA3_256_DIGEST_SIZE] =    {221,204,157,217,67,211,86,31,54,168,44,245,97,194,193,26,234,42,135,166,66,134,39,174,184,61,3,149,137,42,57,238};
    //sha3ONE(input, ilen, result);
    for(i = 0; i < SHA3_256_DIGEST_SIZE; i++){
      printf("output[%d]:%d ==? results[%d]:%d \n",i,output[i],i,result[i]);
      assert(output[i]==result[i]);
    }
  }while(0);

  printf("success!\n");
  return 0;
}

9, Finally, I ran the emulator with ./emulator-rocketchip-Sha3CPPConfig sha3.riscv, and THE EMULATOR NEVER EXITS.

10, Then I ran the emulator in verbose mode with ./emulator-rocketchip-Sha3CPPConfig sha3.riscv +verbose. I found that the emulator reads the first custom instruction, CUSTOMX_R_R_R(0,_output,input,output,0), but NEVER reads the second custom instruction.

11, Then I added test code to project-template/rocket-chip/src/main/scala/sha3/ctrl.scala:

  switch(rocc_s) {
  is(r_idle) {
    io.rocc_req_rdy := !busy
    when(io.rocc_req_val && !busy){
      when(io.rocc_funct === UInt(0)){
        io.rocc_req_rdy := Bool(true)
        msg_addr  := io.rocc_rs1
        hash_addr := io.rocc_rs2
        println("Msg Addr: "+msg_addr+", Hash Addr: "+hash_addr)
        io.busy := Bool(true)
        //----------------------------test
        printf("msg_addr=%x, hash_addr=%x\n",msg_addr.toUInt,hash_addr.toUInt)
        //----------------------------test end
      }

I found that the values of msg_addr and hash_addr are wrong: msg_addr=40, hash_addr=80023b88

Where is the fault in my code? Has anybody run the Chisel sha3 implementation correctly?

Thank you.

colinschmidt commented 7 years ago

I think the biggest problem here is that those custom macros (presumably found here) don't support leaving the xd field of the RoCC instruction unset. So Rocket will always wait for a response, which the SHA3 accelerator will never return. I see no fundamental reason these macros can't be fixed to let you set xd, xs1, and xs2, but I don't plan to do that in the near future.

A couple of side notes I thought of while reading the issue: if the values the accelerator got were wrong, you should check that the code you compiled assembles to the correct instruction. It seems pretty easy to believe that with those very different addresses the accelerator could go off the rails.

I'm also somewhat confused by your code snippets; you seem to have commented out some important things. If you use ``` to enclose your code, it will be much easier to read.

And finally I'm updating the RoCC interface for a modern rocket-chip now and I've gotten several requests for an updated SHA3 accelerator so I might also get around to that soon.

yidanyiji commented 7 years ago

Thanks, @colinschmidt. "It seems pretty easy to believe that with those very different addresses the accelerator could go off the rails." That is right. How do I make the code assemble to the correct instruction? Please give me some suggestions. Thank you very much!

yidanyiji commented 7 years ago

You are right!!! The modified code is as follows:

#define CUSTOMX(X, rd, rs1, rs2, funct) \
  CUSTOMX_OPCODE(X)                   | \
  (rd                   << (7))       | \
  (0x3                  << (7+5))     | /* was 0x7; 0x3 leaves xd clear */ \
  (rs1                  << (7+5+3))   | \
  (rs2                  << (7+5+3+5)) | \
  (EXTRACT(funct, 7, 0) << (7+5+3+5+5))

seldridge commented 7 years ago

Really, I should have been setting xd semi-automatically all along. This should be the correct macro which makes xd == (rd != x0).

Source:

  ROCC_INSTRUCTION_RAW_R_R_R(0, 0, 0, 0, 0) ;
  ROCC_INSTRUCTION_RAW_R_R_R(0, 1, 0, 0, 0) ;

Binary:

    800000ec:   0000300b            0x300b // xd=0, xs1=1, xs2=1, rd=0, opcode=0b
    800000f0:   0000708b            0x708b // xd=1, xs1=1, xs2=1, rd=1, opcode=0b

yidanyiji commented 7 years ago

Thanks, @seldridge.