How to introduce GPT-4.0 into complex libraries such as naxriscv for assisted design

Dear Dolu1990, GPT4.0 has been released. Its coding ability is amazing. I hope we can actively embrace and promote it to reduce the usage threshold of spinalhdl or naxriscv. (PS: I have been doing AI algorithms for half a year now, but I don't want to give up on chip design because it is also my passion). Of course, there will be opportunities to seek cooperation with OpenAI in the future, and we can fine-tune our spinalhdl to a higher level of intelligence and natural language. The following is me, temporarily impromptu, chatting with new bing.

“你能帮我改写上面的代码吗，需求使用8比特数据的乘法或者16比特的乘法是通过naxriscv里的csr来配置的”

Sure! Here is the modified code that adds a custom instruction called SIMD_MUL that can perform multiplication of four 8-bit signed numbers and two 16-bit signed numbers. The requirement to use 8-bit or 16-bit data for multiplication is configured through a CSR in naxriscv:

class SimdAddPlugin extends Plugin[VexRiscv]{
  // Define a new instruction called SIMD_ADD
  object SIMD_ADD extends Stageable(new Bundle {
    val rs1 = Bits(32 bits)
    val rs2 = Bits(32 bits)
    val rd = Bits(32 bits)
  })

  // Define a new instruction called SIMD_MUL
  object SIMD_MUL extends Stageable(new Bundle {
    val rs1 = Bits(32 bits)
    val rs2 = Bits(32 bits)
    val rd = Bits(32 bits)
  })

  override def setup(pipeline: VexRiscv): Unit = {
    import pipeline._
    import pipeline.config._
    // Get the decoder service
    val decoderService = pipeline.service(classOf[DecoderService])

    // Add SIMD_ADD and SIMD_MUL as default instructions
    decoderService.addDefault(SIMD_ADD)
    decoderService.addDefault(SIMD_MUL)

    // Define the execute stage for SIMD_ADD
    val executeAdd = new Area {
      import executeAdd._
      // Get the inputs and output of SIMD_ADD
      val rs1 = executeAdd.input(SIMD_ADD.rs1)
      val rs2 = executeAdd.input(SIMD_ADD.rs2)
      val rd = executeAdd.output(SIMD_ADD.rd)

      // Add the two inputs and store the result in rd
      rd := rs1 + rs2
    }

    // Define the execute stage for SIMD_MUL
    val executeMul = new Area {
      import executeMul._
      // Get the inputs and output of SIMD_MUL
      val rs1 = executeMul.input(SIMD_MUL.rs1)
      val rs2 = executeMul.input(SIMD_MUL.rs2)
      val rd = executeMul.output(SIMD_MUL.rd)

      // Check if we should use 8-bit or 16-bit data for multiplication
      when(csrService.r(csrAddress)) {
        // Multiply four 8-bit signed numbers
        rd(7 downto 0) := (rs1(7 downto 0) * rs2(7 downto 0)) + (rs1(15 downto 8) * rs2(15 downto 8)) + (rs1(23 downto 16) * rs2(23 downto 16)) + (rs1(31 downto 24) * rs2(31 downto 24))
        rd(15 downto 8) := B"00000000"
      } otherwise {
        // Multiply two 16-bit signed numbers
        rd := (rs1.asSInt * rs2.asSInt).asBits.resized
      }
    }

    // Plug in the execute stage for SIMD_ADD to the pipeline
    executeAdd plug new Area {
      import executeAdd._
      // Specify that our plugin will never halt the pipeline, allowing the issue queue to statically wake up instructions which depend on its result.
      staticLatency=true
      // We specify here that our plugin will be executed when a SIMD_ADD instruction is detected
      onInstruction(SIMD_ADD){
        // We just add our execution logic to the execute stage for SIMD_ADD
        executeAdd
      }
    }

    // Plug in the execute stage for SIMD_MUL to the pipeline
    executeMul plug new Area {
      import executeMul._
      // Specify that our plugin will never halt the pipeline, allowing the issue queue to statically wake up instructions which depend on its result.
      staticLatency=true
      // We specify here that our plugin will be executed when a SIMD_MUL instruction is detected
      onInstruction(SIMD_MUL){
        // We just add our execution logic to the execute stage for SIMD_MUL
        executeMul
      }
    }
  }
}

I hope this helps! Let me know if you have any other questions.

IMG_20230402_031514

Hi ^^

So, currently my road map is to have memory coherency + multicore implemented in NaxRiscv using Tilelink. I started it on https://github.com/SpinalHDL/NaxRiscv/tree/coherency I just have a very very hard time those days finding time between all the email and the maintenance of the projects XD

Then, one idea could be to look if creating a new core :

in order / multiple issue
SMT (to keep the core busy between cache misses)

Could be used as a low area core minon to put on the side of NaxRiscv :)

About Chat GPT, i guess one issue right now with SpinalHDL / NaxRiscv is that there isn't a large enough set of example for chat GPT to learn ?

Regards Charles

Hi ^^ Regarding chatgpt, it is good at Python, C++, Java, JavaScript, etc., and it is not bad for writing scala, but it currently supports Spalhdl poorly. This takes time, more prompts to stimulate its creation, and good documentation, especially example code (of course, you can also insert special logos into the web page for chatGPT to see). From my months of use, ChatGPT already has a strong ability to reason based on thought chains, and has a sea of knowledge (you can take a moment to check out some of the video introductions and find a better way to use it). GPT4.5 has already been trained, GPT5.0 will arrive soon, I believe it will soon surpass the vast majority of humans, and it should be wise to actively embrace it.

Regarding tilelink, I've used chisel to do riscv-based SOC before. Chisel's version, I get the impression that once you master it, the ease of use is great (if you're not so demanding on performance, of course). You only need to define the relevant parameters of the input and output nodes, and the parameters of the intermediate nodes will be automatically derived (thanks to a set of frameworks designed by Chisel, parameters are negotiated between nodes), in addition, they also design a set of algorithms for automatic wiring between different levels, users can have many flexible connections as long as they are used properly.

In fact, our naxriscv-framework framework (similar to lazymodule) can also achieve parameter negotiation coupled with the perfect automatic connection mechanism before spinal, and can also achieve automatic connection.

But then again, any design is the result of a trade-off, just follow your own ideas. I simply scanned the relevant code, I haven't used spinal for half a year, and I don't have a better idea, but I was wondering if there are some modules that are better taken directly from chisel and rewritten into spinalhdl (ps: chatgpt If you give chisel code to it, it may help to rewrite it into spinal) A few random points that come to mind, are not suggestions, just for reference.

Regards Lian

GPT4.5 has already been trained, GPT5.0 will arrive soon,

Right, who know how fast things will go :)

but I was wondering if there are some modules that are better taken directly from chisel and rewritten into spinalhdl

So, i was looking for instance at the tilelink coherent interconnect, but it seems like the implementation / design choices had limitations / trade off that could be improved.

So, i think, that targeting FPGA, it is worth implementing things fresh :)

Regards Charles

Hi ^^ Well, the development of AI big language models is really very fast, please pay attention to it. Regarding tilelink, your continuous improvement and implementation is very necessary to provide more possibilities for system-on-chip. In addition, the design and implementation of key components such as buses, stream processing operators and frameworks, can greatly improve the spinalHDL programming capabilities of GPT-like large language models. In addition, through more use, more discovery of rules, and more prompts for improvement, GPT AI will learn quickly. I also hope that you can communicate with ChatGPT more when writing some basic code. It understands that it will benefit many people, and the more it promotes, the more it benefits. Of course, it may also be necessary to enable OpenAI to focus on our small but important languages. Maybe through the nvidia CEO to promote and communicate with OpenAI, maybe spinalHDL will open up in the future.

Let me give an example of GPT4.0's genius for spinalHDL natural language programming.

Sincerely Lian

Prompt: "Write a sine wave generator with spinalhdl, input frequency and amplitude through interface signals, and output sine waves through several bits of UInt. Running on FPGAs, the frequency and amplitude registers are configured with axilite so that the number of bits in the SInt can cover the maximum amount."

好的，我明白了你的需求。我可以帮你用spinalhdl写一个在fpga上运行的正弦波发生器，通过axi lite接口来配置频率和幅度寄存器，用SInt来输出正弦波。¹²³

首先，你需要创建一个Axi4Config对象来定义axi lite总线的参数，比如地址宽度，数据宽度，时钟域等。¹

然后，你需要创建一个Component类来实现正弦波发生器的逻辑，比如使用一个计数器来产生相位，使用一个查找表来存储正弦波的值，使用一个寄存器来输出正弦波。

最后，你需要创建一个Axi4LiteSlaveFactory对象来连接axi lite总线和正弦波发生器的寄存器，比如使用createReadAndWrite或者createReadOnly方法来定义寄存器的地址和功能。¹

以下是一个可能的spinalhdl代码示例：

import spinal.core._
import spinal.lib._
import spinal.lib.bus.amba4.axi._

// Define the axi lite configuration
val axiConfig = Axi4Config(
  addressWidth = 32,
  dataWidth = 32,
  useId = false,
  useRegion = false,
  useBurst = false,
  useLock = false,
  useCache = false,
  useSize = false,
  useQos = false,
  useLen = false,
  useLast = false,
  useResp = false
)

// Define the sine wave generator component
class SineWaveGenerator extends Component {
  val io = new Bundle {
    val freq = in UInt(16 bits) // Frequency register input
    val amp = in UInt(16 bits) // Amplitude register input
    val sine = out SInt(16 bits) // Sine wave output
  }

  // Define a counter to generate phase
  val counter = Reg(UInt(16 bits)) init(0)
  counter := counter + io.freq

  // Define a lookup table to store sine wave values
  val lutSize = 256 // Number of entries in the lookup table
  val lutBits = 8 // Number of bits per entry
  val lut = Vec(SInt(lutBits bits), lutSize) // Create a vector of SInts
  for (i <- 0 until lutSize) {
    // Initialize the lookup table with sine wave values scaled by 2^(lutBits-1)-1
    lut(i) := (math.sin(2 * math.Pi * i / lutSize) * ((1 << (lutBits - 1)) - 1)).toInt
  }

  // Define a register to output sine wave
  val sineReg = Reg(SInt(16 bits)) init(0)
  io.sine := sineReg

  // Use the counter as an index to read from the lookup table
  val lutOut = lut(counter(lutSize-1 downto lutSize-log2Up(lutSize)))

  // Multiply the lookup table output by the amplitude and shift to match the output bit width
  sineReg := (lutOut * io.amp) >> (lutBits - log2Up(lutSize))
}

// Define the top level component that connects the axi lite bus and the sine wave generator
class TopLevel extends Component {
  val io = new Bundle {
    val axiClk = in Bool // Axi lite clock input
    val axiReset = in Bool // Axi lite reset input
    val axiLite = slave(Axi4(axiConfig)) // Axi lite slave interface
    val sineClk = in Bool // Sine wave clock input
    val sineReset = in Bool // Sine wave reset input
    val sineOut = out SInt(16 bits) // Sine wave output
  }

  // Create a new clock domain for the axi lite bus
  val axiClockDomain = ClockDomain(
    clock = io.axiClk,
    reset = io.axiReset,
    frequency = FixedFrequency(100 MHz),
    config

好的，我会继续输出config后面的代码。请看：

    config = ClockDomainConfig(
      resetKind = ASYNC,
      resetActiveLevel = LOW
    )
  )

  // Create a new clock domain for the sine wave generator
  val sineClockDomain = ClockDomain(
    clock = io.sineClk,
    reset = io.sineReset,
    frequency = FixedFrequency(50 MHz),
    config = ClockDomainConfig(
      resetKind = SYNC,
      resetActiveLevel = HIGH
    )
  )

  // Create a new sine wave generator component under the sine clock domain
  val sineWaveGenerator = sineClockDomain(new SineWaveGenerator)

  // Connect the sine wave output to the top level output
  io.sineOut := sineWaveGenerator.io.sine

  // Create a new axi lite slave factory under the axi clock domain
  val axiLiteSlaveFactory = axiClockDomain(Axi4LiteSlaveFactory(io.axiLite))

  // Connect the frequency and amplitude registers to the axi lite bus using the slave factory
  axiLiteSlaveFactory.createReadAndWrite(sineWaveGenerator.io.freq, address = 0x00) // Frequency register at address 0x00
  axiLiteSlaveFactory.createReadAndWrite(sineWaveGenerator.io.amp, address = 0x04) // Amplitude register at address 0x04
}

mmexport1680795648226 mmexport1680795648291 mmexport1680795648326 mmexport1680795648386 mmexport1680795648422 mmexport1680795648469 mmexport1680795648570 mmexport1680795648670

SpinalHDL / NaxRiscv

How to introduce GPT-4.0 into complex libraries such as naxriscv for assisted design #29