chili-chips-ba / openCologne

Spicing up the first and only EU FPGA chip with a flashy new board, loaded with a suite of engaging demos and examples. https://www.chili-chips.xyz/open-cologne
https://nlnet.nl/project/openCologne
BSD 3-Clause "New" or "Revised" License

Error in GateMate p_r tool inferring CC_MULT #23

Open aimamovic6 opened 1 month ago

aimamovic6 commented 1 month ago

A FATAL ERROR occurred while running the GateMate p_r tool on the SD-DAC project. This test was performed using the latest yosys and p_r tools.

FATAL ERROR: (105602): Component CC_MULT has no port name: A

105610 Lines read
FATAL ERROR: (105602): Problem Parsing Input Line
program finished with exit code: 10

Line 105610 of sd_dac_top_synth.v reads:

  CC_MULT #(
    .A_WIDTH(32'd16),
    .B_WIDTH(32'd6),
    .P_WIDTH(32'd22)
  ) _10539_ (
    .A(\u_interpolatingFilter.filter1.delay_pipeline[65] ),
    .B({ 1'h0, \u_interpolatingFilter.filter1.product_mux [4], 1'h0, \u_interpolatingFilter.filter1.product_mux [4], \u_interpolatingFilter.filter1.product_mux [4], 1'h0 }),
    .P({ \u_interpolatingFilter.filter1.add_signext [37], \u_interpolatingFilter.filter1.add_signext [20:0] })
  );
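
To cross-check which instance a reported line number points at, the synthesized netlist can be scanned for parameterized instantiations of a given cell. This is a minimal Python sketch and not part of the toolchain: `find_cell_instances` and the sample text are hypothetical, and it assumes the netlist layout quoted above (cell type followed by `#(` on one line, instance name on the later `) name (` line).

```python
import re

def find_cell_instances(netlist_text, cell_type):
    """Return (line_number, instance_name) for each parameterized
    instantiation of cell_type. Assumes the layout quoted in this issue;
    cells instantiated without a parameter list are not detected."""
    start_re = re.compile(r"^\s*" + re.escape(cell_type) + r"\s*#\(\s*$")
    name_re = re.compile(r"^\s*\)\s*(\S+)\s*\(\s*$")
    found = []
    start_line = None  # line where the current instantiation began
    for lineno, line in enumerate(netlist_text.splitlines(), start=1):
        if start_line is None:
            if start_re.match(line):
                start_line = lineno
        else:
            match = name_re.match(line)
            if match:
                found.append((start_line, match.group(1)))
                start_line = None
    return found

# Shortened, hypothetical stand-in for sd_dac_top_synth.v.
SAMPLE_NETLIST = r"""
module top;
  CC_MULT #(
    .A_WIDTH(32'd16),
    .B_WIDTH(32'd6),
    .P_WIDTH(32'd22)
  ) _10539_ (
    .A(a),
    .B(b),
    .P(p)
  );
endmodule
"""

for lineno, name in find_cell_instances(SAMPLE_NETLIST, "CC_MULT"):
    print(f"line {lineno}: CC_MULT instance {name}")
```

On a real netlist, the first tuple returned gives the line to compare against the line count in the p_r error message.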

It is the first occurrence of the CC_MULT primitive. The resource usage given in synth.log is as follows:

=== sd_dac_top ===

   Number of wires:               1117
   Number of wire bits:          34954
   Number of public wires:         817
   Number of public wire bits:   26725
   Number of ports:                  4
   Number of port bits:             34
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:              10375
     $scopeinfo                      8
     CC_ADDF                      3770
     CC_BUFG                         1
     CC_DFF                       5336
     CC_IBUF                        18
     CC_LUT1                        14
     CC_LUT2                        68
     CC_LUT3                       906
     CC_LUT4                       144
     CC_MULT                        94
     CC_OBUF                        16
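
The per-cell counts in a report like the one above can be pulled into a dictionary for scripting. The following is a rough Python sketch under the assumption that each cell line consists of exactly one name followed by one integer, as in the quoted synth.log; `parse_cell_counts` is a hypothetical helper, not a Yosys or p_r feature.

```python
import re

# One token followed by an integer, e.g. "     CC_MULT                        94".
# Lines such as "Number of cells: 10375" contain several words and are skipped.
CELL_LINE = re.compile(r"^\s+(\S+)\s+(\d+)\s*$")

def parse_cell_counts(report_text):
    """Map cell type to count for every 'name  number' line in the report."""
    counts = {}
    for line in report_text.splitlines():
        match = CELL_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Excerpt of the report quoted above.
SAMPLE_REPORT = """\
   Number of cells:              10375
     $scopeinfo                      8
     CC_ADDF                      3770
     CC_MULT                        94
"""

counts = parse_cell_counts(SAMPLE_REPORT)
print(counts.get("CC_MULT"))
```

The same pattern also matches the "found components:" lists that appear later in this thread, since those lines have the same name-then-number shape.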

Resource analysis

After excluding the biggest filter from the design (130 taps), the resource usage decreases, but the error still remains (even after omitting several more filters):

=== sd_dac_top ===
   Number of wires:                425
   Number of wire bits:           9927
   Number of public wires:         274
   Number of public wire bits:    7220
   Number of ports:                  4
   Number of port bits:             34
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:               2895
     $scopeinfo                      7
     CC_ADDF                      1061
     CC_BUFG                         1
     CC_DFF                       1395
     CC_IBUF                        18
     CC_LUT1                        12
     CC_LUT2                        31
     CC_LUT3                       196
     CC_LUT4                       130
     CC_MULT                        28
     CC_OBUF                        16
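
To put the reduction into numbers, the two reports can be compared per cell type. The sketch below is illustrative only: the figures are copied from the two reports above, and `relative_change` is a hypothetical helper.

```python
def relative_change(before, after):
    """Percent change per cell type, for cell types present in both reports."""
    return {
        cell: round(100.0 * (after[cell] - before[cell]) / before[cell], 1)
        for cell in before
        if cell in after and before[cell] != 0
    }

# Counts taken from the two reports quoted in this issue.
full_design = {"CC_ADDF": 3770, "CC_DFF": 5336, "CC_LUT3": 906, "CC_MULT": 94}
without_130_tap_filter = {"CC_ADDF": 1061, "CC_DFF": 1395, "CC_LUT3": 196, "CC_MULT": 28}

changes = relative_change(full_design, without_130_tap_filter)
for cell, pct in changes.items():
    print(f"{cell}: {pct:+.1f} %")
```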

To verify the RTL design, synthesis was also run with other tools.

In Lattice Diamond, for target LFE5U-85F of the ECP5 family, the summary is as follows:

Device utilization summary:

   PIO (prelim)      34/365           9% used
                     34/205          16% bonded

   SLICE          15659/41820        37% used

   GSR                1/1           100% used
   MULT18           102/156          65% used
   ALU54             51/78           65% used

    Number of register bits => 6330 of 84255 (7 % )
    ALU54B => 51
    AND2 => 434
    CCU2C => 12615
    FD1P3AX => 6302
    FD1P3IX => 8
    FD1S3AX => 17
    FD1S3IX => 3
    GSR => 1
    IB => 18
    LUT4 => 870
    MULT18X18D => 102
    ND2 => 1899
    OB => 16

In Quartus II for target EP2C35F672C6 of the Cyclone II family the summary states:

Total logic elements                 8,264
Total combinational functions        6,477
Dedicated logic registers            4,567
Total registers                      4567
Total pins                           34
Total virtual pins                   0
Total memory bits                    69
Embedded Multiplier 9-bit elements   160

In Vivado for target xc7a35tcsg324-3 the summary states:

Resource  Utilization  Available  Utilization %
LUT       3636         20800      17.48
FF        3547         41600      8.53
DSP       90           90         100.00
IO        34           210        16.19

How to replicate this error

Clone the SD-DAC project and adjust the path to the toolchain in the Makefile. Then, simply run:

cd 3.build
make synth
make impl

pu-cc commented 1 month ago

This vector was interpreted incorrectly. We have already fixed this for the next release.

However, it looks like this design with 94 CC_MULT does not fit into the FPGA in terms of resources anyway.

I'll let you know here as soon as the update is online. Thank you.

chili-chips-ba commented 1 week ago

@pu-cc when do you expect to publish the tool update? This issue is blocking us and forcing us to target Gowin and LatticeSemi devices instead of GateMate in order to test our designs.

pu-cc commented 1 week ago

The update has been online for quite a while now (Aug 1st). I neglected to report on it here.

aimamovic6 commented 1 week ago

The download page says 18.07.2024; my bad for not checking the release notes. However, the implementation still doesn't work on the optimized design (in terms of #CC_MULT and others). The error is different now, though:

make: *** [Makefile:85: impl] Error 112

From the impl.log:

MAPPER started

Map Adder
100 Adder chains mapped

During map of 136 CPEs 136 gates deleted!

1453 CPEs replaced by double bit CPEs

Number of Combined CPEs:  2589
#### Error: Exception Handler called. ExitCode: 112
Exception Class: EAccessViolation
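
When collecting such failures from several runs, the relevant lines can be extracted from impl.log automatically. This is a minimal Python sketch that relies only on the message formats quoted in this thread ("FATAL ERROR", "#### Error", "Exception Class", "ExitCode:" and "exit code:"); `summarize_tool_log` is a hypothetical helper, and other p_r versions may print different messages.

```python
import re

ERROR_PREFIXES = ("FATAL ERROR", "#### Error", "Exception Class")
EXIT_CODE_RE = re.compile(r"(?:ExitCode|exit code):\s*(\d+)")

def summarize_tool_log(log_text):
    """Return (error_lines, exit_codes) found in a p_r log excerpt."""
    errors = []
    exit_codes = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(ERROR_PREFIXES):
            errors.append(stripped)
        match = EXIT_CODE_RE.search(stripped)
        if match:
            exit_codes.append(int(match.group(1)))
    return errors, exit_codes

# Excerpt of the impl.log quoted above.
SAMPLE_LOG = """\
Number of Combined CPEs:  2589
#### Error: Exception Handler called. ExitCode: 112
Exception Class: EAccessViolation
"""

errors, exit_codes = summarize_tool_log(SAMPLE_LOG)
print(errors)
print(exit_codes)
```
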

pu-cc commented 1 week ago

However the implementation still doesn't work

As I wrote previously (https://github.com/chili-chips-ba/openCologne/issues/23#issuecomment-2246080391), it seems that the design with 94 CC_MULT does not fit. We have already implemented proper error handling, and I have brought the new release forward to today.

You can find it here. Sorry for the short notice. https://colognechip.com/downloads/cc-toolchain-linux.tar.gz https://colognechip.com/downloads/cc-toolchain-win.zip

aimamovic6 commented 1 week ago

Thank you for providing us with the updated version.

The first thing I noticed is that the design now takes a bit longer to complete synthesis than it did with the toolchain from the August release.

The second thing is that the optimized version of the RTL:

found components:
        CC_LUT3       138
        CC_LUT4        70
        CC_LUT2        44
        CC_LUT1        20
        CC_ADDF       629
        CC_BUFG         2
         CC_DFF       781
         CC_MX4        25
        CC_MULT        16
        CC_IBUF         2
        CC_OBUF         1
         CC_PLL         1 

still runs into implementation issues mentioned in the comment above:

make: *** [Makefile:85: impl] Error 112

From the impl.log:

Map Adder
22 Adder chains mapped

During map of 112 CPEs 108 gates deleted!

309 CPEs replaced by double bit CPEs

Number of Combined CPEs:   264
#### Error: Exception Handler called. ExitCode: 112
Exception Class: EAccessViolation

chili-chips-ba commented 1 week ago

@pu-cc any further insights on this bug? It has been blocking us for quite some time, and it's still unresolved.

pu-cc commented 1 week ago

The update has no effect on the synthesis. Only the P&R binary has been updated.

I'm also getting different results:

found components:
        CC_LUT3       966
        CC_LUT4       153
        CC_LUT2      1230
        CC_LUT1        22
        CC_ADDF      2967
        CC_BUFG         2
        CC_DFF       5061
        CC_MX4         25
        CC_MULT        55
        CC_IBUF         2
        CC_OBUF         1
        CC_PLL          1

However, the inputs to the multipliers are not yet comprehensible to us. We are still checking that. Has your netlist been verified by simulation after synthesis?

aimamovic6 commented 1 week ago

I apologize, the code is now updated on the repository to match this log:

        CC_LUT3       117
        CC_LUT2        64
        CC_LUT4        58
        CC_LUT1        13
        CC_ADDF       633
        CC_BUFG         2
         CC_DFF       766
        CC_MULT        16
        CC_IBUF         2
        CC_OBUF         1
         CC_PLL         1
    CC_BRAM_20K         1

Error 112 still persists on make impl. The pre- and post-synthesis simulation results do match.


pu-cc commented 6 days ago

Thank you, now I can reproduce and understand the issue. We are working on a solution.

aimamovic6 commented 6 days ago

You are the man Patrick

JulianKemmerer commented 4 days ago

For reference, I've seen something similar, but without inferring any multipliers:

MAPPER started

Map Adder
1 Adder chains mapped

During map of 0 CPEs 0 gates deleted!

#### Error: Exception Handler called. ExitCode: 112
Exception Class: EAccessViolation

=== int26_abs_0CLK_6f2c5aad_top ===

   Number of wires:                 35
   Number of wire bits:            999
   Number of public wires:          29
   Number of public wire bits:     918
   Number of ports:                  3
   Number of port bits:             53
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                162
     $scopeinfo                      4
     CC_ADDF                        26
     CC_BUFG                         1
     CC_DFF                         52
     CC_IBUF                        27
     CC_LUT2                         1
     CC_LUT3                        25
     CC_OBUF                        26

found components:
        CC_LUT3        25
        CC_LUT2         1
        CC_ADDF        26
        CC_BUFG         1
         CC_DFF        52
        CC_IBUF        27
        CC_OBUF        26

I think this .v file is enough to recreate?

int26_abs_0CLK_6f2c5aad_top_synth.v.txt

pu-cc commented 3 days ago

@aimamovic6 We have now solved the issue, and I am running all the multiplier tests again, which may take a while. I would like to make an official release as soon as #25 has been resolved. If you would like to test it - which I would appreciate - I have made an experimental version available here: https://colognechip.com/downloads/testing/cc-toolchain-win.zip https://colognechip.com/downloads/testing/cc-toolchain-linux.tar.gz

@JulianKemmerer Thanks a lot! In fact, this really has nothing to do with the multipliers (the actual issue, however, does). Would you like to open a separate issue for this? I already have the netlist in progress anyway.

pu-cc commented 3 days ago

@JulianKemmerer The experimental version now also includes a fix for your issue, which was a bit exotic, but we have now extended our test cases. Thanks a lot! https://colognechip.com/downloads/testing/cc-toolchain-win.zip https://colognechip.com/downloads/testing/cc-toolchain-linux.tar.gz

aimamovic6 commented 3 days ago

@pu-cc Thank you, the implementation now works. I tested it with an additional filter stage and it also worked. When I enlarged the design even more, it failed (as it should, it's too big) and I got this error:

FATAL ERROR: CP lines not routable between Components 11146 and 658
program finished with exit code: 4

so I was just wondering if this error should generally tell us that we are out of resources?

Also, a new error appeared when running make impl_sim:

sd_dac_top_00.v:36157: error: Unknown module type: FPGA_RAM
2 error(s) during elaboration.
*** These modules were missing:
        FPGA_RAM referenced 1 times.
***
make: *** [Makefile:108: impl_sim.vvp] Error 2
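
The list of unresolved modules can be extracted from such elaboration output with a few lines of Python. This sketch assumes the "NAME referenced N times." wording quoted above (the .vvp target suggests Icarus Verilog, which is an assumption); `missing_modules` is a hypothetical helper.

```python
import re

MISSING_RE = re.compile(r"^\s*(\S+) referenced (\d+) times\.\s*$")

def missing_modules(elab_output):
    """Map each missing module name to the number of times it is referenced."""
    result = {}
    for line in elab_output.splitlines():
        match = MISSING_RE.match(line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result

# Excerpt of the elaboration output quoted above.
SAMPLE_OUTPUT = """\
2 error(s) during elaboration.
*** These modules were missing:
        FPGA_RAM referenced 1 times.
***
"""

print(missing_modules(SAMPLE_OUTPUT))
```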

Is this worthy of a separate issue? Otherwise, I think we are all done here. Thank you once again.

pu-cc commented 3 days ago

so I was just wondering if this error should generally tell us that we are out of resources?

Other errors can occur even before the design runs out of resources, such as yours, where the CP lines can no longer be routed. Could you just check it in so that I can take a look at it?

Also, a new error appeared when running make impl_sim:

Did you enable the USE_RAM define in cpelib.v? It should be disabled and read:

[...]
`define xUSE_RAM
[...]
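
Whether the macro is currently active can be checked without opening the file by hand. The sketch below is a hedged Python example: it assumes, as the line above suggests, that the macro is disabled by renaming it to xUSE_RAM, and `use_ram_enabled` is a hypothetical helper that only looks at uncommented `define lines.

```python
import re

# Matches an uncommented `define at the start of a line and captures the macro name.
DEFINE_RE = re.compile(r"^\s*`define\s+(\w+)")

def use_ram_enabled(verilog_text):
    """True if an active `define USE_RAM line exists; `define xUSE_RAM does not count."""
    for line in verilog_text.splitlines():
        match = DEFINE_RE.match(line)
        if match and match.group(1) == "USE_RAM":
            return True
    return False

print(use_ram_enabled("`define xUSE_RAM\n"))
print(use_ram_enabled("`define USE_RAM\n"))
```

To check a real installation, pass the contents of the cpelib.v file from the toolchain's p_r directory to `use_ram_enabled`.
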

aimamovic6 commented 3 days ago

After enabling USE_RAM, one of the modules within FPGA_RAM is still not recognised:

...../cc-toolchain-linux//bin/p_r/cpelib.v:2341: error: Unknown module type: dpsram_block_4x512x20
2 error(s) during elaboration.
*** These modules were missing:
        dpsram_block_4x512x20 referenced 1 times.
***
make: *** [Makefile:108: impl_sim.vvp] Error 2

As for the design that causes the failure, here are the .v files: sd_dac.zip. The Gowin IDE also failed implementation for the Tang Nano 20k.

JulianKemmerer commented 3 days ago

Thanks @pu-cc for the fast turnaround :muscle: I look forward to trying it out!