andkorzh / RP2C02-7-

A clock-accurate FPGA clone of the NES 2C02(7) PPU, created on the basis of reverse engineering.
GNU General Public License v3.0

Synthesis attempts #1

Open loglow opened 3 months ago

loglow commented 3 months ago

Hi, I'm trying to synthesize this project and running into some errors.

I have very little experience with FPGAs so I'm sure I'm doing something wrong. I'm willing to learn though. My goal is to create a small board that could work as an RGB PPU drop-in replacement. I've designed and assembled many PCBs from scratch, so I know how to handle all the aspects on that side of things. The board design would be licensed CC BY-SA, by the way.

So...

Lattice Diamond 3.13.0.56.2 with LSE

First synthesis error is below. I've omitted paths from all the messages here for brevity.

ERROR - rp2c02.v(1310): net OAM1ADR[7] is constantly driven from multiple places at instance MOD_OAM_RAM, on port p_0[7]. VDB-1000

So I changed the Resolve Mixed Drivers option to True.

Now I get all these warnings, but they don't cause synthesis to fail:

WARNING - rp2c02.v(694): identifier HIN5 is used before its declaration. VERI-1875
WARNING - rp2c02.v(696): identifier H_LINE23 is used before its declaration. VERI-1875
WARNING - rp2c02.v(1091): identifier THZ is used before its declaration. VERI-1875
WARNING - rp2c02.v(1091): identifier TVZB is used before its declaration. VERI-1875
WARNING - rp2c02.v(1092): identifier TVZ is used before its declaration. VERI-1875
WARNING - rp2c02.v(1092): identifier NTHCout is used before its declaration. VERI-1875
WARNING - rp2c02.v(1093): identifier NTVCout is used before its declaration. VERI-1875
WARNING - rp2c02.v(1282): identifier OFETCH is used before its declaration. VERI-1875
WARNING - rp2c02.v(1620): identifier C is used before its declaration. VERI-1875
WARNING - rp2c02.v(1620): identifier nB_W is used before its declaration. VERI-1875

WARNING - rp2c02.v(1310): instantiating unknown module OAM_RAM. VERI-1063
WARNING - rp2c02.v(1311): instantiating unknown module OAM2_RAM. VERI-1063
WARNING - rp2c02.v(1627): instantiating unknown module PALETTE_RAM. VERI-1063
WARNING - rp2c02.v(1628): instantiating unknown module PALETTE_RGB_TABLE. VERI-1063

But there is a new error:

ERROR - logical block 'MOD_PALETTE/MOD_RGB_TABLE' with type 'PALETTE_RGB_TABLE' is unexpanded.

And now I'm stuck about how to proceed.

Lattice Diamond 3.13.0.56.2 with Synplify Pro U-2023.03L-SP1

The following errors occur:

@E:CG389 : RP2C02.v(1310) | Reference to undefined module OAM_RAM
@E:CG389 : RP2C02.v(1311) | Reference to undefined module OAM2_RAM
@E:CG389 : RP2C02.v(1627) | Reference to undefined module PALETTE_RAM
@E:CG389 : RP2C02.v(1628) | Reference to undefined module PALETTE_RGB_TABLE

Oddly, it looks like synthesis does keep going, with these warnings following each error:

@W: CG141 : RP2C02.v (1310) | Creating black box for OAM_RAM
@W: CG141 : RP2C02.v (1311) | Creating black box for OAM2_RAM
@W: CG141 : RP2C02.v (1627) | Creating black box for PALETTE_RAM
@W: CG141 : RP2C02.v (1628) | Creating black box for PALETTE_RGB_TABLE

However, I can't figure out if there's a way to treat the above as warnings instead of errors, or if that would even be helpful. All subsequent steps are cancelled because it looks like an "SRS file" is either not found or generated for the next step.

I'm stuck here.

Efinix Efinity 2024.1.163

In this package, I get the following error:

ERROR    : Bi-directional port 'DB[7]' is not supported in this device. Check synthesis options. [EFX-0265]

Unfortunately, this error occurs with any device I try. Does Efinity handle inout ports differently somehow?

Of note, I see that Efinity has a --blackbox-error synthesis setting, but that's moot because it's not even getting past "pre-synthesis checks" yet.

And this is where I'm stuck.


Any help you can provide would be awesome.

Like I said, I know very little about FPGA software or toolchains, but I'm pretty good at following directions and troubleshooting. My first goal was to synthesize something in order to get a ballpark sense of the number of logic cells needed, so I could begin selecting which FPGA chip to use. Lattice (MachXO_) and Efinix (T4, T8, T20) stood out because their chips are plentiful at my preferred parts distributor, but I'm certainly open to using something else.

Anyway, if you'd rather not bother dealing with someone who has so little FPGA/Verilog experience (like me), I certainly understand.

In any case, this is a cool project and thank you for making it, sharing it, and releasing it under the GPL.

andkorzh commented 3 months ago

Unfortunately, I have never worked with Lattice Diamond. All I can do is add a finished Quartus 13 project to the repository.

andkorzh commented 3 months ago

Please note that this design uses internal FPGA RAM resources for the sprite memory and the palettes, as well as for the VRAM, so you must initialize them first.

andkorzh commented 3 months ago

I have already uploaded the finished Quartus 13 project. Only the output pins still need to be assigned. I work exclusively with Altera.

andkorzh commented 3 months ago

Let me know about your success, I am very interested in your PPU project. :)

andkorzh commented 3 months ago

@W: CG141 : RP2C02.v (1310) | Creating black box for OAM_RAM
@W: CG141 : RP2C02.v (1311) | Creating black box for OAM2_RAM
@W: CG141 : RP2C02.v (1627) | Creating black box for PALETTE_RAM
@W: CG141 : RP2C02.v (1628) | Creating black box for PALETTE_RGB_TABLE

Separate memory modules have to be created for these blocks, each with its own data bus width and depth. The ROM modules also need a content initialization file; in Quartus this is a .mif or .hex file.
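As a rough illustration of the initialization file mentioned above, the following Python sketch writes the text of a Quartus memory initialization file (.mif). The 64 x 24-bit geometry and the grey-ramp contents are hypothetical placeholders chosen only for the example; they are not the real dimensions or contents of PALETTE_RGB_TABLE, which have to be taken from the project itself.

```python
def make_mif(words, width_bits):
    """Return the text of a Quartus .mif file holding `words`.

    `words` is a list of non-negative integers, one per address,
    each of which must fit in `width_bits` bits.
    """
    data_digits = (width_bits + 3) // 4
    addr_bits = max(len(words) - 1, 1).bit_length()
    addr_digits = (addr_bits + 3) // 4
    lines = [
        f"DEPTH = {len(words)};",
        f"WIDTH = {width_bits};",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT",
        "BEGIN",
    ]
    for addr, word in enumerate(words):
        if not 0 <= word < (1 << width_bits):
            raise ValueError(f"word at address {addr} does not fit in {width_bits} bits")
        lines.append(f"{addr:0{addr_digits}X} : {word:0{data_digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


# Hypothetical placeholder contents: a 64-step grey ramp, NOT the real palette.
placeholder = [(g << 16) | (g << 8) | g for g in (i * 4 for i in range(64))]
mif_text = make_mif(placeholder, 24)

if __name__ == "__main__":
    print(mif_text)
```

The resulting text can be saved under any file name and selected as the initial content file when the ROM module is generated in Quartus.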

loglow commented 3 months ago

Thanks so much @andkorzh!

I'm downloading Quartus now and will give it a shot.

loglow commented 3 months ago

@andkorzh, I've been able to get Quartus up and running. I currently have installed:

13.0 appears to be the last version to support the Cyclone II family. It looks like even 13.1 drops support for it.

I was hoping to get through the full compilation process on my end without modifying anything in your repo, more as a sanity check than anything else, but that may not actually be possible, because when I attempt to compile/synthesize I get the following error:

Internal Error: Sub-system: DYGR, File: /quartus/ddb/dygr/dygr_place_info.cpp, Line: 3776

Followed by a stack trace. Unfortunately, after some research, this appears to be related to running older versions of Quartus using a VM on a host machine with an M1/M2 (Apple Silicon) processor, which is exactly what I'm doing. I posted something on the Intel forums, but I'm not expecting a fix. Many other people have had the same problem. Bummer.

In any case, a Cyclone II wouldn't be my ideal choice for a target anyway. I'm wondering if a MAX 10 chip could work for this design? They're inexpensive, plentiful, and small—and they're supported by the latest version of Quartus (23) which apparently has the above error fixed. I'm willing to do some work to port the project to a new FPGA family if it seems like a sane idea, although I'm sure I'd have questions during the process. What do you think?


On another note, I'm suspecting that the RGB output of the FPGA is in the form of digital parallel signals, am I correct? I'm seeing 24 RGB output pins (output [23:0]RGB) which would make sense as 8-bits for R, G, and B for each pixel. I'm imagining that I'd feed this into something like a BH7240 (which I already have some familiarity with) to produce the video signals, which I'd then route to (DIP40) pins 14, 15, and 16 like a real 2C03. Let me know if all this tracks for you.

loglow commented 3 months ago

Oh, also... I was looking at V1, V2, V3 and trying to begin making some sense of them.

V2 looks like the most straightforward, with direct bus inputs to the 2C02's PD[7..0] pins.

V1 and V3 look similar to each other. Are these buffering the contents of VRAM inside the FPGA? And if so, is that just to eliminate the need for a discrete external VRAM chip entirely in a system?

andkorzh commented 3 months ago

I used an ADV7125 as the DAC, but other triple video DACs will also work. Some already include an S-Video encoder and a composite output. Alternatively, you can build your own composite encoder in the FPGA, and the quality will be quite high; you only need to add a 7-bit R-2R DAC. This composite encoder module was developed by HardwareMan. I tried it with this PPU and am very pleased with the result, but RGB is still preferable. The only thing I have not implemented yet is emphasis; for that I need to redesign the RGB palette ROM.

I should also warn you that I use an inverted interrupt signal at the FPGA output. This solves a voltage-level compatibility problem: the NMI input of the processor is pulled up to 5 V, so I use an open-collector transistor on the FPGA PPU board, and that transistor inverts the interrupt signal once more. The PPU Verilog code therefore has to apply this inversion as well.

andkorzh commented 3 months ago

I can recompile for any available Cyclone, for example the inexpensive EP4CE6E22C6N. No problem. You will then be able to use the latest versions of Quartus.

andkorzh commented 3 months ago

As for the VRAM, I provided three different options for versatility; the FPGA has enough RAM resources to hold the video memory internally. With external memory and the 373 latch I ran into a problem with the Everdrive: yellow stripes sometimes appeared in the image. With internal memory and the latch inside the FPGA there were no problems. Note that this problem did not affect original cartridges; everything worked fine with them in any of the options. Most likely the cause is timing and signal delays in the Everdrive level shifters. I had a similar problem with the HardwareMan PPU design, which uses a single phase of the pixel clock internally and is more resource-optimized. I can't show you its design, since I am not authorized to distribute it, but the similarity of the problems says a lot: apparently something needs to be improved to work well with the Everdrive. So, to minimize problems, use the option with internal VRAM; you will not need a latch for it. However, you will need more than 40 conductors to the console board :). But these are small details.

loglow commented 3 months ago

Great, thanks for your replies @andkorzh!

I'm currently looking at the 10CL010YM164I7G, which is a Cyclone 10 LP chip in a small 8x8 mm 164-BGA package. It's cheap (relative to other Cyclone chips), available, and small, and I believe it has the resources needed to run this project. I don't mind routing BGA pads, although I'm not set up to solder BGA chips myself, but that's alright. Do you think this is a reasonable chip to use?

I've been able to successfully import your shared project files into Quartus Prime and migrate them to the Cyclone 10 LP family as a target, with the above chip selected too. I'm also able to fully compile the project without any fatal errors! Awesome. I see SOF and POF files in the output_files folder, which I suspect is a good sign.

There are some warnings though, which I wanted to check with you about.

Warning (287001): Assertion warning: Device family Cyclone 10 LP does not have M4K blocks -- using available memory blocks

Does the kind of memory block matter here?

Also, for each of the three default timing analyses it runs at the end, I get this warning:

Critical Warning (332148): Timing requirements not met

Each one is followed by worst-case "setup slack" and "hold slack" values ranging from -2 to -4 (for setup slack) and 0.1 to 0.3 (for hold slack). I don't know if these warnings or values are relevant. The slack numbers don't have units. Most of the warnings seem to be about the four inst2|altpll_component|pll|clk[_](~1) clocks, not MCLK or MCLKPAL.

Finally, these two (possibly related) warnings:

Warning (15714): Some pins have incomplete I/O assignments. Refer to the I/O Assignment Warnings report for details
Critical Warning (169085): No exact pin location assignment(s) for 71 pins of 71 total pins. For the list of pins please refer to the I/O Assignment Warnings table in the fitter report.

Does this have to do with assigning logical pins to physical device pins? They appear to be assigned automatically; I'm looking at the Pin-Out file under the Fitter heading in the Table of Contents which appears to be the map of signals to physical pins. One note: When migrating to the new target family, I had the "remove all locations" (or something like that) checkbox checked... it said doing this was "recommended by Intel." I'm not sure if that's pertinent.

Lastly, the interface is encouraging me (with bright yellow) to do a "recommended" IP upgrade. I experimented some with this, but the process seemed error-prone... at one point, several of the IP units disappeared entirely from the IP Components list after being upgraded. Also, the ALTPLL unit was unable to upgrade itself for some reason (the others seemed successful). Anyway, I gave up and reverted back to the original (unmodified, but still migrated) project. Since these IP upgrades aren't required, are they worth messing with?

loglow commented 3 months ago

I'm also thinking of picking up a Terasic USB Blaster to program the FPGA chip in circuit. It's quite a bit less expensive than Altera/Intel programming devices. Any thoughts about this before I order one?

andkorzh commented 3 months ago

Different Cyclone families have different internal memory resources, for example M4K in Cyclone II, M9K in Cyclone IV, and M20K in Cyclone 10. Create a simple project and add a memory module to it; one can be generated with the MegaWizard Plug-In Manager. The practice will help consolidate your FPGA knowledge.

andkorzh commented 3 months ago

Regarding the timing issue, this most often occurs due to the use of two clocks for PLL. With one clock, for example, for NTSC or PAL, this problem does not arise. It is probably more reasonable for you to use a single clock for your NES region.

andkorzh commented 3 months ago

I use cheap Chinese blasters. Blasters and demo boards from Waveshare have also proven themselves very well. https://www.waveshare.com/product/fpga-tools/programmer/altera-core/usb-blaster-v2.htm

andkorzh commented 3 months ago

As for the Cyclone 10, the main issue is BGA soldering, which is a problem for beginners in a home lab. That is why I prefer FPGA packages that are easier to mount.

loglow commented 2 months ago

Different Cyclone families have different internal memory resources, for example M4K in Cyclone II, M9K in Cyclone IV, and M20K in Cyclone 10. Create a simple project and add a memory module to it; one can be generated with the MegaWizard Plug-In Manager. The practice will help consolidate your FPGA knowledge.

Oh I see now, they're just different numbers of bits per block. M9K = 9Kb/block, M20K = 20Kb/block, etc.
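To get a ballpark feel for how block size affects the block count, here is a small Python sketch. The block capacities are the nominal bits per block including parity. The memory geometries are assumptions: 256 bytes of OAM, 32 bytes of secondary OAM, 32 x 6 bits of palette RAM and 2 KiB of VRAM are the sizes in the original console, while the 64 x 24-bit RGB table is a guess; the actual widths and depths must be read from RP2C02.v. The result is only a lower bound, because it ignores the width/depth configurations each block type supports.

```python
import math

# Nominal capacity of one embedded memory block, in bits (parity included).
BLOCK_BITS = {"M4K": 4608, "M9K": 9216, "M20K": 20480}

# (depth, width) per memory. ASSUMED geometries, see the note above.
MEMORIES = {
    "OAM_RAM": (256, 8),
    "OAM2_RAM": (32, 8),
    "PALETTE_RAM": (32, 6),
    "PALETTE_RGB_TABLE": (64, 24),  # hypothetical size
    "VRAM": (2048, 8),
}


def blocks_for(depth, width, block_type):
    """Lower bound on blocks for one memory; separate memories never share a block."""
    return max(1, math.ceil(depth * width / BLOCK_BITS[block_type]))


def total_blocks(block_type):
    """Sum the per-memory lower bounds for the given block type."""
    return sum(blocks_for(d, w, block_type) for d, w in MEMORIES.values())


if __name__ == "__main__":
    for name in BLOCK_BITS:
        print(name, total_blocks(name))
```

The fitter report in Quartus gives the authoritative number for a given device; this estimate is only useful for comparing candidate chips before compiling.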

loglow commented 2 months ago

I use cheap Chinese blasters. Blasters and demo boards from Waveshare have also proven themselves very well.

Thanks! I picked up some cheap FPGA programmers to use.

loglow commented 2 months ago

As for the Cyclone 10, the main issue is BGA soldering, which is a problem for beginners in a home lab. That is why I prefer FPGA packages that are easier to mount.

Yeah, I don't usually try to solder BGA packages myself either. But I can have my PCB assembly supplier do it with their professional equipment :)

loglow commented 2 months ago

Regarding the timing issue, this most often occurs due to the use of two clocks for PLL. With one clock, for example, for NTSC or PAL, this problem does not arise. It is probably more reasonable for you to use a single clock for your NES region.

I just made an attempt to limit the existing design to one (NTSC) clock:

The relevant part of V4.bdf looks like this: (screenshot of the V4.bdf schematic, not reproduced)

My expectation is that the MODE pin would still need to be low in order to generate proper NTSC output (if no changes are made to RP2C02.v).

This modified project compiles successfully without any timing warnings.

Do all these changes seem correct to you?

If so, my next steps would be assigning and locking the pin locations and then making an actual board to test.

(With the MODE pin no longer needed to set the region, I think it would be interesting to repurpose it to allow selection between two palettes: one with 2C02-like colors and the other with 2C03-like colors.)

loglow commented 2 months ago

One small observation about the clock... I think 69.841 ns might be slightly preferable to 69.842 ns:

                    NTSC ideal = ~21.477272 MHz (± 40 Hz)
69.841 ns --> 14.3182371 * 3/2 = ~21.477355 MHz (+ 83 Hz)
69.842 ns --> 14.3180321 * 3/2 = ~21.477048 MHz (- 224 Hz)

So neither option is strictly within the NTSC spec, but 69.841 ns gets us roughly 141 Hz closer than 69.842 ns does.
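The comparison above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 3/2 PLL ratio and the 21.477272 MHz target are the figures used in this comment, and the function names are made up for the example.

```python
IDEAL_NTSC_MASTER_HZ = 21_477_272  # target value quoted in the comment above


def master_hz(period_ns, pll_mult=3, pll_div=2):
    """Master clock produced from an input clock with the given period in ns."""
    return 1e9 / period_ns * pll_mult / pll_div


def error_hz(period_ns):
    """Signed deviation of the resulting master clock from the NTSC target."""
    return master_hz(period_ns) - IDEAL_NTSC_MASTER_HZ


if __name__ == "__main__":
    for period in (69.841, 69.842):
        print(f"{period} ns -> {master_hz(period):.1f} Hz ({error_hz(period):+.1f} Hz)")
```

Because the period is only specified to three decimal places of a nanosecond, neither value can land exactly on the target; the script just shows which rounding lies closer.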

But there's an external master clock signal anyway... so I'm still not entirely sure how this even relates to that? Unless... is the sdc file only used for the purposes of analysis/simulation?

andkorzh commented 2 months ago

You are making progress. You can ground the Mode input, and the synthesizer will then optimize away the unused logic during compilation. As for clock accuracy: when using RGB rather than a composite NTSC encoder you do not need high accuracy, and TVs should have a tint adjustment for NTSC in their menu. Do not worry about accuracy; it is unnecessary. Use any cheap, available oscillator and everything will work. I ordered ready-made 21.477 MHz oscillators on AliExpress and everything works fine. Initially I only had 14.318 MHz ones, and they worked just as well with the PLL.
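For choosing PLL settings from whichever oscillator is at hand, a small Python sketch can find the simplest multiply/divide ratio. The exact frequencies below (14.318181 MHz, 21.477272 MHz, 42.954544 MHz) are the nominal NTSC-derived values and are an assumption here; the comment above only gives them to three decimal places. The function name and the denominator limit are likewise made up for the example.

```python
from fractions import Fraction


def pll_ratio(f_in_hz, f_out_hz, max_denominator=32):
    """Closest multiply/divide ratio f_out/f_in whose divider stays small."""
    return Fraction(f_out_hz, f_in_hz).limit_denominator(max_denominator)


if __name__ == "__main__":
    ratio = pll_ratio(14_318_181, 21_477_272)
    print(f"14.318 MHz -> 21.477 MHz: multiply by {ratio.numerator}, divide by {ratio.denominator}")
    ratio = pll_ratio(14_318_181, 42_954_544)
    print(f"14.318 MHz -> 42.954 MHz: multiply by {ratio.numerator}, divide by {ratio.denominator}")
```

Whether a particular ratio is achievable depends on the PLL of the chosen device; the ALTPLL wizard reports the settings it can actually realize.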

andkorzh commented 2 months ago

You can use the base NES NTSC clock instead of the doubled 42.954 MHz; 21.477 MHz as the main master clock should work without any problems.

loglow commented 2 months ago

You can use the base NES NTSC clock instead of the doubled 42.954 MHz; 21.477 MHz as the main master clock should work without any problems.

Are you saying that I can use an existing 21.47727 MHz external clock signal as an input to BOTH the Clk and Clk2 inputs of the RP2C02 module (and not use a PLL at all)?

If this is the case, then if the external clock was 26.601712 MHz and the MODE pin was pulled high, the output would also be PAL compliant, am I correct?

andkorzh commented 2 months ago

You can use native clocks for regions. In any case, you can always add PLL and multiply the master clock by 2. You can test with a doubled clock and with the original, I did not find any differences in behavior. It is also advisable to pull the Mode pin to the power supply with a weak pullup resistor in the pin settings in the pin planner.
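For reference, the native master clocks discussed in this thread translate into pixel clocks as sketched below. The dividers are those of the original chips (master clock divided by 4 on the NTSC 2C02, by 5 on the PAL 2C07); that RP2C02.v applies the same dividers depending on the Mode pin is an assumption, not something confirmed in this thread.

```python
# Master-clock dividers of the original PPUs (assumed to match this design).
PIXEL_DIVIDER = {"NTSC": 4, "PAL": 5}

# Native master clocks mentioned in the thread, in Hz.
MASTER_HZ = {"NTSC": 21_477_272, "PAL": 26_601_712}


def pixel_clock_hz(region, master_hz=None):
    """Pixel clock for a region, optionally from a non-nominal master clock."""
    if master_hz is None:
        master_hz = MASTER_HZ[region]
    return master_hz / PIXEL_DIVIDER[region]


if __name__ == "__main__":
    for region in MASTER_HZ:
        print(f"{region}: {pixel_clock_hz(region) / 1e6:.6f} MHz pixel clock")
```

With a doubled master clock from the PLL, the same function can be called with `master_hz` set to twice the nominal value to see what the logic would have to divide by to reach the same pixel rate.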