RobLoach / libretro-dats

Build some of the libretro-database DATs
http://github.com/libretro/libretro-database

Dreamcast GDI files #26

Open RobLoach opened 5 years ago

RobLoach commented 5 years ago

@bkoropoff I did a test of scanning Redump files. Here's the output from the console....

[INFO] [PulseAudio]: Pausing.
[INFO] Pruning file referenced by gdi: /home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 1).bin
[INFO] Pruning file referenced by gdi: /home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 2).bin
[INFO] Pruning file referenced by gdi: /home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 3).bin
[INFO] Parsing GDI file '/home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA).gdi'...
[INFO] GDI '/home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA).gdi' primary track: /home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 3).bin
[INFO] Reading first data track...
[INFO] GDI '/home/rob/Games/Emulation/Sega - Dreamcast/Marvel vs. Capcom - Clash of Super Heroes (USA)/Marvel vs. Capcom - Clash of Super Heroes (USA).gdi' crc: 767da1a0
[INFO] [PulseAudio]: Unpausing.

Unsure why it referenced 767da1a0, as that's the CRC for Track 3 in Redump...

    <game name="Marvel vs. Capcom - Clash of Super Heroes (USA)">
        <category>Games</category>
        <description>Marvel vs. Capcom - Clash of Super Heroes (USA)</description>
        <rom name="Marvel vs. Capcom - Clash of Super Heroes (USA).gdi" size="249" crc="1a9adf80" md5="b1f20d5d8b09c5c1e809223e1e741413" sha1="475fc0e0a662628f14ee9e6767ea7b0bc1527aa9"/>
        <rom name="Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 1).bin" size="1058400" crc="ebd2e8fb" md5="b21479c1f887b159acc3f65d47e275d4" sha1="f9354b0aab7fbdeda6cc716b3a962ee7d4c23b95"/>
        <rom name="Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 2).bin" size="1589952" crc="8557fcaa" md5="824ff123fbce45b5883962eb3fa43cab" sha1="a3705cb0ce53c7adf906c5af76887406984de6a9"/>
        <rom name="Marvel vs. Capcom - Clash of Super Heroes (USA) (Track 3).bin" size="1185760800" crc="767da1a0" md5="cf7c04e062a8d958c71ba02e049c06bd" sha1="339c53a856ea2f7f7c0d293149b3f9646773aa07"/>
    </game>

Do you know what logic it uses to select the "first data track"?

i30817 commented 5 years ago

https://en.wikipedia.org/wiki/GD-ROM#Regions

Basically, the second data area (areas, not tracks: a GD-ROM has 3 data areas, each with its own tracks inside) is 'copy protection' with some copyright text. The hashes of that are probably not unique.

The first data area is a mixed 'standard CD-ROM' area. The first 'actual track' inside that Redump/TOSEC track is the famous warning not to play a GD-ROM in CD players, and the rest is binary data, or any other crap the devs wanted to make visible on a Windows filesystem. It's quite possible the set is full of duplicates for this Redump track too.

The code seems to think that the 'thing to hash' is the third track. Rightfully, in my opinion, since doing otherwise would make the hash useless for identifying romhacks, like the Shenmue retranslation for example.
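To make the "which track gets hashed" question concrete, here is a minimal Python sketch. It assumes the usual .gdi layout (a track count on the first line, then one line per track with number, start LBA, type where 4 means data, sector size, file name, and offset) and a hypothetical selection rule: take the first data track at or beyond LBA 45000, where the high-density area starts. This is not taken from the RetroArch source; the function names are made up for illustration.

```python
import shlex
import zlib

# Assumption: the GD-ROM high-density area starts at this LBA.
HIGH_DENSITY_START_LBA = 45000


def parse_gdi(text):
    """Parse the text of a .gdi file into a list of track dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])
    tracks = []
    for ln in lines[1:1 + count]:
        # shlex.split keeps quoted file names with spaces in one piece.
        num, lba, kind, sector_size, name, _offset = shlex.split(ln)
        tracks.append({
            "number": int(num),
            "lba": int(lba),
            "is_data": int(kind) == 4,
            "sector_size": int(sector_size),
            "name": name,
        })
    return tracks


def pick_track_to_hash(tracks):
    """Hypothetical rule: first data track in the high-density area."""
    for track in sorted(tracks, key=lambda t: t["lba"]):
        if track["is_data"] and track["lba"] >= HIGH_DENSITY_START_LBA:
            return track
    return None


def crc32_of_file(path, chunk_size=1 << 20):
    """CRC32 of a whole file, read in chunks, as 8 lowercase hex digits."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"
```

With a rule like this, a three-track dump whose third track starts at LBA 45000 would hash Track 3, which would be consistent with the crc in the log above, but the scanner's real rule may differ.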

I suspect that the SEGA header is right at the start of 'track 1' (if it doesn't interfere with the audio warning) or 'track 3' from Redump, but I didn't actually check where it is. If this is true, there is good reason to suppose that hashing the first 256 bytes of track 3 would serve as a 'serial'. Extracting the actual serial from the SEGA header seems kind of awkward, especially on earlier consoles, since fixed offsets don't seem to be enough, but maybe I just don't understand the format and it has index pointers and base64 encoding or something. Or maybe it's another case of 'what counts is what's printed on the disc and manual, not what's in the data'.
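A rough sketch of the "hash the first 256 bytes" idea, in Python. The function name and the default length are illustrative only, and whether the header actually lives in that range is unverified, as said above. Raw 2352-byte sectors also start with sync and header bytes before the user data, so the offset would likely need adjusting for such dumps.

```python
import zlib


def header_fingerprint(track_path, length=256):
    """CRC32 of the first `length` bytes of a track file, as 8 hex digits.

    Hypothetical pseudo-serial: it only helps if the leading bytes really
    are the same across copies of one game and differ between games.
    """
    with open(track_path, "rb") as fh:
        head = fh.read(length)
    if len(head) < length:
        raise ValueError("track is shorter than the requested header length")
    return f"{zlib.crc32(head):08x}"
```

Two files that share their leading bytes get the same fingerprint no matter what follows, which is exactly why this behaves like a serial and not like a full hash: a hack that leaves the header alone would still match the original.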

RetroArch already has too many false positives from using serials instead of letting me force hash checks. The situation gets ridiculous if every platform with a serial identifies hacked games as the original game. Why am I even contributing CD hack hashes if they won't display the right name and info? That's a different problem though; fast serial checks also have their uses.

I just wish the scanner was much much smarter. Some setting like:

Preferred Scanner mode if possible = |hash|serial|mixed¹|

would be very useful imo.

¹ First check whether the serial exists in a hack database, and only hash if it does. This increases the 'hash cost' for games with hacks, but saves games without hacks from waiting to hash gigabytes, at the very minor cost of an extra serial check for every game against a separate 'hack database'.
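The three proposed modes could look roughly like this in Python. Everything here is hypothetical: `identify`, the dict-based databases, and the `hack_serials` set are stand-ins for illustration, not RetroArch's scanner API. The hash is passed in as a callable so it is only computed when a mode actually needs it.

```python
def identify(serial, compute_hash, by_serial, by_hash, hack_serials,
             mode="mixed"):
    """Return a database entry name, or None if nothing matches.

    mode == "hash":   always hash the content.
    mode == "serial": serial lookup only.
    mode == "mixed":  hash only when the serial is known to have hacks.
    """
    if mode == "hash" or serial is None:
        return by_hash.get(compute_hash())
    if mode == "mixed" and serial in hack_serials:
        match = by_hash.get(compute_hash())
        if match is not None:
            return match
    # Either no hacks are known for this serial or the hash was unknown:
    # fall back to the cheap serial match.
    return by_serial.get(serial)
```

In mixed mode a game whose serial is not in the hack set never pays for a full hash, while a game with known hacks is hashed once and falls back to the serial entry if the hash is not in the database.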

If calculated hashes and serials were what is saved in playlists (instead of database hashes and serials), it would also allow two levels of 'configurations' for games: one that applies to everything with the same serial (hacks included), and one that applies only to exact hash matches after a scan. Dolphin goes even further and allows 'region duplicates' for configurations (i.e. even looser than serials), but I think that depends on a predictable serial format.
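A small Python sketch of the two-level idea, assuming the playlist stores both the calculated serial and the calculated hash. The function, the dict layout, and the option names in the usage example are invented for illustration.

```python
def resolve_config(serial, content_hash, serial_configs, hash_configs):
    """Merge the serial-level config with the exact-hash config.

    Serial-level settings apply to every dump sharing the serial (hacks
    included); hash-level settings override them for one exact dump.
    """
    merged = dict(serial_configs.get(serial, {}))
    merged.update(hash_configs.get(content_hash, {}))
    return merged


# Usage with made-up option names and keys:
serial_configs = {"SERIAL-A": {"widescreen": False, "region": "auto"}}
hash_configs = {"hash-of-hack": {"widescreen": True}}

original = resolve_config("SERIAL-A", "hash-of-original",
                          serial_configs, hash_configs)
hack = resolve_config("SERIAL-A", "hash-of-hack",
                      serial_configs, hash_configs)
```

Here the original and the hack share the serial-level settings, and only the hack picks up the override keyed to its own hash.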

There also needs to be a system to override factory print errors, with a hack that forces a different check for a 'serial' (e.g. Urban Chaos and Threads of Fate have the same serial on disc even though they have 'different' serials in the corrected database, so the right match may not succeed).