RobLoach / libretro-dats

Build some of the libretro-database DATs
http://github.com/libretro/libretro-database

Redump Serial vs Internal Serial #59

Open kikmon opened 1 year ago

kikmon commented 1 year ago

Hi, I see this topic was discussed some time ago, but I have a few questions. (I'm trying to fix some PSX games that don't scan properly :) )

If I understand correctly, Redump has two kinds of serials: the Internal Serial, which is actually in the disc data (and I think this is the one RetroArch finds), and the Serial that appears on the disc label / box. If the internal serial has the same value as the regular one, Redump simply omits it.

So to fix serial scanning for the games that have an internal serial, we need that information in the RDB, and for that we need to get it from Redump's dat. Is there a way to have the internal serial in the Redump dat? I see the tool calls http://redump.org/datfile/psx/serial,version to get the serial. Is there a URL that also returns the internal one? If we can get this info into the RDB, the scanning logic of RA could be altered to take the 'internal serial' into account when present.

RobLoach commented 1 year ago

All data that's grabbed from Redump is built into .dat files over at: https://github.com/libretro/libretro-database/tree/master/metadat/redump

This includes the serial number. For example...


game (
    name "Crazy Taxi (USA)"
    description "Crazy Taxi (USA)"
    region "USA"
    serial "51035"
    rom ( name "Crazy Taxi (USA) (Track 1).bin" size 1058400 crc F8BB5B3C md5 BE87B59616171C57D9FD252ACDDA806B sha1 CF38F6C262F304729870F443FCE69FFD182214E1 serial "51035" )
)

kikmon commented 1 year ago

Thanks for the followup. Yes, but there's only one serial in this example, and the 'serial vs internal serial' plagues Multi Disc games mostly.

Let's take an example that RA is unable to scan: the PS1 game Alive (Japan) (Disc 2). On Redump (http://redump.org/disc/22867/), the Serial is 'SLPS 01528' (printed on the disc label) and the Internal Serial is 'SLPS-01527' (the one found in the track data). The dat file you linked shows 2 entries for Alive (Japan) (Disc 2), with serials SLPS 01528 and SLPS 01528-1, both derived from the regular Serial, but there is no trace of the internal serial (SLPS-01527).

When RA extracts the serial from the disc, it finds SLPS-01527, and since it's a multi-disc game, like a good soldier it appends -1 and then looks up SLPS-01527-1 in the database, where it is unknown.

For the scanning to work, we'd need some knowledge of this 'Internal Serial'. It's not present in the Redump dat we're downloading, hence my question about the existence of a Redump Download URL that can return both serials. :)
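The lookup failure described above can be reproduced in a few lines. This is only a rough sketch, not RetroArch's actual scanner code: the `-1` suffix rule for multi-disc games is taken from the description in this comment, while the function name `lookup`, the dictionary layout and the hyphenated serial format are assumptions made for illustration.

```python
# Hypothetical serial-keyed database built from the *label* serials only.
# The hyphenated key format is an assumption, not the real RDB layout.
db_by_serial = {
    "SLPS-01528": "Alive (Japan) (Disc 2)",
    "SLPS-01528-1": "Alive (Japan) (Disc 2)",
}


def lookup(serial_on_disc, multi_disc, db):
    """Mimic the lookup described above: append -1 for multi-disc games."""
    key = (serial_on_disc + "-1") if multi_disc else serial_on_disc
    return key, db.get(key)


# The scanner reads the internal serial from the track data, not the label serial.
key, match = lookup("SLPS-01527", multi_disc=True, db=db_by_serial)
print(key, match)
```

The key built from the internal serial never appears in a database keyed on label serials, so the lookup returns nothing.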

RobLoach commented 1 year ago

Ah, I understand. That makes sense, thanks a lot. Curious about this as well. RetroArch requires one-rom to one-entry, so we would somehow need to split the entries.

kikmon commented 1 year ago

Yep. Once we know SLPS-01527 is the internal one, we can maybe build an entry for SLPS-01527-1. But in order to have the right data, we need Redump to cooperate.

I've looked at my other scan failures, and here's another funny one: http://redump.org/disc/32630/ (SLPM 86306 vs SLPS-01487). It's a single-disc game, but the serial on the label doesn't match the internal serial :)

RobLoach commented 1 year ago

@pkos may be interested in some of this too. He had been working on this repo: https://github.com/pkos/CRoSG which brings in a few additional serials.

Tested out building it into the .dat files over at https://github.com/RobLoach/libretro-dats/pull/53 . Could use some thoughts and testing.

kikmon commented 1 year ago

Interesting. If I read this well, it's mainly to add chd crc scanning ? (I'm not seeing the serials in the dat) Do we have a privileged communication channel with redump to get the missing info ?

kikmon commented 1 year ago

Ok, got in touch with the kind people at Redump. Internal Serials are not part of the database, only comments, so we can't get them in the dat. There's a chance to have them in the future, but not anytime soon, so we must find another way to recognize these games.

pkos commented 1 year ago

For most systems the disc media contain enough clues to find the serial. Other systems' media, however, simply never recorded a serial in the data, so the hook is missing. CRoSG is an alternative for such media; in fact, we've already implemented the internal CHD SHA in the data files.

pkos commented 1 year ago

Rob can also confirm that when no serial is detected, the next step is CRC, even for large ISO and BIN/CUE images. These data files Rob loads cover both serials and CRC, correct Rob? A next step could be the internal CHD SHA, which is precalculated and stored at CHD creation from the original disc media.
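The serial-then-CRC order mentioned here could be sketched as follows. This is an illustration under stated assumptions, not RetroArch's implementation: the function `identify` and both dictionaries are hypothetical, and falling back to CRC when a serial is found but not matched (rather than only when no serial is detected) is an assumption that this thread does not confirm.

```python
import zlib


def identify(serial, track_bytes, db_by_serial, db_by_crc):
    """Try the serial first, then fall back to the CRC32 of the track data."""
    if serial is not None and serial in db_by_serial:
        return db_by_serial[serial], "serial"
    # CRC32 formatted as 8 uppercase hex digits, like the crc field in the DAT.
    crc = format(zlib.crc32(track_bytes) & 0xFFFFFFFF, "08X")
    if crc in db_by_crc:
        return db_by_crc[crc], "crc"
    return None, "unmatched"


# Toy data: the CRC key is computed from the sample bytes, not taken from a real DAT.
sample = b"example track data"
db_by_crc = {format(zlib.crc32(sample) & 0xFFFFFFFF, "08X"): "Some Game (Japan)"}

print(identify("SLPS-01527", sample, {}, db_by_crc))
```

Hashing a whole track is the slow part, which is why a serial hit (or a precomputed hash stored in the container) is attractive.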

pkos commented 1 year ago

In order of increasing speed it goes CRC, then Serial, but the internal SHA is far faster.

kikmon commented 1 year ago

Thanks for joining :) As far as I see, there are 2 issues mixing up here:

pkos commented 1 year ago

Perhaps let us know how many PSX games are matching to serials and which are going to CRC, that can be retested.

pkos commented 1 year ago

The log files can be enabled for data on your complete scan if you enable logging in RA.

pkos commented 1 year ago

Rob, could we get a quick patch to make sure the overall RDBs carry a latest-update date stamp? Redump updates its dats much more actively.

pkos commented 1 year ago

The people at Redump didn't tell you about a URL that can be used to get serials in their dats, my friend; we have both had some good conversations in VGPS.

The URL format is http://redump.org/datfile/ss/serial,version
Try http://redump.org/datfile/PSX/CRC,serial,version

kikmon commented 1 year ago

Thanks for your feedback :)

Rob could we get a quick patch to make sure the overall RDBs have a latest update date stamp. Redump is much more active about dats.

I don't think this would help much since the ps1 rdb is still created with the parameter 'rom.Serial'. When transforming the .dat into .rdb, some information is lost. Maybe going back to rom.crc for PS1 would yield overall better results

The log files can be enabled for data on your complete scan if you enable logging in RA.

Yeah, I've been running RA Scanning in Debug mode and added more traces to ease log analysis. I'm working on a subset of 4000 JP games at the moment. Takes a while to scan :) Right now I'm investigating 11 that fail scanning

The people at redump didn't inform you of a url that can be used to get serials in their dats my friend, we both have had some good conversations in VGPS. The URL format is http://redump.org/datfile/ss/serial,version Try http://redump.org/datfile/PSX/CRC,serial,version

The url /serial,version is already used to generate the redump metadat. We have the serials already, that's not the problem. What we're missing are the 'Internal Serials' that are not part of Redump Database. As mentioned earlier here's a good example http://redump.org/disc/32630/ Notice the difference between the 'Serial' and 'Internal Serial'
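For reference, the datfile URLs quoted in this thread all follow the same pattern, which could be sketched like this. The helper name `redump_datfile_url` is hypothetical, and only the field names quoted above are used; as discussed, no field for the internal serial is known to exist.

```python
def redump_datfile_url(system, fields):
    """Build a Redump datfile URL from a system code and a list of extra fields."""
    return "http://redump.org/datfile/{}/{}".format(system, ",".join(fields))


# The URL the tool already uses to generate the Redump metadat:
print(redump_datfile_url("psx", ["serial", "version"]))
```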

RobLoach commented 6 months ago

@kikmon What's the difference between Internal Serial and Serial?

kikmon commented 6 months ago

Disclaimer: I'm mostly talking about PS1 games here :)

Serial is the serial number that appears on the game cover, disc label, etc. Internal Serial is the serial found in the data of the disc, the one RetroArch will find when scanning. Usually they are the same, but for some games they differ, which prevents RetroArch from scanning the game properly.

Redump contributors started adding the Internal Serial information, but unfortunately it is part of the Comments and not a database field, meaning it's not part of the dat. The Redump team seemed unable to change the database structure for now, so this information, useful for scanning, can't be used as-is.

RobLoach commented 6 months ago

Very cool. Seems a lot more useful than the Serial. Do you know the URL code to grab the Internal Serial data in the dat? For example... http://redump.org/datfile/PSX/CRC,serial,version,internal_serial

or something?

kikmon commented 6 months ago

As far as I know, it's not possible right now. There is no database field for the Internal Serial, as it is part of the Comments, and I don't think we can request that the comments be included in the dat file. As mentioned earlier, this entry shows the issue :) http://redump.org/disc/32630/ The internal and 'regular' serials are quite different, and the Internal Serial is in the Comments section.

RobLoach commented 6 months ago

That's unfortunate. We likely won't be able to include it then.

kikmon commented 6 months ago

Yeah, I don't think we can do it the 'clean way'. Now, we could scrape the Redump website to extract the Internal Serial from the web pages directly, and patch the official dat file by adding a new field :) Unfortunately, because of internal issues, I don't think Redump will be able to update its DB schema. Unless the RetroArch scanning logic changes, these games with a different internal serial will never be scanned properly.
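The patching half of that idea might look roughly like this. It is only a sketch: the scraping step is left out, the `internal_serial` field name is hypothetical (nothing in the libretro tools is known to read it), and the mapping and hyphenated serial format are made up for illustration.

```python
import re

# Hypothetical mapping from label serial to internal serial,
# e.g. collected by hand from the Redump comments.
internal_by_label = {"SLPS-01528": "SLPS-01527"}


def add_internal_serials(dat_text, mapping):
    """Insert an internal_serial line after each game-level serial found in mapping."""
    def patch(m):
        indent, serial = m.group(1), m.group(2)
        internal = mapping.get(serial)
        if internal is None:
            return m.group(0)  # unknown serial: leave the line untouched
        return '{0}\n{1}internal_serial "{2}"'.format(m.group(0), indent, internal)

    # Only match lines made up of a serial field, not the serial inside rom ( ... ).
    return re.sub(r'^([ \t]*)serial "([^"]+)"[ \t]*$', patch, dat_text, flags=re.M)


sample = '''game (
    name "Alive (Japan) (Disc 2)"
    serial "SLPS-01528"
)'''
print(add_internal_serials(sample, internal_by_label))
```

A new field only helps if the DAT-to-RDB conversion and the scanner are taught to use it, so this would still need a change on the RetroArch side.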

RobLoach commented 6 months ago

If you're up for doing it manually, you could add entries to the /dat folder: https://github.com/libretro/libretro-database/tree/master/dat

kikmon commented 6 months ago

Are you talking about https://github.com/libretro/libretro-database/blob/master/dat/ps1.idlst ? Is this file combined with the official Redump DAT to create the final RDB ? If that's the case, then yeah, I could probably add a few problematic Serials in this file