andrebrait / 1g1r-romset-generator

A small utility that uses No-Intro DATs to generate 1G1R ROM sets
GNU General Public License v3.0

Getting a warning of file not found for a file that shouldn't be considered #15

Closed HVR88 closed 2 years ago

HVR88 commented 4 years ago

This might not be a bug, depending on whether you first output text for files found in the source folder matching the dat before evaluating them

Platform: Sega Genesis
DAT: current P/Clone generated today at dat-o-matic

Error message:
WARNING: candidate [Puyo Puyo Tsuu (Europe) (Ja) (Rev A) (Virtual Console).zip] not found, trying next one
WARNING: no eligible candidates for [Puyo Puyo Tsuu (Japan) (Rev A)] have been found!

Parameters: python3 generate.py -r CAN,USA,EUR -l en -e zip --all-regions-with-lang --prefer-parents --no-bios --no-program --no-enhancement-chip --no-beta --no-demo --no-sample --no-promo -d "DAT/SegaGenesis.dat" -i ../source -o ../destination

So the files warned about should be excluded given the "en" language setting. I just found it curious the errors were output at all given this restriction.

andrebrait commented 4 years ago

Yes, that most likely should not be happening. The candidate selection happens before any file selection, so I'll have to check.

I assume this is a recent dat for the genesis, right?

HVR88 commented 4 years ago

Generated within the past couple of hours, yes.

This Genesis DAT gave me the most issues because it also had a lot of errors/issues versus the current No-Intro file set I have. That made a number of titles not copy, though I can't fault your script for that.

There may be other issues, so you might want to do a thorough sanity check, because I didn't have the time to investigate the text in the dat thoroughly.

I'm using the sets from the well-known full No-Intro platform collection at archive.org

andrebrait commented 4 years ago

IAmTheDewd's collection? He helped a lot with the first revisions of this and his collection was how I got to know 1G1R sets in the first place :-)

Quick question(s): are you using 1.9.0? If so, do you see a list of threads and a progress bar for the file scan when you start it up? If so, has it been reasonably fast for you?

I'm going to check those specific entries in the dat and report back. There's always the chance they are being quoted as being in English.

andrebrait commented 4 years ago

Ah, I got what happened. Because of the region Europe, my script assumes this game has English, but it should only do that if it didn't find any language specifications. There is one, though: (Ja).

I have to check why it's not detecting Japanese there.

HVR88 commented 4 years ago

Yes, his sets. :)

With regards to the version, I downloaded as zip a minute or two after you followed up to the original bug I posted earlier.

When executing with the arguments I pasted above I get a progress list, one line per copy, to stdout but no progress bar that I've noticed.

andrebrait commented 4 years ago

You should see something like this:

Scanning directory: /home/andre/Downloads
Found: 1000 files
Thread 1: DONE
Thread 2: DONE
Thread 3: DONE
Thread 4: DONE
Calculating hashes [##################################################] 1000/1000

<List of ROMs>

Execution finished. Check the ./generate.log file for logs.

And I see here that this bug with the warning only happens when scanning. I'm going to check it.
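For readers who have not seen the scan step, here is a rough Python sketch of what "Found / Thread N / Calculating hashes" could correspond to: hashing every file in the input folder on a small thread pool and indexing the results by SHA-1. The names `sha1_of_file` and `scan_directory` are hypothetical, and this sketch hashes files as they sit on disk; how the real script handles archives such as .zip is not shown in this thread.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(directory, threads=4):
    """Hash every regular file under `directory`; returns {sha1: path}."""
    files = [p for p in Path(directory).rglob("*") if p.is_file()]
    print(f"Found: {len(files)} files")
    index = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as `files`.
        for path, sha1 in zip(files, pool.map(sha1_of_file, files)):
            index[sha1] = path
    return index
```

Keying the index by hash rather than by filename is what makes the later "not found" warnings depend on file contents, not on names.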

andrebrait commented 4 years ago

It turns out, it actually happens on both, and it has to happen :)

So, the criteria I use are Region and then Language. If the region matches, the language doesn't matter: it only plays a role in the order in which the candidates are picked, but it won't hard-filter by language.

It dates back to when I did not attempt to guess the language based on the region, etc.

So, it's being included because it is from Europe and you selected the EUR region. I'm going to add an option, I guess, to hard filter on language.
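A minimal sketch of the difference between the soft ordering described above and a hard language filter. The `Candidate` class and `select_candidates` function are hypothetical, the exact sort key the script uses is not shown in this thread (region first, then language, is an assumption), and `only_selected_lang` simply mirrors the `--only-selected-lang` flag that appears in a later command here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    regions: list
    languages: list


def select_candidates(candidates, regions, languages, only_selected_lang=False):
    """Keep candidates from a selected region, ordered by region, then language."""

    def region_rank(c):
        return min(regions.index(r) for r in c.regions if r in regions)

    def language_rank(c):
        ranks = [languages.index(l) for l in c.languages if l in languages]
        # Candidates without a selected language sort last but are still kept.
        return min(ranks) if ranks else len(languages)

    eligible = [c for c in candidates if any(r in regions for r in c.regions)]
    if only_selected_lang:
        # Hard filter: drop anything that has none of the selected languages.
        eligible = [c for c in eligible if any(l in languages for l in c.languages)]
    return sorted(eligible, key=lambda c: (region_rank(c), language_rank(c)))
```

With the soft behaviour, a Europe release tagged only (Ja) stays in the list when EUR is selected; with the hard filter and `-l en`, it is dropped.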

andrebrait commented 4 years ago

I just released version 1.9.1, which should give you the option to filter by language :wink:

HVR88 commented 4 years ago

Sounds great. In testing and reporting these, I actually discovered a similar soft-fail situation in a 1G1R tool for disc-based sets that I recently created, so bonus!

HVR88 commented 4 years ago

One question about the new hard-filtering... Do you still include inferred languages by region?

Example: USA, Canada, Australia, UK = Always have English even if not listed as a Language. France = French, etc...

I do believe that the Europe release with Ja language (other issue report) is an error in the DAT. There's no way a European release would/could ever have Japanese as the only language.

I keep a table in my own tool for the default language for every region and of course, same as you mentioned before, Europe with empty language always = English.

andrebrait commented 4 years ago

Yes, I still include the inferred one, but only if explicit information is not available in the ROM.
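As an illustration of "explicit first, inferred only as a fallback", here is a small Python sketch. The regular expression, the abbreviated `DEFAULT_LANGUAGE` table, and the function name are all assumptions made for this example; the script's real parsing rules are not shown in this thread.

```python
import re

# Hypothetical, abbreviated defaults; not the script's real table.
DEFAULT_LANGUAGE = {"USA": "en", "Europe": "en", "Japan": "ja", "France": "fr"}

# A parenthesised group made only of two-letter codes, e.g. "(Ja)" or "(En,Fr,De)".
LANGUAGE_TAG = re.compile(r"\(([A-Z][a-z](?:,[A-Z][a-z])*)\)")


def languages_for(name, regions):
    """Use the explicit language tag if present, else infer from the regions."""
    match = LANGUAGE_TAG.search(name)
    if match:
        return [code.lower() for code in match.group(1).split(",")]
    return sorted({DEFAULT_LANGUAGE[r] for r in regions if r in DEFAULT_LANGUAGE})
```

Under these assumptions, "Puyo Puyo Tsuu (Europe) (Ja) (Rev A) (Virtual Console)" resolves to Japanese only, because the explicit (Ja) tag suppresses the Europe default.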

And that ROM is a Virtual Console one. It might very well be that it's available in Europe but without translation :-/

andrebrait commented 4 years ago

I personally exclude VC ROMs, usually

HVR88 commented 4 years ago

I didn't even know about the VC releases until I started noticing them when testing yesterday. I agree, I'd prefer to leave them out. Can that be specified with this tool? I have to read the help more carefully. :)

HVR88 commented 4 years ago

Just for sanity's sake, this is what I have right now for Redump languages and regions - this is used only when there is no language specified at Redump (not very many, but enough that I needed this). I know it's missing Switzerland, Belgium and some others - I'm adding them where appropriate (Switzerland to German and French, Belgium to French, Austria to German)

            'English'   => array("USA","Canada","Australia","UK","Ireland","Europe"),
            'Japanese'  => array("Japan"),
            'French'    => array("France"),
            'German'    => array("Germany"),
            'Spanish'   => array("Spain","Argentina","Latin America"),
            'Italian'   => array("Italy"),
            'Dutch'     => array("Netherlands"),
            'Portuguese'=> array("Portugal","Brazil"),
            'Swedish'   => array("Sweden"),
            'Norwegian' => array("Norway"),
            'Danish'    => array("Denmark"),
            'Finnish'   => array("Finland"),
            'Chinese'   => array("China"),
            'Korean'    => array("Korea"),
            'Polish'    => array("Poland"),
            'Russian'   => array("Russia"),
            'Greek'     => array("Greece"),
            'Czech'     => array("Czech"),
            'Hungarian' => array("Hungary"),
            'Turkish'   => array("Turkey"),
            'Arabic'    => array("Arabic")
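
A table keyed by language is convenient to write, but a lookup by region needs the inverse. A short Python sketch of that inversion follows, using an abbreviated copy of the table plus the multi-language regions mentioned above (Switzerland, Belgium, Austria). The names are hypothetical, and this is not code from either tool.

```python
# Language -> regions, abbreviated from the table above, with the additions
# described in the comment (Switzerland, Belgium, Austria).
LANGUAGE_REGIONS = {
    "English": ["USA", "Canada", "Australia", "UK", "Ireland", "Europe"],
    "Japanese": ["Japan"],
    "French": ["France", "Switzerland", "Belgium"],
    "German": ["Germany", "Switzerland", "Austria"],
}


def invert(table):
    """Build region -> [languages], keeping the table's language order."""
    by_region = {}
    for language, regions in table.items():
        for region in regions:
            by_region.setdefault(region, []).append(language)
    return by_region


REGION_LANGUAGES = invert(LANGUAGE_REGIONS)
```

A region that appears under several languages, such as Switzerland, ends up with a list of defaults rather than a single one.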

HVR88 commented 4 years ago

1.9.1 seems to produce good results and is catching files previously missed, plus selecting some different files for ones that were previously included (need to verify these more closely).

I noted one file missed from NES which I can't explain:

<game name="Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)" cloneof="Akumajou Special - Boku Dracula-kun (Japan)">
    <description>Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)</description>
    <rom name="Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).nes" size="262144" crc="64afd592" md5="5631eae8bc0845bb3b6876ad95832b4f" sha1="618ac6835b96bb5ebfc57dd3b828fafbb0e0fc7d" status="verified"/>
    <rom name="Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).sav" size="8192" crc="aba5001b" md5="dce27edb434d238613f928daa5c50632" sha1="1004aec3f6ff9d2135c0a5150e37aa8250f4f22b" status="verified"/>
</game>

Parameters: python3 generate.py -r CAN,USA,EUR -l en --all-regions-with-lang --only-selected-lang --prefer-parents --no-bios --no-program --no-enhancement-chip --no-beta --no-demo --no-sample --no-promo -d "DAT/NES.dat" -i ../Downloads/Nintendo/NES-ROMS -o "../Clean Platforms/Nintendo NES"
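For anyone who wants to inspect such an entry programmatically, the DAT snippet above parses with Python's standard `xml.etree.ElementTree`. This is only a sketch for looking at one entry; it says nothing about how the script itself reads DAT files.

```python
import xml.etree.ElementTree as ET

# The <game> entry exactly as quoted above.
ENTRY = """<game name="Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)" cloneof="Akumajou Special - Boku Dracula-kun (Japan)">
    <description>Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)</description>
    <rom name="Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).nes" size="262144" crc="64afd592" md5="5631eae8bc0845bb3b6876ad95832b4f" sha1="618ac6835b96bb5ebfc57dd3b828fafbb0e0fc7d" status="verified"/>
    <rom name="Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).sav" size="8192" crc="aba5001b" md5="dce27edb434d238613f928daa5c50632" sha1="1004aec3f6ff9d2135c0a5150e37aa8250f4f22b" status="verified"/>
</game>"""

game = ET.fromstring(ENTRY)
parent = game.get("cloneof")  # the parent title this entry is a clone of
roms = [(rom.get("name"), rom.get("sha1")) for rom in game.findall("rom")]

for name, sha1 in roms:
    print(sha1, name)
```

Note that the entry lists two `<rom>` elements, a .nes and a .sav, each with its own hash.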

andrebrait commented 4 years ago

> I didn't even know about the VC releases until I started noticing them when testing yesterday. I agree, I'd prefer to leave them out. Can that be specified with this tool? I have to read the help more carefully. :)

You can use --exclude "Virtual Console" ;-)
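Conceptually, a name-based exclusion is just a filter over the candidate names. A tiny Python sketch, assuming plain case-sensitive substring matching (whether `--exclude` matches substrings, whole words, or patterns is not stated in this thread, and `apply_excludes` is a hypothetical name):

```python
def apply_excludes(names, excludes):
    """Drop every name that contains any of the excluded substrings."""
    return [n for n in names if not any(e in n for e in excludes)]


names = [
    "Puyo Puyo Tsuu (Europe) (Ja) (Rev A) (Virtual Console)",
    "Sonic The Hedgehog (USA, Europe)",
]
kept = apply_excludes(names, ["Virtual Console", "GameCube"])
```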

I usually also exclude ROMs with GameCube in the name, because those are editions that came on some GC discs.

IAmTheDewd is working on his own sets and he will provide some well tailored commands for generating pretty sane 1G1Rs for most popular platforms when he's done

andrebrait commented 4 years ago

> Just for sanity's sake, this is what I have right now for Redump languages and regions - this is used only when there is no language specified at Redump (not very many, but enough that I needed this). I know it's missing Switzerland, Belgium and some others - I'm adding them where appropriate (Switzerland to German and French, Belgium to French, Austria to German)
>
>     'English'   => array("USA","Canada","Australia","UK","Ireland","Europe"),
>     'Japanese'  => array("Japan"),
>     'French'    => array("France"),
>     'German'    => array("Germany"),
>     'Spanish'   => array("Spain","Argentina","Latin America"),
>     'Italian'   => array("Italy"),
>     'Dutch'     => array("Netherlands"),
>     'Portuguese'=> array("Portugal","Brazil"),
>     'Swedish'   => array("Sweden"),
>     'Norwegian' => array("Norway"),
>     'Danish'    => array("Denmark"),
>     'Finnish'   => array("Finland"),
>     'Chinese'   => array("China"),
>     'Korean'    => array("Korea"),
>     'Polish'    => array("Poland"),
>     'Russian'   => array("Russia"),
>     'Greek'     => array("Greece"),
>     'Czech'     => array("Czech"),
>     'Hungarian' => array("Hungary"),
>     'Turkish'   => array("Turkey"),
>     'Arabic'    => array("Arabic")

I would have to check what the No-Intro convention is for these others. They don't use plain country codes, IIRC, so it's hard to know how a game from a region like this will appear in the DAT.

andrebrait commented 4 years ago

> 1.9.1 seems to produce good results and is catching files previously missed, plus selecting some different files for ones that were previously included (need to verify these more closely).
>
> I noted one file missed from NES which I can't explain:
>
> Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)
>
> Parameters: python3 generate.py -r CAN,USA,EUR -l en --all-regions-with-lang --only-selected-lang --prefer-parents --no-bios --no-program --no-enhancement-chip --no-beta --no-demo --no-sample --no-promo -d "DAT/NES.dat" -i ../Downloads/Nintendo/NES-ROMS -o "../Clean Platforms/Nintendo NES"

Is the file the .sav file? It's very weird that such a thing is in a DAT, tbh.

HVR88 commented 4 years ago

I have other tables to translate English names to country and language codes for use in filenames :) The English names are for parsing web/database content from Redump.

That Kid Dracula key I posted was copied directly out of a fresh dat from dat-o-matic. The files scanned are all the zipped no-intro.

andrebrait commented 4 years ago

But is it really missing altogether or did it just give you a warning message?

HVR88 commented 4 years ago

It's missing from the destination. In other words, it didn't copy the file. I don't recall seeing any error, but I might have just missed it.

I can probably run it again to see what happens in the output as I've saved all the command line arguments for each platform to make sure I can re-use them in the future.

andrebrait commented 4 years ago

Could you post your generate.log file after running the command?

HVR88 commented 4 years ago

Here's the log:

WARNING [Dreamworld Pogie (Unknown) (Proto 1) (1993-10-19) (Unl)]: no recognizable regions found
WARNING [Dreamworld Pogie (Unknown) (Proto 2) (2016-xx-xx) (Unl)]: no recognizable regions found
WARNING: ROM file [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).nes] for candidate [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)] not found
WARNING: ROM file [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).sav] for candidate [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)] not found
WARNING: candidate [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)] not found, trying next one
WARNING: ROM file [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).nes] for candidate [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)] not found
WARNING: ROM file [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection).sav] for candidate [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)] not found
WARNING: candidate [Kid Dracula (USA, Europe) (Castlevania Anniversary Collection)] not found, trying next one
WARNING: no eligible candidates for [Akumajou Special - Boku Dracula-kun (Japan)] have been found!
WARNING: ROM file [Castlevania (USA) (Castlevania Anniversary Collection).nes] for candidate [Castlevania (USA) (Castlevania Anniversary Collection)] not found
WARNING: candidate [Castlevania (USA) (Castlevania Anniversary Collection)] not found, trying next one
WARNING: no eligible candidates for [Castlevania (USA) (Castlevania Anniversary Collection)] have been found!
WARNING: ROM file [Castlevania II - Simon's Quest (USA) (Castlevania Anniversary Collection).nes] for candidate [Castlevania II - Simon's Quest (USA) (Castlevania Anniversary Collection)] not found
WARNING: candidate [Castlevania II - Simon's Quest (USA) (Castlevania Anniversary Collection)] not found, trying next one
WARNING: no eligible candidates for [Castlevania II - Simon's Quest (USA) (Castlevania Anniversary Collection)] have been found!
WARNING: ROM file [Castlevania III - Dracula's Curse (USA) (Castlevania Anniversary Collection).nes] for candidate [Castlevania III - Dracula's Curse (USA) (Castlevania Anniversary Collection)] not found
WARNING: candidate [Castlevania III - Dracula's Curse (USA) (Castlevania Anniversary Collection)] not found, trying next one
WARNING: no eligible candidates for [Castlevania III - Dracula's Curse (USA) (Castlevania Anniversary Collection)] have been found!

andrebrait commented 4 years ago

Well, apparently you don't have the ROM with the correct checksum for those games.

Have you checked that you have them in your input folder?

HVR88 commented 4 years ago

Every ROM except Kid Dracula ends up in the output folder. There are multiple of every Castlevania (others not listed in the error output), but only a single Kid Dracula, so it's possible that one of them matches and there are no matches for Kid Dracula.

If that's the case, then I would suggest an option to allow fall-back to filename or title when no hash match exists. That way at least one ROM for every title should still make it to output.

Files are from the current NES Dewd No-Intro set, which was updated on March 29, 2020.

andrebrait commented 4 years ago

Falling back to filename is not a good idea. If the ROM doesn't match, then it's not the ROM in the DAT at all.

It's more likely that there's a bug in my code. I'm going to get the ROM and run it through my code to see what's happening.

andrebrait commented 4 years ago

I meant if the hash from the file doesn't match the hash in the DAT, then even if the file has a given name, the file is not that ROM.
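To make that point concrete, here is a minimal sketch of hash-based identification: the lookup goes from the file's contents to a DAT entry, and the file's own name never enters into it. The `identify` function and the one-entry index are hypothetical; the hash shown is simply the SHA-1 of the three bytes b"abc".

```python
import hashlib


def identify(data, dat_index):
    """Return the DAT ROM name whose SHA-1 matches `data`, or None."""
    return dat_index.get(hashlib.sha1(data).hexdigest())


# Hypothetical one-entry index mapping SHA-1 -> ROM name from a DAT.
dat_index = {"a9993e364706816aba3e25717850c26c9cd0d89d": "Example (USA).nes"}

print(identify(b"abc", dat_index))  # contents match the DAT entry
print(identify(b"abd", dat_index))  # one byte differs, so no entry matches
```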

HVR88 commented 4 years ago

Seeing as there are egregious filename mistakes in the DAT files that completely violate No-Intro naming conventions, I wouldn't be surprised to see hash mistakes from time to time either.

Having it as an option works for people (like me) who know the game is probably the correct item, but don't really care about hash matching. For platforms like NES, hashes are useful for doing cleanup like using your tool, but when it comes to emulation time, they're sort of irrelevant.

I'd rather not be missing a title, regardless of whether it's a hash mismatch with what's in the DAT. An option ensures that I can do that, while someone else who doesn't know what they're doing is still restricted from moving unknown files.

But by all means, make sure that if there's a bug in your code it can get squashed - it's why I noted this discrepancy in the first place. :)

mspykerez commented 4 years ago

Option to run both hash check and then filename, in this order, would be ideal for me, plus a 3rd option for the issue mentioned here: https://github.com/andrebrait/1g1r-romset-generator/issues/10 How's it going?

HVR88 commented 4 years ago

> Option to run both hash check and then filename, in this order, would be ideal for me, plus a 3rd option for the issue mentioned here: #10 How's it going?

I'm working on a tool to make robust clone lists for platforms that don't have them at this time accounting for all the pitfalls mentioned in the other issue thread. I can't guarantee it will be able to make a list for every platform (like Amiga for example), but it will for most (PSX, PS2, Wii, etc.) I will figure out a way to make some type of output compatible with Andre's 1G1R tool here.

andrebrait commented 4 years ago

@HVR88 I see your point. You're right. I'm gonna make it also check filenames if a hash match isn't found. Unfortunately, there's only so much one can do when the source of truth is wrong. Parsing languages, regions, assuming default languages per region if no explicit language data is there, etc. are some of these, but for possibly faulty hashes, we really can't do much.

I'm also going to add the scanned files and calculated hashes to the --debug output
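A rough sketch of what "hash first, filename as a fallback" could look like, purely as an illustration of the idea being discussed and not the script's implementation. `match_rom` and both index dictionaries are hypothetical names.

```python
def match_rom(rom_name, rom_sha1, files_by_sha1, files_by_name):
    """Return (path, how) for a DAT ROM, preferring a hash match."""
    if rom_sha1 in files_by_sha1:
        return files_by_sha1[rom_sha1], "hash"
    if rom_name in files_by_name:
        # Fallback: the name matches but the contents are unverified.
        return files_by_name[rom_name], "name"
    return None, "missing"
```

Returning how the match was made lets the caller log name-only matches separately, so a user can tell verified ROMs from ones accepted on filename alone.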

andrebrait commented 4 years ago

According to some investigation I did yesterday, the scanning is actually working just fine, and it actually finds the files correctly. The issue is somewhere else, because somehow the candidate still doesn't make it to the right list.

andrebrait commented 4 years ago

Could you please download the latest version from master and see if it works for you?

HVR88 commented 4 years ago

Is "1.9.3 snapshot" the correct version?

Output still missing that Kid Dracula file.

Log file the same as last time.

andrebrait commented 4 years ago

Could you add the --debug option to your command and post or attach the resulting generate.log here?

andrebrait commented 4 years ago

Please use version 1.9.3 or 1.9.4-SNAPSHOT, btw, because I added some things to help with that

HVR88 commented 4 years ago

OK, I'll update again from master and get that info to you later today. I'm in the middle of an Ubuntu VM install right now to try and set up my own development environment to tweak a libretro core (because I don't think anyone else will do it).

andrebrait commented 4 years ago

Sounds like fun :-)

Which core, btw?

HVR88 commented 4 years ago

vice-libretro.

I haven't really done anything with C in probably over 18 years and don't do much programming in general - maybe a few hundred lines of PHP scripting per year, and every few years a burst of a couple of thousand like this year. I took a look at the source yesterday and think the changes I want are pretty straightforward; one of them includes fixing some terminology that is currently wrong and misleading. But damn, trying to find out what dev environment is recommended/preferable and how to set THAT up is sort of a nightmare.

mspykerez commented 4 years ago

Funny thing I was trying to get vice_xplus4_libretro to work with IAmTheDewd set but apparently I need a config file for it in /system/vice named 'vicerc', do you mind sharing yours? I'm quite a newbie regarding doing such things myself.

HVR88 commented 4 years ago

I made my own vicerc to do some Vice-x64sc overrides and there's not much to it. Just a plaintext file with attribute=value pairs. vicerc should be totally optional, though; I haven't tried any Plus4 stuff myself. For VIC-20 and C64 all the required attributes/parameters can be set from the core options - enough to run the No-Intro VIC-20 stuff and C64 stuff I have in d64 disk image format.

We should probably chat about this elsewhere or your issue tracking is going to get a little confusing here. :)

Anyway, my vicerc for a couple of tests was simply:

VICIIBorderMode=3
VICIIHwScale=1
KeepAspectRatio=0
TrueAspectRatio=0

And that might be the full contents of the file. Then in the core options make sure to enable reading of the vicerc file.

HVR88 commented 4 years ago

Man, what a nightmare installing this stuff. I'll be taking a break now and checking out your latest changes.

Running out of options and Google searches. After having to resize my vdisk, which took much longer than I expected, I can compile RetroArch and the VICE core for Linux, but cross-compiling for Windows isn't working and I've been stuck on that for over an hour.

HVR88 commented 4 years ago

Latest log file with additional debug: http://salumba.com/generate.log

andrebrait commented 4 years ago

Thanks for that. It's a bug. It detects your file just fine, and selects it just fine, but for some reason it can't copy it.

So, let me work out the why and I'll submit a fix.

andrebrait commented 4 years ago

Actually, my mistake. The hashes are indeed different. They changed them recently on the DAT files.

The hash of the ROM we have (and which was there in the older DATs) is 69ccadce379457f5d08fa4f6f81c9dc2b167e215, whereas the one in the new DATs is 618ac6835b96bb5ebfc57dd3b828fafbb0e0fc7d. But the weirdest part is that now there's that .sav file that just can't be right.

It's all fine with the script, but indeed, I'll add match by filename as a fallback.

andrebrait commented 4 years ago

It's actually right. The new ROM actually does need the sav file.

See the discussion here.

So, the old version is actually a bad ROM. Still, I'm gonna add the feature, but we should have the sets updated in the future when IAmTheDewd releases a new set.

HVR88 commented 4 years ago

I should have actually tried that Kid Dracula first, shouldn't I? :) Verified it doesn't work after you posted yesterday.

Great to hear it'll deal with falling back to files. I have to check if I'm done filtering all my existing cart platforms, but I know I'll have some non-hash clone lists to make into DATs pretty soon for disc systems that I'd like to rip through with this tool.

By the way, did you get the Plus4 core running content? I'm going to need to test that one out myself maybe next week. I spent much of today making a lot of additions to the Vice-libretro UI code to allow manipulating/removing the borders on the C64, plus custom cropping - when it's done in the next couple of days, I'll have to extend the support to Vic-20 and Plus4 in the same codebase.

mspykerez commented 4 years ago

Is the filename fallback already implemented in the newest version, or do I need to run the script twice?

andrebrait commented 4 years ago

Unfortunately not. I'm working on a complete rewrite of it that's going to be much more capable and easy to maintain/add functionality. It will be there, but it's not currently done.

mspykerez commented 4 years ago

Any ETA ? :)