Gemba / skyscraper

Powerful and versatile game data scraper written in Qt and C++.
https://gemba.github.io/skyscraper/
GNU General Public License v3.0

`aliasMap.csv` is Not Respected for ScreenScraper Scraping Module #10

Closed retrobit closed 5 months ago

retrobit commented 8 months ago

Describe the bug

With an alias entry in aliasMap.csv (located at /opt/retropie/configs/all/skyscraper/aliasMap.csv), the ROM entry is not found when scraping, even though the alias name is confirmed to work for another successfully scraped ROM.

To Reproduce

  1. Have a unique ROM that can't be found (its CRC checksum does not match any entry in the appropriate scraper module(s), and it cannot be found by a similar name via fuzzy matching): Mother 25th Restoration Hack.zip (located at: "/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack/Mother 25th Restoration Hack.zip")

  2. Have an alias defined in aliasMap.csv:

    #
    # Format (without the quotes):
    # "Rom Filename (Europe);Use This Name Instead"
    #
    # Add your lines below this comment:
    Mother 25th Restoration Hack;Mother (Japan)
  3. Run Skyscraper in scrape mode for the appropriate system/platform and scraping module, using any one of the following:

    skyscraper -p nes -s screenscraper
    skyscraper -p nes -s screenscraper "/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack/Mother 25th Restoration Hack.zip"
    skyscraper -p nes -s screenscraper -i "/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack"

Expected behavior

The ROM entry is found, scraped, and stored in the cache, ready for game list creation, just like the ROM whose name the alias points to, e.g. Mother (Japan).

Special circumstances

Custom config.ini (located at /opt/retropie/configs/all/skyscraper/config.ini):

[main]
inputFolder="/home/pi/RetroPie/ROMs"
gameListFolder="/home/pi/.emulationstation/gamelists/"
forceFilename="true"
brackets="no"
videos="true"
unattend="true"
skipped="true"
regionPrios="us,wor,eu,jp,ss"
langPrios="us,jp"
...
[nes]
inputFolder="/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System"
mediaFolder="/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System"

Terminal output

pi@retropie:~ $ skyscraper -p nes -s screenscraper "/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack/Mother 25th Restoration Hack.zip"
------------------------------------------
Running Skyscraper v3.9.2 by Lars Muldjord
------------------------------------------
Platform:           'nes'
Scraping module:    'screenscraper'
Input folder:       '/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System'
Game list folder:   '/home/pi/.emulationstation/gamelists/nes'
Covers folder:      '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/covers'
Screenshots folder: '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/screenshots'
Wheels folder:      '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/wheels'
Marquees folder:    '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/marquees'
Textures folder:    '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/textures'
Videos folder:      '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/videos'
Cache folder:       'cache/nes'

DID YOU KNOW: If you've manually compressed your roms (zip or 7z), you can use the '--flags unpack' flag to tell Skyscraper to checksum the roms inside the compressed file instead of the compressed file itself. This is only relevant when scraping with the 'screenscraper' scraping module.

Fetching limits for user '<REMOVED>', just a sec...
Setting threads to 1 as allowed for the supplied user credentials.

Reading and parsing quick id xml, please wait... Done!
Reading and parsing resource cache, please wait... Done!
Successfully parsed 60759 resources!

Looking for optional 'priorities.xml' file in cache folder... Found!
Priorities loaded successfully!

Trying to parse and load existing game list metadata... Success!

Starting scraping run on 1 files using 1 threads.
Sit back, relax and let me do the work! :)

#1/1 (T1) Pass 1 ---- Game 'Mother 25th Restoration Hack' not found :( ----

#1/1, (0/1)
Elapsed time   : 00:00:07
Est. time left : 00:00:00

---- Resource gathering run completed! YAY! ----
Writing quick id xml, please wait... Done!
Writing 60759 (0 new) resources to cache, please wait... Done!

---- And here are some neat stats :) ----
Total completion time: 00:00:12

Total number of games: 1
Successfully processed games: 0
Skipped games: 1 (Filenames saved to '/home/<USER>/.skyscraper/skipped-nes-screenscraper.txt')

pi@retropie:~ $ skyscraper -p nes -s screenscraper -i "/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack"
------------------------------------------
Running Skyscraper v3.9.2 by Lars Muldjord
------------------------------------------
Platform:           'nes'
Scraping module:    'screenscraper'
Input folder:       '/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack'
Game list folder:   '/home/pi/.emulationstation/gamelists/nes'
Covers folder:      '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/covers'
Screenshots folder: '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/screenshots'
Wheels folder:      '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/wheels'
Marquees folder:    '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/marquees'
Textures folder:    '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/textures'
Videos folder:      '/opt/retropie/configs/all/skyscraper/media/Nintendo - Nintendo Entertainment System/videos'
Cache folder:       'cache/nes'

DID YOU KNOW: You can force a refresh of the locally cached data using the '--refresh' option. Skyscraper will then refetch the requested entries from the scraping sources, instead of loading it from cache. Sort of like Ctrl+F5 in a browser.

Fetching limits for user '<REMOVED>', just a sec...
Setting threads to 1 as allowed for the supplied user credentials.

Reading and parsing quick id xml, please wait... Done!
Reading and parsing resource cache, please wait... Done!
Successfully parsed 60759 resources!

Looking for optional 'priorities.xml' file in cache folder... Found!
Priorities loaded successfully!

Trying to parse and load existing game list metadata... Success!

Starting scraping run on 1 files using 1 threads.
Sit back, relax and let me do the work! :)

#1/1 (T1) Pass 1 ---- Game 'Mother 25th Restoration Hack' not found :( ----

#1/1, (0/1)
Elapsed time   : 00:00:08
Est. time left : 00:00:00

---- Resource gathering run completed! YAY! ----
Writing quick id xml, please wait... Done!
Writing 60759 (0 new) resources to cache, please wait... Done!

---- And here are some neat stats :) ----
Total completion time: 00:00:13

Total number of games: 1
Successfully processed games: 0
Skipped games: 1 (Filenames saved to '/home/<USER>/.skyscraper/skipped-nes-screenscraper.txt')

Technical information

retrobit commented 8 months ago

FYI: I'm currently looking into the code to see what is going on, and will be attempting a fix. Brushing off my C++ skills and downloading Qt now 👨‍💻

Gemba commented 8 months ago

Thanks for the precise report.

However, I cannot find anything in the sources that supports your claim. This is the method where the alias gets evaluated. Be aware that in the process anything in brackets/parentheses is cut off.

To isolate the issue: Please stash the aliasMap.csv (or comment out the line you added with a leading #), then run skyscraper -p nes -s screenscraper --query="romnom=Mother" "/home/pi/RetroPie/ROMs/Nintendo - Nintendo Entertainment System/Hacks/Mother 25th Restoration Hack/Mother 25th Restoration Hack.zip"

Note the --query="romnom=...".

If this does not return a hit, then there is no entry for that homebrew/hack game.

If you want to vet if aliasmap.csv is ignored, try this:

Put a new sample ROM, one you know Screenscraper has data for, in a ROM folder. Rename that ROM to something obscure, use that obscure name as the first part of the aliasMap.csv entry (before the ;), and put a title that is known to Screenscraper as the second part. You may also use Mobygames for this test drive (but then it is only --query="Mother"). If this approach yields no match, we may have an issue with Skyscraper; otherwise the root cause is missing data.

NB: I use the import scraper to maintain scraping data for very special, less popular ROMs.

retrobit commented 8 months ago

Got some time to look into this a little more. A few things:

Some interesting behavior with additional manual testing:

So I'm wondering if the behavior of Skyscraper needs both the name and CRC to match, in order to scrape.

Could you confirm this? Is this behavior only applicable to screenscraper scraper source module? Also, is -m <0-100> flag still respected in this case? I was under the impression that the behavior was: match on CRC, and if no match, match on name with default % or % defined with -m flag.

Finally, the "Result title" here is "Earthbound", yet it matches when querying with "Mother". Does ScreenScraper consider regional variants for name matching? The output in question:

#1/1 (T1) Pass 1 ---- Game 'Obscure Name' found! :) ----
Scraper:        screenscraper
From cache:     NO
Search match:   100 %
Compare title:  'Obscure Name'
Result title:   'Earthbound' ()
Platform:       'NES' ()

I would like to avoid using the import scraper, as it's a completely manual process for entries that already exist in ScreenScraper: it would make sense for the user to specify the entry in the ScreenScraper database that they would like any file scraped as.

retrobit commented 8 months ago

Looking at the code, I can see that AbstractScraper::runPasses() (from abstractscraper.cpp) calls ScreenScraper::getSearchNames() (from screenscraper.cpp, which overrides abstractscraper.cpp).

I can see that if no --query is provided, Skyscraper searches ScreenScraper by these values: CRC, MD5, SHA1, ROM name, and ROM size. If --query is provided, it searches by that only, which explains some of the above behavior.

So for my use case, it makes sense to use the optional --query parameter to essentially force the ScreenScraper search to return the game/ROM I have. Again, this is not ideal, as I can only do it one ROM at a time, and automating it through a script is clumsy.

I will look further into:

retrobit commented 8 months ago

ScreenScraper's website UX is abhorrent, but looking at the API v2 docs, and comparing Skyscraper's jeuInfos endpoint request from ScreenScraper::getSearchResults() (from screenscraper.cpp) I see what's going on.

Skyscraper supplies the platform and all of the values (CRC, MD5, SHA1, ROM name, ROM size) to check for a thorough match. This explains existing behavior as a unique ROM hack not in the database would not match any of these criteria. If one or more of the values was forced to something that matches via --query, only those are used instead.
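The request behavior described above could be sketched roughly like this. It is a std-only illustration, not the code in screenscraper.cpp: `RomInfo` and `buildSearchParams` are hypothetical names, and apart from `romnom` (used earlier in this thread) the parameter names `crc`, `md5`, `sha1` and `romtaille` are assumptions about the jeuInfos endpoint. URL escaping and the platform parameter are left out for brevity.

```cpp
#include <string>

// Hypothetical container for the values Skyscraper knows about a ROM file.
struct RomInfo {
    std::string crc;
    std::string md5;
    std::string sha1;
    std::string name;  // ROM file name
    long long size;    // ROM size in bytes
};

// Build the search part of a request.
// With a user-supplied --query, only that query is used;
// otherwise all identifying values are sent together.
std::string buildSearchParams(const RomInfo &rom, const std::string &userQuery) {
    if (!userQuery.empty()) {
        return userQuery;  // e.g. "romnom=Mother"
    }
    // Assumed parameter names; values are not URL-escaped in this sketch.
    return "crc=" + rom.crc + "&md5=" + rom.md5 + "&sha1=" + rom.sha1 +
           "&romnom=" + rom.name + "&romtaille=" + std::to_string(rom.size);
}
```

With an empty query, a unique ROM hack sends identifiers that exist nowhere in the database, which is consistent with the "not found" result; with `--query="romnom=Mother"` only the name is sent.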

Which, finally, leaves only the aliasMap.csv behavior to be explained. Searching the code for aliasMap.csv, we can find where it is read and assigned to config.aliasMap. This value is only used for scraper modules that return multiple potential GameEntry objects. ScreenScraper is not one of them; it is a "direct match" source. For the other modules, ScraperWorker::getBestEntry() along with ScraperWorker::getSearchMatch() then pick a candidate based on some heuristics; when not running interactively, the minimum match defaults to 65% or to whatever was set with the -m <0-100> flag.
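For readers who have not looked at the code, the alias lookup itself amounts to something like the following. This is a minimal std-only sketch under assumptions, not Skyscraper's Qt implementation: `parseAliasMap` and `searchNameFor` are hypothetical names, and the file format is taken from the comment header of aliasMap.csv quoted in the report.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse aliasMap.csv-style text: one "ROM base name;replacement name" pair
// per line. Lines starting with '#' and lines without a usable ';' are skipped.
std::map<std::string, std::string> parseAliasMap(const std::string &text) {
    std::map<std::string, std::string> aliases;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        if (line.empty() || line[0] == '#') {
            continue;
        }
        const std::string::size_type sep = line.find(';');
        if (sep == std::string::npos || sep == 0) {
            continue;
        }
        aliases[line.substr(0, sep)] = line.substr(sep + 1);
    }
    return aliases;
}

// Return the name to search with: the alias if one exists, else the base name.
std::string searchNameFor(const std::map<std::string, std::string> &aliases,
                          const std::string &romBaseName) {
    const std::map<std::string, std::string>::const_iterator it = aliases.find(romBaseName);
    return it != aliases.end() ? it->second : romBaseName;
}
```

The replaced name only helps a module that searches by name and then ranks several candidates, which is why it has no visible effect on a "direct match" source.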

aliasMap.csv is not for my use case!

I think we can make the documentation clearer, not only on the general scraping behavior and on scraper-module specifics, but at the very least on what aliasMap.csv accomplishes and which workflows it is used in.

Using aliasMap.csv differently for scraper source modules that don't return multiple GameEntry objects, or alternatively, creating another xxxMap.csv that is used to specify --query="xxx=..." would be a great way to support this feature. If there is a game that isn't matching for some reason, but we know/have an idea how to query the right match from the database using the API, we could just fill in these rows in the .csv file and Skyscraper would take care of it. This is especially helpful if one wants to use only one scraper source module that doesn't support the existing aliasMap.csv behavior.

retrobit commented 8 months ago

Also, about regions: for ScreenScraper, if a region is found in the filename, this overrides any region priorities set in the configuration. If no region is found in the filename, it will look in order based on the region priorities set, or on the default region priorities if none are set.

For anyone who was wondering/confused, like me: entries in the ScreenScraper database are tied to a unique platform and game ID. This game ID is tied to different game data, like boxart, title, description, etc., by region and/or language. This means that even if the CRC matches a region-specific ROM, it will match a general game ID. Skyscraper then infers the region from the ROM name using the "()" pattern, and if none is found, uses the region priorities to determine which data to retrieve. Language priorities determine the preferred language for data like title and description. Contributors and proposals on ScreenScraper determine whether a game is considered "Mother" or "Earthbound" in English. Japanese titles unreleased in English-speaking regions will most likely not have a translated name, but that is up to contributors/proposals/mods.
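The region selection described above could look roughly like this. It is only a sketch: `regionCodeFor` and `regionOrder` are hypothetical names, the tiny tag-to-code table is an assumption made for illustration, and keeping the configured priorities as a fallback behind the file-name region is likewise an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, deliberately small mapping from a "(...)" tag in a file name
// to a region code as used in regionPrios. Returns "" for unknown tags.
std::string regionCodeFor(const std::string &tag) {
    if (tag == "Japan") return "jp";
    if (tag == "Europe") return "eu";
    if (tag == "USA") return "us";
    if (tag == "World") return "wor";
    return "";
}

// Build the order in which regions are tried: a region found in the file
// name comes first, followed by the configured priorities without duplicates.
std::vector<std::string> regionOrder(const std::string &romBaseName,
                                     const std::vector<std::string> &regionPrios) {
    std::vector<std::string> order;
    std::size_t open = romBaseName.find('(');
    while (open != std::string::npos) {
        const std::size_t close = romBaseName.find(')', open);
        if (close == std::string::npos) break;
        const std::string code = regionCodeFor(romBaseName.substr(open + 1, close - open - 1));
        if (!code.empty()) {
            order.push_back(code);  // first recognized tag wins
            break;
        }
        open = romBaseName.find('(', close);
    }
    for (std::size_t i = 0; i < regionPrios.size(); ++i) {
        if (order.empty() || regionPrios[i] != order.front()) {
            order.push_back(regionPrios[i]);
        }
    }
    return order;
}
```

With the regionPrios from the config above, a file named "Mother (Japan)" would try jp first, while "Mother 25th Restoration Hack" has no region tag and falls straight through to us, wor, eu, jp, ss.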

Gemba commented 8 months ago

Thanks for digging up the information. That is a use case that was not anticipated in the past. However, maybe we can implement it in an elegant way.

Maybe I am missing the point, but would it help to allow, especially for Screenscraper (either in the current aliasMap.csv or in a new file), entries of the form <romfilename without ext>;gameid=<n> to explicitly guide Skyscraper to use this ROM information from Screenscraper?
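To make the proposed line format concrete, parsing such an entry might look like the sketch below. It only illustrates the `<romfilename without ext>;gameid=<n>` idea from this comment; `GameIdOverride` and `parseGameIdLine` are hypothetical names, and nothing here is taken from an actual implementation.

```cpp
#include <string>

// Hypothetical result of parsing one "<romfilename without ext>;gameid=<n>" line.
struct GameIdOverride {
    std::string romBaseName;  // ROM file name without extension
    long gameId;              // Screenscraper game ID to use for this ROM
};

// Parse one line; returns true and fills 'out' only for well-formed entries.
// Comment lines, plain aliases and non-numeric IDs are rejected.
bool parseGameIdLine(const std::string &line, GameIdOverride &out) {
    const std::string key = "gameid=";
    if (line.empty() || line[0] == '#') return false;
    const std::string::size_type sep = line.find(';');
    if (sep == std::string::npos || sep == 0) return false;
    const std::string value = line.substr(sep + 1);
    if (value.compare(0, key.size(), key) != 0) return false;
    const std::string digits = value.substr(key.size());
    if (digits.empty() || digits.size() > 9) return false;  // keep within range of long
    for (std::string::size_type i = 0; i < digits.size(); ++i) {
        if (digits[i] < '0' || digits[i] > '9') return false;
    }
    out.romBaseName = line.substr(0, sep);
    out.gameId = std::stol(digits);
    return true;
}
```

Because a plain alias such as "Mother 25th Restoration Hack;Mother (Japan)" is rejected by this parser, both kinds of entries could in principle live in the same file and be told apart by the `gameid=` prefix.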

retrobit commented 8 months ago

That's my initial thought, but I'd like to give this more thought on implementation and I will update here

retrobit commented 7 months ago

I have a code-complete solution that I need to compile and test. I will create a PR with my approach when it is ready.

Gemba commented 5 months ago

Closing this, as #45, which handles it, is merged.