RetroPie / EmulationStation

A Fork of Emulation Station for RetroPie. Emulation Station is a flexible emulator front-end supporting keyboardless navigation and custom system themes.

Scraper seems to have issues with game names containing unusual characters #69

Open peabnuts123 opened 7 years ago

peabnuts123 commented 7 years ago

Aloshi/EmulationStation#259: I came across this issue while looking up symptoms of the scraper failing to find very famous games like F-Zero X (Nintendo 64). It seems to describe the exact symptoms and cause of the problem, but it died with the repo. As far as I can tell, the issue still stands: I have trouble scraping any of the Pokémon games, or F-Zero X for N64.

Anyone have any thoughts?

HerbFargus commented 7 years ago

Use sselph's scraper.

https://github.com/retropie/retropie-setup/wiki/scraper

Built in scraper is unlikely to be fixed anytime soon if ever.

peabnuts123 commented 7 years ago

I see, this is very cool. Should the built-in ones be replaced / removed then ... ?

zigurana commented 7 years ago

No, because @sselph has indicated that he would rather maintain the scraper as a standalone program, which is a LOT less hassle to work with than having to test and compile it as an integral part of ES. Maybe if we can agree on a fixed interface/API, we could have ES make those calls, but even that will require significant effort from a non-existent dev team.

Alternatively, we could just remove the native ES scraper, as it is really flaky, but that will result in others complaining, no doubt.

HerbFargus commented 7 years ago

To be fair he has made some strides on an ES integration but whether or not it will happen anytime soon is to be determined. He's got a lot on his plate as it is.

https://github.com/RetroPie/EmulationStation/issues/50

zigurana commented 7 years ago

There is always hope! :-) :+1:

ebTalk commented 7 years ago

Has anybody been able to get sselph's scraper to work with special characters or three-letter game names? I can't get it to work with WWF Raw, NBA Jam or NHL 94.

robertybob commented 7 years ago

@ebTalk It works by looking up hashes, not names, so not sure why it's not working for you?
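A hash lookup identifies a ROM by a checksum of its file contents rather than by its filename. The sketch below computes a SHA-1 digest with Python's standard `hashlib`; which hash algorithm sselph's scraper actually uses (and whether it skips ROM headers) is not stated in this thread, so SHA-1 here is an assumption for illustration, and `rom_sha1` is a hypothetical helper.

```python
import hashlib


def rom_sha1(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-1 of a file, reading it in chunks so large ISOs don't fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two identical dumps give the same digest no matter what the files are called, which is why renaming a ROM does not affect a hash-based match, while any change to the bytes does.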

ebTalk commented 7 years ago

@robertybob what do you mean by hashes?

joolswills commented 7 years ago

@ebTalk this is the wrong place for help with SSelph's scraper - this issue is about the built in scraper in Emulation Station. You can report issues with SSelph's scraper here - https://github.com/sselph/scraper or on the RetroPie forum.

sselph commented 7 years ago

I have made some progress in updating the ES scraper to use my code. Once I get the fully automated scrapers added to ES and working, I'll try updating the current built-in one to use a hash match, then fall back to search. When I do this, I will see if I can resolve this issue by stripping out characters it doesn't like or by using GetGameList.
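The plan above (try a hash match first, then fall back to a name search with troublesome characters stripped) could look roughly like this. Everything here is hypothetical: `hash_index` stands in for whatever hash database the scraper consults, `search_by_name` for the existing name search, and the choice to replace punctuation with spaces is a guess rather than something the thread settles.

```python
from __future__ import annotations

import re
from typing import Callable, Optional


def clean_name(name: str) -> str:
    """Replace punctuation (e.g. '-', '+', '_') with spaces and collapse whitespace."""
    # \w is Unicode-aware, so letters like the 'é' in 'Pokémon' are kept.
    cleaned = re.sub(r"[^\w\s]|_", " ", name)
    return " ".join(cleaned.split())


def lookup(
    rom_hash: str,
    rom_name: str,
    hash_index: dict[str, str],
    search_by_name: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a game ID: exact hash hit if there is one, else a search on the cleaned name."""
    if rom_hash in hash_index:
        return hash_index[rom_hash]
    return search_by_name(clean_name(rom_name))
```

For example, `clean_name("F-Zero X")` gives `"F Zero X"`; whether the API then finds the game is exactly the open question in this issue.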

peabnuts123 commented 7 years ago

It's great to hear this is progressing, @sselph. I would love to see your scraper in ES, or at least the philosophy of your scraper imbued into the ES scrapers.

DavidinCT commented 7 years ago

Is there a replacement scraper that could be plugged into the Windows version?

I have been trying to scrape for over a week and keep getting endless timeouts. Scraping 200 games is taking forever; I might have about 30 done after a week.

Please say there is an option... PLEASE!

lualebu commented 7 years ago

I've been playing around with thegamesdb API and I've made some interesting observations.

The ES scraper works by passing a name and platform field to the API, like so:

http://thegamesdb.net/api/GetGame.php?name=Mario&platform=Nintendo Entertainment System (NES)
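The example URL above contains raw spaces and parentheses in the platform name; a client would normally percent-encode the query string. A minimal sketch using Python's `urllib.parse.urlencode`, assuming only the `name` and `platform` parameters shown above (`get_game_url` is a hypothetical helper, not ES code):

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://thegamesdb.net/api/GetGame.php"


def get_game_url(name: str, platform: str) -> str:
    """Build a GetGame query; quote_via=quote encodes spaces as %20 rather than '+'."""
    return BASE_URL + "?" + urlencode({"name": name, "platform": platform}, quote_via=quote)
```

With this, the platform above becomes `Nintendo%20Entertainment%20System%20%28NES%29`; the `-` in a name like `F-Zero` is left as-is because it is URL-safe, so encoding alone does not explain the failed searches.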

Unfortunately, the API is very finicky about the format of the name field. If the query contains special characters (F-Zero), or the query doesn't contain any words longer than 3 characters (Dig Dug), then the API won't return any search results. Presumably this is because of the way that ElasticSearch is configured on thegamesdb.net.
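The two observations above can be turned into a rough client-side check for names that are likely to come back empty. This is only a heuristic built from the behaviour reported in this comment, not a description of how thegamesdb.net's search is actually configured; the 3-character threshold comes straight from the comment, and `name_search_likely_fails` is a hypothetical name.

```python
import re


def name_search_likely_fails(name: str) -> bool:
    """True if the name has special characters or no word longer than 3 characters."""
    has_special = re.search(r"[^\w\s]", name) is not None
    no_long_word = all(len(word) <= 3 for word in name.split())
    return has_special or no_long_word
```

By this rule `F-Zero`, `Dig Dug`, `NBA Jam` and `NHL 94` are all flagged, while `Mario` is not.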

A simple workaround is to query the game by its ID, like so:

http://thegamesdb.net/api/GetGame.php?id=16994&platform=Nintendo Entertainment System (NES)

This works pretty well, but it would be kind of irritating to program it as an exception with the current scraper implementation. Fortunately, there might be a cheap workaround.

If you pass both the name field AND the ID field to the API, and set the ID to 0, the ID field is ignored and you get your normal search results:

http://thegamesdb.net/api/GetGame.php?name=Mario&platform=Nintendo Entertainment System (NES)&id=0

If you pass both fields to the API, and set the ID > 0, the name field is ignored and the correct game is returned:

http://thegamesdb.net/api/GetGame.php?name=Mario&platform=Nintendo Entertainment System (NES)&id=16994

Using this method, we could add an option to search by ID by making some minimal changes to the scraper logic:
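One way to express that change is to always send both `name` and `id`, using `id=0` for an ordinary name search and a positive ID when the user supplies one. The function below is a hypothetical sketch; the only API behaviour it relies on is the one described above, namely that `id=0` is ignored and that `name` is ignored when `id > 0`.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://thegamesdb.net/api/GetGame.php"


def get_game_url_with_id(name: str, platform: str, game_id: Optional[int] = None) -> str:
    """Always include id: 0 means 'search by name', a positive value pins one game."""
    params = {
        "name": name,
        "platform": platform,
        "id": game_id if game_id and game_id > 0 else 0,
    }
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```

The appeal of this design is that the request shape never changes, so the scraper needs no separate code path for ID lookups.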

MotaDan commented 7 years ago

The GetGame API has an exactname parameter. When this parameter is used, it does not care about special characters or word lengths. Riffing off of @lualebu: when the user uses the input field, the string "&exactname=" should be added to the beginning of what they enter. This forces the name field to be left blank and causes a direct search for the string. It should let the user find titles manually, using a different search method for games that didn't come up automatically. This provides a simple workaround with minimal code changes, and it only happens when the user triggers it.

Some examples:

Broken F-Zero: http://thegamesdb.net/api/GetGame.php?name=f-zero&platform=Super%20Nintendo%20(SNES)
Fixed F-Zero: http://thegamesdb.net/api/GetGame.php?name=&exactname=f-zero&platform=Super%20Nintendo%20(SNES)
Broken Dig Dug: http://thegamesdb.net/api/GetGame.php?name=dig%20dug&platform=PC
Working Dig Dug: http://thegamesdb.net/api/GetGame.php?name=&exactname=dig%20dug&platform=PC
Broken NBA JAM: http://thegamesdb.net/api/GetGame.php?name=nba%20jam&platform=Arcade
Working NBA JAM: http://thegamesdb.net/api/GetGame.php?name=&exactname=nba%20jam&platform=Arcade
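The proposal amounts to sending the user's typed text as `exactname` with an empty `name`. A minimal sketch, assuming the `name`, `exactname` and `platform` parameters exactly as they appear in the example URLs in this comment; `manual_search_url` is a hypothetical helper, not existing ES code.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://thegamesdb.net/api/GetGame.php"


def manual_search_url(user_input: str, platform: str) -> str:
    """Send manual input as an exact-name lookup, leaving the fuzzy `name` field blank."""
    params = {"name": "", "exactname": user_input, "platform": platform}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```

Called as `manual_search_url("dig dug", "PC")`, this reproduces the "Working Dig Dug" URL above.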

zefie commented 7 years ago

I think exactname would be better used with a prefix, such as "exact:f-zero" in the input box, or perhaps the other way around, "fuzz:mario" to use "name=mario". Forcing one or the other is a pain in situations where you have to know the exact name as entered in TheGamesDB.

For example, say you have crisiscore.iso in your psp folder. The exact name is "Crisis Core: Final Fantasy VII". We could use either "exact:Crisis Core: Final Fantasy VII" or "fuzz:Crisis Core" to search for it, depending on which way we go, but not having the option to switch between fuzz (name=) and exact (exactname=) is a bit of a pain :)

Edit: The reason I propose "exact:" as a prefix is that the original used fuzzy search, as you know, and users are used to this. The newly committed "id:", or a future "exact:", could be used in the rare instances mentioned in this issue.
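The prefix scheme could be parsed along these lines: input starting with `exact:` becomes an `exactname` lookup, `id:` followed by digits becomes an ID lookup, and anything else falls through to the default fuzzy `name` search. The function name and the returned parameter dictionary are illustrative assumptions; only the `exact:`/`id:` prefixes and the `name`/`exactname`/`id` parameters come from this thread.

```python
from __future__ import annotations


def parse_search_input(text: str) -> dict[str, str]:
    """Map manual search input to GetGame query parameters (hypothetical helper)."""
    text = text.strip()
    lowered = text.lower()
    if lowered.startswith("exact:"):
        return {"name": "", "exactname": text[len("exact:"):].strip()}
    if lowered.startswith("id:"):
        game_id = text[len("id:"):].strip()
        if game_id.isdecimal():
            return {"id": game_id}
    # Default (including a malformed "id:..."): fuzzy search on the text as typed.
    return {"name": text}
```

Treating a malformed `id:` value as plain text is a design choice: the user still gets some results instead of an error.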

MotaDan commented 7 years ago

I don't think using a special prefix to activate exact is a good idea. The whole point of the input field is as a fallback for when the automatic search fails. If it by default uses the same behavior as the automatic scraper, there's no point.

Currently, the worst-case flow is: auto fails, they type in a name that isn't exactly correct and nothing comes up, they look up the name and type it in correctly, and bingo, done. If it goes back to being fuzzy, the flow is: auto fails, input also fails, and it is impossible to find the game. They tinker with the input trying to get it to work; maybe changing the file name will fix it, but no. They get the exact name from TheGamesDB and it still doesn't work. After some searching they find out about sselph's scraper and have to scrape for the info again, making everything up to that point a waste. If they're especially unlucky, the game's hash isn't in sselph's database and they have to add the info to the gamelist manually, or they find this issue and use exact or id.

Ideally it would use both search types, with the file name and with the input, but the simplest solution was to have one be fuzzy and one exact. This way the user doesn't have to go so far into the weeds to get a complete list. The Crisis Core problem is nicely fixed by using id. I don't think much is gained by adding fuzzy as a prefix option.

zefie commented 7 years ago

If it by default uses the same behavior as the automatic scraper, there's no point.

If your ROM name is smas+w.smc, it's not going to know that smas+w = "Super Mario All-Stars + World", and auto will fail, leaving the user to search for "Super Mario World". The fuzzy search would bring up the original SMW as well as All-Stars in the list. The point of the manual search was to find a game when the ROM name is too obscure. We shouldn't force users to use special filenames.

The flow currently in the worst case is auto fails, they type in a name that isn't exactly correct and nothing comes up, they look up the name and type it in correctly bingo done.

This adds an extra step for the user that could be prevented if we used the default behavior.

If it goes back to being fuzzy the flow is auto fails, input also fails and it is impossible to find your game.

Maybe we could add a message in EmulationStation alerting the user of the presence of "exact:" and "id:" if and only if the results come up empty.

find out about sselph scraper and get to scrape for the info again

I appreciate sselph's efforts, but I have had zero luck with his tool. Finding a file with the correct hash is a pain: I spent a good 2 hours finding the "dc_flash.bin" that retropie-manager wanted for the BIOS. For larger files (and self-rips) like PSP, PSX, etc., the hashes are unlikely ever to match, particularly with ISO dumpers that may add custom metadata, such as the date the image was dumped.

This way the user doesn't have to go so far into the weeds to get a complete list.

The current setup of forcing search to be exact vs fuzz leads the user to have to use a web browser on an external machine to figure out the game's exact name as entered in TheGamesDB's database, every single time. A fuzz actually would make it less effort, only sending the user "far into the weeds" if it doesn't find it.

I don't think much is gained by adding fuzzy as a prefix option.

Honestly, me neither. I'd prefer fuzz to be the default, with "exact:" and "id:" as prefixes for advanced cases such as those listed here.

joolswills commented 7 years ago

If you all think it's better to make fuzz the default again we can do that (and include exact search as a manual prefix etc). I don't use the built in scraper much.

zefie commented 7 years ago

I don't use the built in scraper much.

Unless we can figure out a way to make the gamepad work with dialog-based menus such as retropie-setup.sh, I believe the internal scraper has value.

Imagine a use-case where the user's only input method is the gamepad.

While I do have keyboard access, this is what I am aiming for. A full game-console experience.

But that is a whole other issue unrelated to this one.

Edit: I'm tired and realize that makes no sense since you need a keyboard to search in the internal input field. But that is also another issue for another time.

joolswills commented 7 years ago

Any feedback on this from anyone? At least one user has complained on the forum that the search is too strict now and doesn't find stuff. I wonder if we should put it back as it was, but include a prefix you can use for an exact search?

pjft commented 7 years ago

I don't use the built-in scraper much, but I could entertain the thought of using fuzzy search (whatever our definition of that is) by default and using double quotes as the standard search delimiters for exact search.

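The double-quote convention could be detected like this: input wrapped in straight double quotes triggers an exact-name search on the unquoted text, and anything else stays fuzzy. The helper `split_quoted` is hypothetical and only handles straight ASCII quotes around the whole input.

```python
from __future__ import annotations


def split_quoted(text: str) -> tuple[str, bool]:
    """Return (query, exact): exact is True when the whole input is wrapped in double quotes."""
    text = text.strip()
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1].strip(), True
    return text, False
```

The `exact` flag would then choose between sending the query as `exactname` or as `name`.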

MotaDan commented 7 years ago

My plan is to combine both search methods. Exact match first, then the fuzzy results follow. Alternatively I like the double quotes suggestion to trigger an exact search.
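"Exact match first, then the fuzzy results" boils down to concatenating two result lists and dropping duplicates. In this sketch each result is assumed to be a dict with an `id` key, and `exact_results`/`fuzzy_results` stand in for the responses to an `exactname` query and a `name` query; none of this is existing ES code.

```python
from __future__ import annotations


def merge_results(exact_results: list[dict], fuzzy_results: list[dict]) -> list[dict]:
    """Exact-name hits first, then fuzzy hits not already listed (deduplicated by id)."""
    seen: set = set()
    merged: list[dict] = []
    for game in exact_results + fuzzy_results:
        if game["id"] not in seen:
            seen.add(game["id"])
            merged.append(game)
    return merged
```

Keeping the first occurrence preserves the ordering that matters here: an exact hit stays at the top even if the fuzzy search also returns it further down.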

pjft commented 7 years ago

Oh, that also works. Thanks.
