Open iconmaster5326 opened 1 year ago
Another wrinkle regarding locales, should anyone try to implement this: The language a card is printed in may not be the locale the set was released in, and any given set may contain cards of multiple different languages. Case in point: https://yugipedia.com/wiki/The_Valuable_Book_5_promotional_cards.
See DawnbrandBots/yaml-yugipedia#2. There are a number of things to be resolved. For one, I'm merging all of Yugipedia's English sets into one "sets.en" array, but there's actually significant regionalization, at least historically (North America, Europe, Oceania, Worldwide), which is why Yugipedia has separate en_sets, na_sets, and so on.
Oh no, I've made a duplicate issue! I don't know how I missed you talking about this already. Yeah, I can imagine this is no easy task, considering the great historical muddling of locales, and the general non-alignment of languages to locale.
It's not exactly duplicate since there's no tracking issue in this repository, and I also haven't really started to document the use cases and caveats that need to be worked out. Typically the main goal is just first and last release date, which we do need for Bastion, but most approaches mask over different release timings by region (TCG release date just becomes US release date, OCG release date just becomes Japan release date). I was unaware that the same name could refer to different sets, or the VB5 language issue, so thanks for bringing that up.
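To make the "masking" concern concrete, here is a rough sketch of keeping first and last release dates per region and only collapsing them on demand. The input shape (pairs of region code and date) and the function names are assumptions for illustration, not anything that exists in this repository, and the sample dates are invented.

```python
from datetime import date


def release_window(releases):
    """Return {region: (first, last)} without merging regions together.

    `releases` is a hypothetical iterable of (region, release_date) pairs.
    """
    window = {}
    for region, day in releases:
        first, last = window.get(region, (day, day))
        window[region] = (min(first, day), max(last, day))
    return window


def overall_window(window):
    """Collapse the per-region windows into one (first, last) pair."""
    firsts = [first for first, _ in window.values()]
    lasts = [last for _, last in window.values()]
    return min(firsts), max(lasts)


if __name__ == "__main__":
    # Invented dates, purely to show the shape of the result.
    sample = [
        ("NA", date(2004, 3, 1)),
        ("EU", date(2004, 5, 1)),
        ("NA", date(2005, 1, 1)),
    ]
    per_region = release_window(sample)
    print(per_region)
    print(overall_window(per_region))
```

Storing the per-region windows and deriving the overall one means a "TCG release date" can still be reported without throwing away the regional differences.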
My thoughts, if they're worth anything at all: Luckily, if you're scraping the Yugipedia card pages to get printing information, you can easily match cards to sets, cleanly avoiding the above issues, by just looking at the link given in each "set" column. If the trouble with set information is just rounding up a full list of every set ever, you could even use those links to enumerate all sets that have ever had a card printed in them. (This method would not catch sets without cards in them, but... uh... yeah.) You can even avoid scraping the set page itself for things like dates; you can just review all the dates listed in the printing tables for any given link... Although if you find a set whose individual cards don't agree on date, good luck sorting that out.
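Roughly what I have in mind, as a sketch only: group the printing rows by the set page they link to, collect every date seen per region, and flag the sets whose cards disagree. The row keys ("set_page", "region", "date") are made up for this example, and the sample rows are invented.

```python
from collections import defaultdict


def sets_from_printings(printings):
    """Group printing rows by set page, collecting every date seen per region.

    Each row is a hypothetical dict with "set_page", "region" and "date" keys.
    """
    dates = defaultdict(lambda: defaultdict(set))
    for row in printings:
        dates[row["set_page"]][row["region"]].add(row["date"])
    return dates


def conflicting_dates(dates):
    """Return sorted (set_page, region) pairs whose cards disagree on the date."""
    return sorted(
        (page, region)
        for page, by_region in dates.items()
        for region, seen in by_region.items()
        if len(seen) > 1
    )


if __name__ == "__main__":
    # Invented rows, purely to show the shape of the data.
    rows = [
        {"set_page": "Example Set A", "region": "JP", "date": "2002-01-01"},
        {"set_page": "Example Set A", "region": "JP", "date": "2002-01-15"},
        {"set_page": "Example Set B", "region": "NA", "date": "2003-06-01"},
    ]
    grouped = sets_from_printings(rows)
    print(sorted(grouped))             # every set that has at least one printing
    print(conflicting_dates(grouped))  # sets that need manual review
```

The keys of the grouped result double as the list of every set that has had a card printed in it, and the conflict list is the "good luck sorting that out" pile.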
We're not supposed to be scraping the pages themselves; instead, we obtain the wikitext from the API, which means we have to reimplement some of the logic ourselves. That's why the current plan is to recurse on Category:Sets and hopefully nab everything, then go from there to try to assign dates. This is close to what the release table template does on wiki.
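A minimal sketch of that recursion, assuming the wiki exposes the standard MediaWiki `list=categorymembers` query. The walk takes the fetching function as a parameter, so nothing here touches the network; `fetch_members`, `walk_category` and `category_params` are names invented for this example, and the actual HTTP call, continuation loop and rate limiting are left out.

```python
from collections import deque


def category_params(title, cont=None):
    """Build query parameters for one MediaWiki categorymembers request."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": title,
        "cmtype": "page|subcat",
        "cmlimit": "500",
    }
    if cont:
        # Continuation token returned by the previous response, if any.
        params["cmcontinue"] = cont
    return params


def walk_category(fetch_members, root="Category:Sets"):
    """Breadth-first walk over a category tree, returning all page titles.

    `fetch_members(title)` is a hypothetical callable that yields
    (member_title, is_subcategory) pairs for one category.
    """
    seen = {root}  # guards against categories that contain each other
    queue = deque([root])
    pages = set()
    while queue:
        current = queue.popleft()
        for title, is_subcat in fetch_members(current):
            if not is_subcat:
                pages.add(title)
            elif title not in seen:
                seen.add(title)
                queue.append(title)
    return pages
```

Passing the fetcher in keeps the traversal testable against a fake category tree before pointing it at the real API.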
Ah, right, the wikitext... Yugipedia just gives a list of set names and hopes for the best, automatically making links instead of manually disambiguating. And so their problem of potential inaccuracies is now our problem.
EDIT: it actually looks like the string they use there corresponds 1:1 to a page name, no heuristics needed. So you're safe using that to disambiguate. Phew!
Dates can also be obtained from the official card database at https://www.db.yugioh-card.com
I quite love that this data set has information about which sets each card appeared in, regardless of locale. A lot of Yugioh tools I've found care only about the TCG, and usually only the English TCG at that. However, I've run into a big issue when trying to pin cards to certain sets, considering all possible locales:
Furthermore, there is some data I'd like regarding sets, such as when a set first came out in each locale. Have you considered making an additional dataset for card sets?