Closed Zayelion closed 9 years ago
The issues addressed here are about asynchronous. We want the server to process ONE page at a time fully and save it out before moving to the next one. If it prints all of them at once the server will open 7000+ connections and then crash/stop the server while it is processing that.
So this will require a code review to make sure it is properly asynchronous.
any update on this?
https://github.com/SalvationDevelopment/YGOPro-Support-System/blob/master/server/libs/carddb-update.js @me-stefan-baiu wrote most of this but due to school and stuff he stopped. There are a few parsing issues, if I remember correctly. I'm hoping he finishes it by the end of the year, that would be great. He sorta dropped off the earth. We are still doing it by hand atm.
Wikia JSON dump: result.zip
Well, thats the first step next is marshaling it into the cards.cdb for each language. That's the part @me-stefan-baiu was working on, understanding how YGOPro deals with each piece of data, tokens, alternate card arts, and so forth. Then unwinding the JSON data correcting inconsistencies and then updating the card file once he had verified that the logic was correct. This needs to be done over all the languages and into separate files.
Known limitations:
I "think" this is good enough for the moment, but feel free to keep tweaking it to over come the alternative artworks limitation. An output of the name only translations to pass off to the "YGOrganization" or the general public would be helpful to fill in the missing data via interested parties; I'm sure it's extensive.
Cards without official ID's not being translated is fine. This keeps the card release for Salvation under DevPro but above or with DuelingNetwork which I am OK with.
I'll help out more with it in the future because I have to learn Go
for work. There is still some manual work for updates that has to be done but I think it's fine.
An output of the name only translations to pass off to the "YGOrganization" or the general public would be helpful to fill in the missing data via interested parties
The command prints the ID and name of the cards, you can filter the output with:
translatedb -db cards-es.cdb -lang es | grep incomplete > log.txt
I think that GitHub shell includes the grep
command.
After testing and reading over https://github.com/adrianojn/translatedb/blob/master/translate.go dbUpdate(id, name, lore string)
needs some additional(or less) logic to update partials. For figuring out alts do id--;
except on tokens, I forget the logic used for those at the moment. Asian languages do not parse do to the ruby markers but thats ok at the moment.
Asian languages do not parse do to the ruby markers
Done
For figuring out alts do
id--;
Is not enough, some cards have more than 2 artworks. Should I consider consecutive IDs as same artwork? Which should be the limit?
This query works:
UPDATE texts SET name=?, desc=? WHERE id>=? AND id <(? + 9);
But 9 may be too much and overlap with a real card.
If it fails look up by name?
Binaries at http://gobuild.io/github.com/adrianojn/translatedb (v1.3)
Compatible with 10 artworks and never overlap with real cards.
That is the most glorious website I have ever seen. Right after github.
what is still missing?
It skipped cards like Giant Solider of Stone
during testing for es
. Looking over the code when it gets incomplete data it just dumps it. Also in the code that generates the wikia.json is locked to the en
wikia. The es
wikia has far more data, and I think they are other wikias also.
Giant Soldier of Stone doesn't have a es
translation (lore). Should I change to update partials?
The others wikias are very poorly. They don't handle redirects and uses custom tags like: descripción (es_lore on English wikia), Beschreibung (de_lore).
Yes you should update partials. You could create a configuration file and split the work out for the custom tag mapping via a JSON file. We want to pull as much information down as possible.
The biggest problem is the redirect system. The correct way to get data from wikia is by card title, not by card id. The english wikia redirects the card id to the correct page but the others don't.
I will remove the "updating" log message, the program will log only the incomplete cards
What about running a different query that dumps legal cards for OCG and TCG into a JSON file and then loading that JSON file into a into an in memory NoSQL DB and doing the processing with slightly different ID based queries?
Sometimes I think you don't evaluate the system complexity.
running a different query that dumps legal cards for OCG and TCG
This requires a better knowledge of wikia api.
doing the processing with slightly different ID based queries
This is about the tranlatedb
or wikia
command?
Sometimes I think you don't evaluate the system complexity.
Yeah this is likely true, but it looks like an unmaintained mess than intentional complexity. I keep making alot of assumptions about the wikia's functionality, looking at it again I see your point.
I was thinking if you could get a JSON download from an alternative wikia with the wikia
command (using an unspecific method) you could use NoSQL to map it into the card DB.
The English one works fine btw. And sorry, its not my intention to be critical or frustrate you. You are a huge help!
To download from other wikia only this line needs to be changed to something like http://es.yugioh.wikia.com/api.php
, but still is wikia responsibility to do the id -> name
mapping.
To parse the language-specific JSON only this line needs to be changed (use en
as fake language).
The difficulty is really the wikia api, being title-based instead of id-based was a poor design decision. I'll try to get the pages by OCG&TCG lists but it looks very complicated.
I found a way to list all TCG an OCG pages, cmcontinue
is used to paginate.
For now I identified this work flow:
I think that the wikia
and translatedb
will merge.
Should the specific wikias be the only source? If no, which should have preference?
Yes, that keeps things simple I feel.
I noticed one bug on traslatedb
: pendulum text is ignored.
Which format should be used to join card text and pendulum text?
'[Pendulum Effect: ]' + pendulum_text + '\r\n[Monster Effect: ]' + Lore
Done (v1.2)
Only es, fr and de wikias are supported. Italian wiki is bad formatted and the others are plain text.
Stefan is working on something again.
Create an JS function that connects to server/http/ygopro/card.db within the file structure from server/libs/carddb-update.js. (create the new file) Have the file print to console.log the id of each field in the table one at a time in an asynchronous manner.