PokeAPI / pokedex

PokeAPI's fork for adding gen-8 data.
https://github.com/PokeAPI/pokedex/tree/master-pokeapi/pokeapi
MIT License

Adding some more Gen8 Data #5

Closed RyanVereque closed 3 years ago

RyanVereque commented 4 years ago

Started a new thread following this brief work write-up.

Hello again @Naramsim. By the way, I'm new to the world of open-source contribution, so bear with me if I ever ask silly questions!

First, I'd like to ask a general question about new data.

Some CSV tables follow a general policy of assigning some IDs starting at 0 and others starting at 10000, for example rows in `pokemon` and `pokemon_forms` whose `is_default` column is false. I'd keep this trend, but I was wondering whether a similar idea should apply to the IDs of things added in this fork, i.e. making new records in `pokedexes` start after some large number (10000), so the Galar pokedex would get 10026 instead of 26, following 25 for updated-poni. Or maybe some other large number, since this conveys newness rather than alternativeness.
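For illustration, the default/alternate ID convention described above could be sketched like this (a hypothetical helper; the 10000 cutoff is from the discussion, but the function and field names are made up here):

```python
# Sketch of the veekun-style ID convention: canonical rows get small
# sequential IDs, while alternate (non-default) rows live at 10000+.
# Names and cutoff handling here are illustrative, not this repo's code.

ALT_ID_OFFSET = 10000

def next_id(existing_ids, is_default=True):
    """Pick the next free ID, keeping default and alternate ranges apart."""
    if is_default:
        pool = [i for i in existing_ids if i < ALT_ID_OFFSET]
        return (max(pool) + 1) if pool else 1
    pool = [i for i in existing_ids if i >= ALT_ID_OFFSET]
    return (max(pool) + 1) if pool else ALT_ID_OFFSET + 1

# Example: pokedex IDs 1..25 exist; a new default-range entry gets 26,
# while an alternate-range entry would get 10001.
ids = set(range(1, 26))
```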

Also, just FYI, my scraping does not rely on visiting any individual Pokémon pages on Bulbapedia, just the various list pages with specific info for each mon (in a given gen or other category). That's why (if you check the write-up) many of the columns for some tables are empty (e.g. my `pokemon`). Is that okay? Of course, a more comprehensive scraping approach in the future would be able to fix all those NULLs :)

Secondly, I wanted to ask about the file structure and repos. For this fork, if doing CSV edits only, would I modify files only under pokedex/data/csv/ (without touching the scraping directories that also contain CSV files)? Also, how does the PokeAPI/pokeapi repo use a copy of the above-mentioned directory from this fork? Does one just make sure identical CSV pull requests are accepted simultaneously in both repos?

My action plan at the moment is to rebuild the DB from this fork's CSVs, re-run my scraping code, and see what duplicates/contradictions, if any, it produces. Then I'll update the logic for detecting existing records so it complies with this fork, and finally work on dumping tables to CSV.

RyanVereque commented 4 years ago

Just a clarification: I do not intend to overwrite any records already edited in this fork's CSVs, only to add records or fill in missing values. The talk of more comprehensive scraping in the future applies to CSVs that have not yet been touched by this fork, but that I'd have some data for in the pull.

Naramsim commented 4 years ago

Hi, to answer the first question: we will just go incrementally. So Generation 8 has the ID 8, not 10008. We use 10000 only when we know there will be a gap in between. Take an example: they've released some new moves, and there are no IDs for them online, so we will go alphabetically, starting from the ID of the last move.

Mid point: we would really like the files to be filled in as much as possible, ideally with one script per CSV file, so we can re-run it without conflicts, because we are never running two scripts on the very same file. At PokeAPI, if there is a null value, we also need to update our database schema to allow that particular field to be null, but that's done over at pokeapi.

Second point: yes. Data goes to /pokedex/data/csv/, and any helper scripts go to /pokeapi/scripts. At pokeapi, when we need to update the data, we take it from veekun and copy-paste it there (no submodules). Now, with this fork, we will take the data from here. So yes, ideally two PRs with the same CSV data.
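Since the data is copy-pasted rather than shared via a submodule, one way to sanity-check that the two PRs carry byte-identical CSVs is a recursive diff over the two checkouts; the directory layout below is a stand-in, not the repos' actual paths:

```shell
# Hypothetical layout: the same CSV directory vendored into two checkouts.
# `diff -r` exits 0 only when every file matches byte-for-byte.
mkdir -p demo/pokedex/data/csv demo/pokeapi/data/csv
echo "id,identifier" | tee demo/pokedex/data/csv/pokedexes.csv \
    > demo/pokeapi/data/csv/pokedexes.csv
diff -r demo/pokedex/data/csv demo/pokeapi/data/csv && echo "CSVs match"
```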

Final point: up to you. In the end, we are interested in updated CSV files with data coming from a trustworthy source. I was thinking of going with simple scripts, since it's easier to understand what's going on.

Naramsim commented 3 years ago

stale, closing