PokeAPI / pokeapi

The Pokémon API
https://pokeapi.co
BSD 3-Clause "New" or "Revised" License
4.05k stars 921 forks source link

Change API to have only one Pokemon endpoint, instead of pokemon_species, pokemon_forms and pokemon #1037

Open GreatNovaDragon opened 4 months ago

GreatNovaDragon commented 4 months ago

As #1026 , #1034 , #1035, #940, #966 and probably others make it apparent, a separation of pokemon and pokemon_species has not been feasible since 2016.

An Base assumption that veekun did for the pokemon species, which was valid before Sun and Moon introduced regional variants was that Forms are a rare occurance with no pokedex entries or own evolution lines.

Therefore, for gameplay only things, the pokemon model was created and for dex purposes, the pokemon_species model was created.

Since Sun and Moon, gamefreaks handling of forms massively changed. Beginning with Sun and Moon, and expanded in Sword and Shield, Regional Variants have been introduced, which are basically an own species but with the same dex number and name, but in the case of things like Paldean Wooper or Galarian Meowth, evolve into an entire different species than the "Base" pokemon, in the case for Wooper it doesnt evolve into Quagsire but Clodsire.

Additionally to Regional Variants existing and having their own dex entries, Forms itself gained their own Pokedex entries, so the base assumption of "Forms not having pokedex entries or evolution lines" is just wrong now.

Therefore, i would officially propose to change the pokemon model, which would warrant a V3 or an V2.5. the pokemon model would gain most of the info that is in the pokemon_species is now added into pokemon, while pokemon_species would just be a redirect reading data from there, for compability sake.

Naramsim commented 4 months ago

We could add a v2beta! I'd love to get rid of this problem finally!

The question is: would you also change to CSV data or just the API? If we change just the API, the v2 and v2beta they can coexist. If we change the CSV data we would need to support both formats, for a small or long period, depending on how long and how well this new proposition works.

GreatNovaDragon commented 4 months ago

There's a shortterm fix, i can imagine, where we just expand the pokemon_species model entries to use the pokemon IDs and names, aka as an example we dont have an zoroark species that has both zoroark and zoroark-hisui data, but an zoroark species and an zoroark-hisui species. which i guess would be a change of csv data. it would kinda also solve #1045 and #966

i guess in this solution, the v2 could still show zoroark and zoroark-hisui under zoroark checking via the natdex number, while having v2beta show only zoroark

GreatNovaDragon commented 4 months ago

Till an hour ago, i wasnt even aware of a third model named pokemon_forms, that also handles visual only forms (even though burmy is not visual only, as it interacts with the evolution into the specific wormadam forms, so should also be in the pokemon model)

that data could also be added to the pokemon model, but most of our problems really come from the dissonance of pokemon_species with the reality.

a bit of musings for the potential v2beta

Given that pokemon_forms.csv already contains pokemon_id, you could just display a list of forms in the public facing pokemon endpoint, and append the data behind the pokemon table in the backend to that list. It would just be better imo to just change the csv of pokemon.csv to be expanded with the pokemon_forms.csv data. the pseudo sql command be like (if that helps understand what i mean)

select *
from pokemon_forms, pokemon
where pokemon_forms.pokemon_id = pokemon.id
Genschere commented 4 months ago

Here is a data model. I think that should be all tables that make use of pokemon_forms.id, pokemon.id and pokemon_species.id

species_pokemon_form_data_model

I think the main problem with merging is that the tables on the left side will get quite some duplicate entries when merging.

My solution for the evolutions I proposed here. species_pokemon_form_data_model_new

GreatNovaDragon commented 4 months ago

That's still three models for one kind of entity.

Just because vunown exists doesn't mean that should be. Heck, the Burmy and vivillon forms have individual dex entries.

By all intents of purposes, pokemon_species and pokemon should be one, and we could ask for visual only forms that really do not have any data but visual differences via the forms and pokemon link if you are THAT concerned about unown

Genschere commented 4 months ago

Yes, but it's only intent is to solve the evolution problem which was the only really bad one for my purpose (yet).

Combining makes sense but maybe not everything. Maybe take a step at a time. If pokemon_forms is the most important one, moving related tables from left to right or data entries from right to left as a first step should show where the concrete problems with certain records are and what makes sense to change and what doesn't. My solution for the evolutions is just a step in this.

Unown is easy because its form differences are visual only.

pokemon_form_types shows that there are issues with the data model. It should not need to exist. The existence of pokemon_species_prose is a mystery to me. Probably it only exists to explain certain issues in the model with how forms of certain pokemon work.

Here are my thoughts of some things that could be done (definitely not complete) and what might be the problems with them:

The big problem with Burmy is, that its forms are just visual but Wormadame has 3 forms with different types, stats and movesets. This is actually already represented correctly (3 entries in pokemon for Wormadame and 1 for Burmy) but only the evolution doesn't make sense in the current model (because based on species). What also doesn't make is that there are 3 forms for Mothim, one for every Burmy form although Mothim has only one form. I think this was a dirty fix for having an evolution form for every Burmy form and this should be corrected. My approach to evolutions can solve that too though.

Then there are pokemon where different forms are not in pokemon but the forms have different stats like Cherrim. That's because battle only forms are in pokemon_forms which kind of makes sense but kind of doesn't. If the stats are linked to the forms, there is a lot of duplicate entries (which is always a risk of data inconsistencies) for visual only forms (Unown and Pikachu have the most here) but it would be necessary to have for battle only forms. The same goes for types, movesets and abilities (if that exists). So instead move all forms with different stats, types, movesets or abilities from pokemon_forms to pokemon but not the ones where only the visuals are different. This will certainly make the data more correct. It may then even make sense to delete all 1:1 relations between pokemon and pokemon_forms making pokemon_forms far less important.

What are the downsides? pokemon with different types but the same movesets will double their entries in pokemon_moves (alread about 600k entries). If the stats are the same, it will still require duplicate entries in pokemon_stats (almost 8k entries).

Since regional forms are already in pokemon, the number of additional entries is probably small but some will create duplicates like crazy, especially Arceus and Silvally. Maybe that's not too bad though.

It would certainly make the model more consistent and actually correct some severe problems. Data duplication may be a bad thing, especially if changes happen but I think it can't be fully avoided in any way.

Regarding pokemon_species, pokemon_species_flavor_text should actually be linked to pokemon since different regional forms with different types have of course different descriptions that don't fit to the other regional forms (like for Wooper). But maybe it doesn't make sense because a certain game only shows that one? (didn't check).

Moving pokemon_species_names to pokemon creates duplicate entries again which is especially strange for pokemon that can switch between forms (like Cherrim or Arceus or mega evolutions).

Maybe some columns from pokemon_species can be moved to pokemon like evolution_chain_id and some can be dropped or changed entirely like forms_switchable (which is usually defined by the ability). So basically making pokemon_species less important too.

So many things yet it's just a part of it and I expect there to be more issues I didn't find yet.

GreatNovaDragon commented 4 months ago

I'm not currently at home, maybe I should make an diagram what I imagine the models should be later.

I see what you mean, and will try to consider that, even if I consider double lines not that much of a problem in comparison to 3 seperate models, which would be three queries if done in stuff like pokeapiNET

Genschere @.***> schrieb am So., 25. Feb. 2024, 11:04:

Yes, but it's only intent is to solve the evolution problem which was the only really bad one for my purpose (yet).

Combining makes sense but maybe not everything. Maybe take a step at a time. If pokemon_forms is the most important one, moving related tables from left to right or data entries from right to left as a first step should show where the concrete problems with certain records are and what makes sense to change and what doesn't. My solution for the evolutions is just a step in this.

Unown is easy because its form differences are visual only.

pokemon_form_types shows that there are issues with the data model. It should not need to exist. The existence of pokemon_species_prose is a mystery to me. Probably it only exists to explain certain issues in the model with how forms of certain pokemon work.

Here are my thoughts of some things that could be done (definitely not complete) and what might be the problems with them:

The big problem with Burmy is, that its forms are just visual but Wormadame has 3 forms with different types, stats and movesets. This is actually already represented correctly (3 entries in pokemon for Wormadame and 1 for Burmy) but only the evolution doesn't make sense in the current model (because based on species). What also doesn't make is that there are 3 forms for Mothim, one for every Burmy form although Mothim has only one form. I think this was a dirty fix for having an evolution form for every Burmy form and this should be corrected. My approach to evolutions can solve that too though.

Then there are pokemon where different forms are not in pokemon but the forms have different stats like Cherrim. That's because battle only forms are in pokemon_forms which kind of makes sense but kind of doesn't. If the stats are linked to the forms, there is a lot of duplicate entries (which is always a risk of data inconsistencies) for visual only forms (Unown and Pikachu have the most here) but it would be necessary to have for battle only forms. The same goes for types, movesets and abilities (if that exists). So instead move all forms with different stats, types, movesets or abilities from pokemon_forms to pokemon but not the ones where only the visuals are different. This will certainly make the data more correct. It may then even make sense to delete all 1:1 relations between pokemon and pokemon_forms making pokemon_forms far less important.

What are the downsides? pokemon with different types but the same movesets will double their entries in pokemon_moves (alread about 600k entries). If the stats are the same, it will still require duplicate entries in pokemon_stats (almost 8k entries).

Since regional forms are already in pokemon, the number of additional entries is probably small but some will create duplicates like crazy, especially Arceus and Silvally. Maybe that's not too bad though.

It would certainly make the model more consistent and actually correct some severe problems. Data duplication may be a bad thing, especially if changes happen but I think it can't be fully avoided in any way.

Regarding pokemon_species, pokemon_species_flavor_text should actually be linked to pokemon since different regional forms with different types have of course different descriptions that don't fit to the other regional forms (like for Wooper). But maybe it doesn't make sense because a certain game only shows that one? (didn't check).

Moving pokemon_species_names to pokemon creates duplicate entries again which is especially strange for pokemon that can switch between forms (like Cherrim or Arceus or mega evolutions).

Maybe some columns from pokemon_species can be moved to pokemon like evolution_chain_id and some can be dropped or changed entirely like forms_switchable (which is usually defined by the ability). So basically making pokemon_species less important too.

So many things yet it's just a part of it and I expect there to be more issues I didn't find yet.

— Reply to this email directly, view it on GitHub https://github.com/PokeAPI/pokeapi/issues/1037#issuecomment-1962879864, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADNRQQK2DZ2BGLCR75HSD2TYVMECLAVCNFSM6AAAAABC7HNBTSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRSHA3TSOBWGQ . You are receiving this because you authored the thread.Message ID: @.***>

ivanlonel commented 4 months ago

I agree with @Genschere.

Getting rid of pokemon_species wouldn't be benefical. There are attributes that apply to the Species as a whole, and repeating these same values for all forms in a single table would be bad database design by going against normalization rules. That said, some of pokemon_species' current attributes should be moved to the pokemon entity, because things like evolution_chain_id, has_gender_differences, pokemon_color_id can change between variants of the same species, as I mentioned in #1026. pokemon_color_id specifically could probably be an attribute of pokemon_form, as it can change between cosmetic forms.

I used to be of the opinion that pokemon and pokemon_form entities could be merged into a single entity, but thinking in database normalization, it really makes more sense to avoid duplicating all non-cosmetic attributes for every form of Vivillon or Alcremie, for example. But Pokeapi would need some correcting to make sure only cosmetic forms have different entries for each pokemon_id. For example, Magearna should become a single pokemon with two entries in the form table pointing to its pokemon_id, while Arceus and Burmy should get multiple entries in the pokemon table, even if it duplicates other things like movesets.

All changes to the model should be thought with the tables (the CSV files) in mind. The model shouldn't change to accommodate the API needs. With a consistent and well designed model, anything that needs to be made available via the API can be easily achieved with the right SQL queries.

GreatNovaDragon commented 4 months ago

After introspection, yeah, my issue is that its different api endpoints when it should only be one.

It has been like nearly a decade since i did database normalisation, and the little bit of data duplication was not a problem in my mind.

Individual forms progressively get their own dex entries with also height and all that, even the previously visual only forms, so there is not much to remain within pokemon_species, so i dont even know what is the thing that should pokemon_species exclusively It's not that much worth it for the shape, cause thats just an int most of the time, the cry is form dependant on some, i guess the egg-data is shared between non-regional forms. But at least, regional forms should be their own species, thats totally wrong handling in the data here, just because they share the natdex number.

I'll change the title to the repo to "Change API to have only one pokemon endpoint", cause was even the endgoal of the original proposal.

Genschere commented 4 months ago

If the national dex number can replace the species id, I think it would be good enough to connect the different regional forms. Height and weight can definitely be different, Regarding the shape, Meowth's Galar form has a different shape (two legs with tail) than the other two forms (four legs). I checked the egg data and yes, they are the same between the different regional forms of a pokemon... at least the egg groups. I don't know what the deal with it is, but Articuno's Galar form has a different amount of egg cycles. I wonder why a legendary pokemon that can't be bred needs egg cycles at all.

What seems to be the same are the base friendship value ,the catch rate as well as the exp curves.

GreatNovaDragon commented 4 months ago

regional forms have the same natdex number as the base forms, while different species in my eye.

i'd like to use the name field as an id instead.

shape is a predefined thing by the pokedex.

Naramsim commented 4 months ago

Hi guys, if you like we can start working on a V3beta version! If you are volunteering we can start writing some code.

On my side, I can only help you with the deployment part, since I don't really have time to implement such complicated things now. We can start anew and use or drop Django, same goes for CSVs, we could switch to other formats. Reply if we can on you!

I'll copy this comment over at #966.

GreatNovaDragon commented 4 months ago

v3 in as a whole new format with new tech behind it may sound appealing, but isnt that a whole other discussion?

This is just for an api endpoint change.

My own knowledge lies within C#. So i cant be a big help changing already existing Python code, at most amateurishly make something new in C#

Genschere commented 4 months ago

Similar with me. I'm using C# and C++, never used Python: I can help with the CSVs but not with the API code.

GreatNovaDragon commented 4 months ago

it's fairly easy to make a database within c# within entityframeworkCore, where you have to display the model you want in code, and efCore does bother with the background database stuff like normalisation. Thats how i do the database for my own GreatNovaDragon/pkmnwl_ressources

there is FileBaseContext, that makes it possible to save the database as csv files, i dont expect though to be able to carry-over the csv files from pokeapi v2.

with Microsoft.AspNetCore.OpenApi it should also be easy to make an restful api out of those two components.

If, and only if we want to make an new v3 from the ground-up, i would propose to actually look at the entire database model as it exists, and then redo that upon knowledge what we have now.

I dont even know why the pokemon conquest data is here, when i would muse Mystery Dungeon Data would be fairly more usefull.

I can do that on my own, but i guess that would come with some resitance, like my previous opinion within this proposal.

@Genschere , woudl you like to tell me with what program you did that ERM?

Genschere commented 4 months ago

I used StarUML. It's mostly free. Not sure if it's the best tool to do that but I already had it installed and the diagram is quite basic anyway (no datatypes, no additional columns and the likes).

Resistance is maybe the wrong word, let's call it constructive feedback. ;) Making a plan first and then discussing it is certainly better than having one person just doing things.

It would also make sense to look at the other parts of the current model to see if there are possibilities or even necessities for improvements. I guess other issues already logged cover that part quite well.

GreatNovaDragon commented 4 months ago

It would also make sense to look at the other parts of the current model to see if there are possibilities or even necessities for improvements. I guess other issues already logged cover that part quite well.

Which is why i propose to look at the entire database model, and as for what tool you used. Visualizing it might make things more sense.

GreatNovaDragon commented 2 months ago

This whole thing has been sitting in the back of my brain the last few months, and i've been trying to sit down and write something in C# before just now the question poses to me

@Naramsim (i take you as the quasi project lead of pokeapi, cause i dont see anyone else actually contributing with writing permissions)

What would be the requirements be for an v3beta? You wrote that one could start anew with anything, which is a very lovely idea to my brain that would just love to start anew, but what would be the actual requirements to the API?