foxtrotzulu94 / ECE-Pokedex

An Android App for those that want to be the very best! You Gotta Catch 'Em All!
MIT License

Gather all Pokemon Data from Main Sources #4

Open foxtrotzulu94 opened 9 years ago

foxtrotzulu94 commented 9 years ago

Related to issues #5 and #6. While the previous release of the Pokedex takes certain information from the PokeAPI, that data may be somewhat inconsistent or incomplete. Furthermore, more data on each pokemon may be required (such as Location, Moveset, Egg Group, Abilities, EV Yield).

From,

Gather and categorize all the data those sites provide for the current generation of Pokemon games and locations. To facilitate this task, use Python 3 or 2.7 and ask @thineshth for help. It is recommended that you save your data as JSON or XML files, serialized so it can later be loaded into a database with ease.
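For reference, a minimal sketch of the serialize-then-import round trip suggested above, in Python with the stdlib `json` module (the record fields are illustrative, not the project's actual schema):

```python
import json
import os
import tempfile

# Illustrative record shape -- the real fields are whatever the
# scrapers collect (location, moveset, egg group, abilities, EV yield).
record = {
    "nationalID": 1,
    "name": "Bulbasaur",
    "eggGroups": ["Monster", "Grass"],
    "abilities": ["Overgrow"],
}

path = os.path.join(tempfile.gettempdir(), "pokemon.json")

# Serialize the scraped records to disk...
with open(path, "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)

# ...and a later DB-import script simply loads and walks them.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded[0]["name"])  # Bulbasaur
```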

Adesa23 commented 9 years ago

The moves table has the parameter "secondaryEffects" removed.

thineshth commented 9 years ago

I finished up the mapping of pokemonID to nationalID. I went through the list to get all the exceptions for those weird cases like Mew/Mewtwo.

I didn't specifically write an exception to be triggered (that can be a TODO for Amit), but if that part of the scraper bugs out, it will mean that a pokemon is not being accounted for.

Also, I had to install and use the "unidecode" Python library to get rid of accents in some pokemon names.
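(unidecode is a third-party library; for the accent cases specifically, a stdlib-only approximation with `unicodedata` looks like the sketch below. Unlike unidecode it does not transliterate symbols such as the gender signs, so it is only a partial substitute.)

```python
import unicodedata

def strip_accents(name: str) -> str:
    # Decompose accented characters (NFKD), then drop the combining
    # marks; e.g. "Flabébé" becomes "Flabebe".
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Flabébé"))  # Flabebe
```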

And finally, as mentioned in the commit message, I changed a few things in the smogon JSON file so that it and serebii are consistent.

@flareflare What you can explore doing now is making sure the suffix table is up to date. Right now it has the info smogon provided, but with pokedexID fully mapped to nationalID we can do more.

The steps would be: pull up all pokemonIDs in the pokemon_suffix table (see the scrapper_national_ID file for how-to), then use this to also pull up all pokemonIDs from pokemon_nationalID that are NOT present in the pokemon_suffix table. There might be an sqlite3 command for comparing between tables.

After you filter, by whatever means, see which of the remaining pokemonIDs in the pokemon_nationalID table are mapped to more than one nationalID. These are what your table is missing; you need to get their names and scrape for whatever is behind the dash.
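The two-table comparison described above can be done in one query; here is a self-contained sqlite3 sketch. The table names follow the thread, but the exact columns and the sample rows are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pokemon_suffix     (pokemonID INTEGER, suffix TEXT);
CREATE TABLE pokemon_nationalID (pokemonID INTEGER, nationalID INTEGER);
-- fabricated sample data: 6 already has a suffix entry, 3 maps to a
-- single nationalID, 25 maps to two and is missing from the suffix table
INSERT INTO pokemon_suffix     VALUES (6, 'mega-x');
INSERT INTO pokemon_nationalID VALUES (6, 106), (6, 107), (3, 103),
                                      (25, 125), (25, 126);
""")

# pokemonIDs absent from pokemon_suffix that still map to more than one
# nationalID -- i.e. the forms the suffix table is missing.
missing = conn.execute("""
    SELECT pokemonID FROM pokemon_nationalID
    WHERE pokemonID NOT IN (SELECT pokemonID FROM pokemon_suffix)
    GROUP BY pokemonID
    HAVING COUNT(DISTINCT nationalID) > 1
""").fetchall()
print(missing)  # [(25,)]
```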

Let me know if you want me to review the code or answer questions.

foxtrotzulu94 commented 9 years ago

I got something pretty good for descriptions: on Bulbapedia, it seems the table-data tag that contains the description (Pokedex entry) for a pokemon has unique attributes that can be discriminated on:

    rowspan="2" class="roundy" style="vertical-align: middle; border: 1px solid #9DC1B7; padding-left:3px;"

By loading the page into BeautifulSoup and executing

    >>> testy = soupy.findAll(attrs={"rowspan": "2", "class": "roundy", "style": "vertical-align: middle; border: 1px solid #9DC1B7; padding-left:3px;"})

I was able to load up a list whose entries consist solely of the descriptions:

    >>> testy[0].contents
    [' A strange seed was planted on its back at birth. The plant sprouts and grows with this Pokémon.\n']

This might make Bulbapedia attractive for scraping descriptions, since you get random access after loading into BeautifulSoup. Iterating through the actual pages still requires each Pokemon's original name though...
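The attribute-matching idea can also be exercised without the bs4 dependency; here is a stdlib-only sketch using `html.parser`, fed a fabricated sample row for illustration (the real workflow would feed it the downloaded Bulbapedia page):

```python
from html.parser import HTMLParser

class DexEntryExtractor(HTMLParser):
    """Collects the text of <td> cells whose attributes match the
    rowspan/class combination observed on Bulbapedia entry rows."""

    def __init__(self):
        super().__init__()
        self.in_match = False
        self.entries = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "td" and a.get("rowspan") == "2" and a.get("class") == "roundy":
            self.in_match = True
            self.entries.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_match = False

    def handle_data(self, data):
        if self.in_match:
            self.entries[-1] += data

parser = DexEntryExtractor()
# fabricated sample row: one matching cell, one non-matching cell
parser.feed('<table><tr>'
            '<td rowspan="2" class="roundy" style="vertical-align: middle;">'
            'A strange seed was planted on its back at birth.</td>'
            '<td>other cell</td></tr></table>')
print(parser.entries[0].strip())
```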

Adesa23 commented 9 years ago

Currently what needs to be done for pokemon class:

this.description = description; //not yet, but found on Bulbapedia or serebii

this.hatchTime = hatchTime; //don't have, possibly on bulbapedia

this.catchRate = catchRate; //don't have, available on serebii

this.genderRatioMale = genderRatioMale; //don't have, available on serebii

this.locations = locations; //don't have, don't know

this.moves = moves; //only missing effects (attack hurts party, etc)

this.eggGroups = eggGroups; //don't have, found on bulbapedia

this.evolutions = evolutions; //have, know how to (except for condition for evolution)
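The checklist above maps onto a record roughly like the following (a Python sketch of the eventual class; field names come from the comment, but the types and defaults are assumptions):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Pokemon:
    description: Optional[str] = None        # Bulbapedia or serebii
    hatchTime: Optional[int] = None          # possibly Bulbapedia
    catchRate: Optional[int] = None          # serebii
    genderRatioMale: Optional[float] = None  # serebii
    locations: List[str] = field(default_factory=list)   # source unknown
    moves: List[str] = field(default_factory=list)       # secondary effects missing
    eggGroups: List[str] = field(default_factory=list)   # Bulbapedia
    evolutions: List[str] = field(default_factory=list)  # condition still missing

p = Pokemon(catchRate=45)
print(p.catchRate)  # 45
```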

Adesa23 commented 9 years ago

Steps to scrape Pokemon descriptions from Bulbapedia (and possibly locations):

1) Find the page on Bulbapedia that lists all pokemon with links
2) Load all pokemon links into an array / get rid of junk links
3) Load all pokemon numbers
4) Use BS4 to iterate through this link "list", ignoring every other pokemon and number starting at the first (ignore odd entries)
5) Use Javier's code in the previous comment to get the description
6) Use the number and description to store in the DB

A similar process can be used to scrape for locations.
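The final store-in-DB step can be sketched with stdlib sqlite3; the table name and columns below are illustrative, since the real schema lives in the repo's database file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pokemon_description (
        nationalID  INTEGER PRIMARY KEY,
        description TEXT
    )
""")

# (number, description) pairs produced by the scraping steps above;
# this sample pair is fabricated for the demo.
scraped = [(1, "A strange seed was planted on its back at birth.")]
conn.executemany("INSERT INTO pokemon_description VALUES (?, ?)", scraped)
conn.commit()

row = conn.execute(
    "SELECT description FROM pokemon_description WHERE nationalID = 1"
).fetchone()
print(row[0])
```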

minicole commented 9 years ago

The following table must be added:

    CREATE TABLE pokemon_caught (
        nationalID INTEGER NOT NULL UNIQUE,
        isCaught   INTEGER,
        PRIMARY KEY(nationalID)
    );

with nationalID taking each pokemon's national ID, and isCaught == 0 by default

I can do it at the end, I don't want to have any merge problems with the database later... This is just a reminder that it must be done :)
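One possible way to implement the reminder above, sketched with sqlite3: declare `DEFAULT 0` on isCaught and seed one row per known national ID. The seeding query from pokemon_nationalID (and its sample rows) is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- fabricated stand-in for the existing mapping table
CREATE TABLE pokemon_nationalID (pokemonID INTEGER, nationalID INTEGER);
INSERT INTO pokemon_nationalID VALUES (1, 1), (2, 2);

CREATE TABLE pokemon_caught (
    nationalID INTEGER NOT NULL UNIQUE,
    isCaught   INTEGER DEFAULT 0,
    PRIMARY KEY(nationalID)
);

-- seed one row per known nationalID; isCaught falls back to 0
INSERT INTO pokemon_caught (nationalID)
    SELECT DISTINCT nationalID FROM pokemon_nationalID;
""")

rows = conn.execute(
    "SELECT nationalID, isCaught FROM pokemon_caught ORDER BY nationalID"
).fetchall()
print(rows)  # [(1, 0), (2, 0)]
```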

thineshth commented 9 years ago

@minicole There's a caught field in the pokemon table; should it be removed then? I think that's a good idea, because the pokemon table has megas and such, and it makes no sense to have a caught value for them.

thineshth commented 9 years ago

So I finished description scraping, and am currently in the middle of testing for hatch times / catch rates / gender ratios.

I did some DB reorganizing. The most current version is on Master, so if you're working on another branch, might be a good idea to overwrite the DB with Master's before committing anything.

The main changes were some attributes in pokemon table were moved to their own tables.

So, the update on what needs to be done for the pokemon class:

this.description = description; //done, needs cleaning up

this.hatchTime = hatchTime; //in progress

this.catchRate = catchRate; //in progress

this.genderRatioMale = genderRatioMale; //in progress