Scraper that gets data from Bulbapedia and converts it into a graph database.
The script generated by this project can be found in the pokemon-graph project.
This scraper gets the data from the following pages:
The scraper is a console application written in C# with .NET Core.
To run the project, the appsettings.json file needs two sections: bulbapediaConfiguration and fileExportConfiguration.
The bulbapediaConfiguration section contains the Bulbapedia URLs and page paths needed to read the data. It is mapped to the BulbapediaConfiguration class in the Configurations folder. The properties in this section are:
The fileExportConfiguration section has the property fileFullPath, which sets the path and file name of the generated script. It is mapped to the FileExportConfiguration class in the Configurations folder.
```json
{
  "bulbapediaConfiguration": {
    "baseUrl": "https://bulbapedia.bulbagarden.net/w/index.php?title=",
    "baseImageUrl": "https://",
    "pokemonListPath": "List_of_Pok%C3%A9mon_by_National_Pok%C3%A9dex_number",
    "evolutionListPath": "List_of_Pok%C3%A9mon_by_evolution_family",
    "megaEvolutionListPath": "Mega_Evolution",
    "formsListPath": "List_of_Pok%C3%A9mon_with_form_differences"
  },
  "fileExportConfiguration": {
    "fileFullPath": "C:\\temp\\pokemon.cypher"
  }
}
```
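As a rough sketch of how these two sections could map to C# classes, the code below deserializes the JSON with System.Text.Json. The class names BulbapediaConfiguration and FileExportConfiguration come from this README, but their properties are only inferred from the JSON keys; the AppSettings root type and the SettingsLoader helper are hypothetical, and the project itself may bind the sections with Microsoft.Extensions.Configuration instead.

```csharp
using System;
using System.Text.Json;

// Assumed shape of the "bulbapediaConfiguration" section; properties follow the JSON keys.
public class BulbapediaConfiguration
{
    public string BaseUrl { get; set; } = "";
    public string BaseImageUrl { get; set; } = "";
    public string PokemonListPath { get; set; } = "";
    public string EvolutionListPath { get; set; } = "";
    public string MegaEvolutionListPath { get; set; } = "";
    public string FormsListPath { get; set; } = "";
}

// Assumed shape of the "fileExportConfiguration" section.
public class FileExportConfiguration
{
    public string FileFullPath { get; set; } = "";
}

// Hypothetical root type covering both sections of appsettings.json.
public class AppSettings
{
    public BulbapediaConfiguration BulbapediaConfiguration { get; set; } = new BulbapediaConfiguration();
    public FileExportConfiguration FileExportConfiguration { get; set; } = new FileExportConfiguration();
}

// Hypothetical helper that turns the JSON text into typed settings.
public static class SettingsLoader
{
    public static AppSettings Load(string json)
    {
        // Case-insensitive matching maps camelCase keys such as "baseUrl" to BaseUrl.
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        return JsonSerializer.Deserialize<AppSettings>(json, options)
            ?? throw new InvalidOperationException("appsettings.json is empty.");
    }
}
```

Usage: after `var settings = SettingsLoader.Load(File.ReadAllText("appsettings.json"));`, the expression `settings.BulbapediaConfiguration.BaseUrl + settings.BulbapediaConfiguration.PokemonListPath` yields the full URL of the National Pokédex list page, because baseUrl already ends with `?title=`.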
The project is organized into three main folders that separate its concerns: Configurations, Models and Services.
The Configurations folder contains the classes that map the configuration sections; they are used to read the file export settings and the Bulbapedia URLs, as explained in the previous section.
The Models folder contains the domain models, which map all the data written to the database. The main class is Pokemon, which holds the lists of evolutions, mega evolutions, types and forms. This folder also has a subfolder named Comparers, which contains TypeEqualityComparer, used by the project to compare types.
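The sketch below suggests what a trimmed-down Pokemon model and a TypeEqualityComparer might look like. Only the names Pokemon and TypeEqualityComparer come from this README; the PokemonType class, every property name, and the choice to compare types by name ignoring case are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type model; the project's actual type class may differ.
public class PokemonType
{
    public string Name { get; set; } = "";
}

// Hypothetical, trimmed-down Pokemon model holding the lists the README mentions.
public class Pokemon
{
    public int NationalDexNumber { get; set; }
    public string Name { get; set; } = "";
    public List<PokemonType> Types { get; } = new List<PokemonType>();
    public List<Pokemon> Evolutions { get; } = new List<Pokemon>();
    public List<Pokemon> MegaEvolutions { get; } = new List<Pokemon>();
    public List<Pokemon> Forms { get; } = new List<Pokemon>();
}

// Assumed behavior: two types are equal when their names match, ignoring case.
public class TypeEqualityComparer : IEqualityComparer<PokemonType>
{
    public bool Equals(PokemonType x, PokemonType y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return string.Equals(x.Name, y.Name, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: names equal ignoring case produce the same hash.
    public int GetHashCode(PokemonType obj) =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
}
```

A comparer like this lets LINQ calls such as `pokemon.Types.Distinct(new TypeEqualityComparer())` drop duplicate types that were scraped as separate objects.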
The Services folder contains the project's logic, separated into three contexts: FileExport, Scrapers and ScriptGenerator.
The FileExport service is responsible for exporting the script to a file at the configured location.
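A minimal sketch of such an export step, assuming it receives the generated script as a string and the fileFullPath value from the configuration. The FileExportService class and its Export method are hypothetical names, and creating the target folder is an assumption rather than documented behavior.

```csharp
using System.IO;
using System.Text;

// Hypothetical export service: writes the generated Cypher script to fileFullPath.
public class FileExportService
{
    private readonly string _fileFullPath;

    public FileExportService(string fileFullPath) => _fileFullPath = fileFullPath;

    public void Export(string script)
    {
        // Create the target folder (for example C:\temp) if it does not exist yet.
        var directory = Path.GetDirectoryName(_fileFullPath);
        if (!string.IsNullOrEmpty(directory))
            Directory.CreateDirectory(directory);

        // Overwrite any previous export; UTF-8 without a BOM keeps names such as "Flabébé" intact.
        File.WriteAllText(_fileFullPath, script, new UTF8Encoding(false));
    }
}
```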
The Scrapers service reads the data from Bulbapedia and converts it into model objects. It has one scraper for each path in the configuration, and each scraper has logic specific to its page because, although the list pages share the same layout, their structures differ. The scrapers are:
The PokemonList scraper must run first, because it creates the Pokémon list used by the other scrapers.
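The ordering rule and the URL construction can be sketched as follows. Each scraper is modeled here as a page path plus a parsing delegate, and the Pokémon list is simplified to a list of names; ScraperStep, ScraperRunner and the delegate signature are hypothetical, and the HTML parsing itself is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scraper step: a page path from bulbapediaConfiguration plus the
// page-specific parsing logic, which receives the full URL and the shared list.
public sealed class ScraperStep
{
    public ScraperStep(string pagePath, Action<string, List<string>> scrape)
    {
        PagePath = pagePath;
        Scrape = scrape;
    }

    public string PagePath { get; }
    public Action<string, List<string>> Scrape { get; }
}

public static class ScraperRunner
{
    // baseUrl ends with "?title=", so the page path is appended as-is.
    public static string BuildUrl(string baseUrl, string pagePath) => baseUrl + pagePath;

    public static List<string> Run(string baseUrl, ScraperStep pokemonList, IEnumerable<ScraperStep> others)
    {
        var pokemon = new List<string>();

        // PokemonList runs first: it creates the list that the other scrapers enrich.
        pokemonList.Scrape(BuildUrl(baseUrl, pokemonList.PagePath), pokemon);
        if (pokemon.Count == 0)
            throw new InvalidOperationException("The PokemonList scraper produced no Pokémon.");

        // Evolution, mega evolution and forms scrapers then work on the same list.
        foreach (var step in others)
            step.Scrape(BuildUrl(baseUrl, step.PagePath), pokemon);

        return pokemon;
    }
}
```

Failing fast on an empty list is a design choice of this sketch: the later scrapers have nothing to attach their data to if the first page could not be read.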
The ScriptGenerator service converts the Pokémon list into a Cypher script: it goes through every Pokémon property and creates the corresponding nodes and relationships in the script.
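A simplified sketch of this conversion is shown below. It emits all nodes first and all relationships second, so every MATCH finds the nodes it needs. The PokemonEntry input class, the node labels Pokemon and Type, and the relationship types HAS_TYPE and EVOLVES_TO are hypothetical; the real script in pokemon-graph may use different labels and properties.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal input: a Pokémon with its types and the names it evolves into.
public sealed class PokemonEntry
{
    public int Number { get; set; }
    public string Name { get; set; } = "";
    public List<string> Types { get; } = new List<string>();
    public List<string> EvolvesTo { get; } = new List<string>();
}

public static class CypherScriptGenerator
{
    // Cypher string literals use single quotes; escape backslashes first, then quotes,
    // so names such as "Farfetch'd" do not break the script.
    public static string Quote(string value) =>
        "'" + value.Replace("\\", "\\\\").Replace("'", "\\'") + "'";

    public static string Generate(IReadOnlyCollection<PokemonEntry> pokemon)
    {
        var sb = new StringBuilder();

        // Pass 1: nodes. MERGE keeps a type shared by many Pokémon as a single node.
        foreach (var p in pokemon)
        {
            sb.AppendLine($"MERGE (:Pokemon {{number: {p.Number}, name: {Quote(p.Name)}}});");
            foreach (var type in p.Types)
                sb.AppendLine($"MERGE (:Type {{name: {Quote(type)}}});");
        }

        // Pass 2: relationships, emitted only after every node they refer to exists.
        foreach (var p in pokemon)
        {
            foreach (var type in p.Types)
                sb.AppendLine($"MATCH (p:Pokemon {{name: {Quote(p.Name)}}}), (t:Type {{name: {Quote(type)}}}) MERGE (p)-[:HAS_TYPE]->(t);");
            foreach (var target in p.EvolvesTo)
                sb.AppendLine($"MATCH (a:Pokemon {{name: {Quote(p.Name)}}}), (b:Pokemon {{name: {Quote(target)}}}) MERGE (a)-[:EVOLVES_TO]->(b);");
        }

        return sb.ToString();
    }
}
```

Using MERGE instead of CREATE makes the script safe to run more than once against the same database, at the cost of a lookup per statement.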