antoniogamiz / tfg

Bachelor's thesis (Trabajo de fin de grado) by Antonio Gámiz Delgado for the double degree in Computer Engineering and Mathematics.
GNU General Public License v3.0

Scrape index page #17

Open antoniogamiz opened 2 years ago

antoniogamiz commented 2 years ago

Once we can scrape a trope page, we need to be able to scrape an index page like this: https://tvtropes.org/pmwiki/index_report.php.

Same requirements as #16.
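A minimal sketch of the index-page step, using only the Python standard library. It assumes (not checked against the live page) that every index entry is an `<a>` tag linking to a wiki page under `/pmwiki/pmwiki.php/` on tvtropes.org. `IndexLinkParser` and `extract_index_links` are hypothetical names, not existing code in this repo.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

BASE_URL = "https://tvtropes.org"
# Assumption: index entries link to wiki pages under this path prefix.
WIKI_PREFIX = "/pmwiki/pmwiki.php/"


class IndexLinkParser(HTMLParser):
    """Collects absolute URLs of wiki pages linked from an index page (hypothetical helper)."""

    def __init__(self, base_url=BASE_URL):
        super().__init__()
        self.base_url = base_url
        self.links = []     # wiki page URLs, in the order they appear
        self._seen = set()  # used to drop duplicate links

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative hrefs against the site root.
        url = urljoin(self.base_url, href)
        parsed = urlparse(url)
        same_site = parsed.netloc == urlparse(self.base_url).netloc
        if same_site and parsed.path.startswith(WIKI_PREFIX) and url not in self._seen:
            self._seen.add(url)
            self.links.append(url)


def extract_index_links(html, base_url=BASE_URL):
    """Return the wiki page URLs found in an index page's HTML."""
    parser = IndexLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Parsing is kept separate from fetching so it can be tested on saved HTML; the page itself could be downloaded with `urllib.request.urlopen`. As written, the sketch collects every wiki link on the page, including navigation links, so in practice it would probably need to be scoped to the index's main content container, whose markup we haven't inspected yet.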

antoniogamiz commented 2 years ago

Important: not every entry in this index is actually a trope. The index also lists media types, which appear on that page but are not tropes. See https://tvtropes.org/pmwiki/pmwiki.php/Main/Media for a list of media types.

We should compute that list of media types beforehand so those entries can be skipped, which avoids scraping unnecessary data (and possibly corrupting the dataset).
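A minimal sketch of the filtering step, assuming the media-type list has already been collected as page URLs (for example, by hand or from https://tvtropes.org/pmwiki/pmwiki.php/Main/Media). It also assumes URLs of the form `.../pmwiki/pmwiki.php/<Namespace>/<PageName>` and that page names can be compared case-insensitively. `page_key`, `build_media_types` and `filter_tropes` are hypothetical names.

```python
from urllib.parse import urlparse


def page_key(url):
    """Return '<namespace>/<pagename>' for a wiki URL, lowercased.

    Assumes URLs of the form .../pmwiki/pmwiki.php/<Namespace>/<PageName>;
    lowercasing assumes page names are compared case-insensitively.
    """
    parts = urlparse(url).path.rstrip("/").split("/")
    return "/".join(parts[-2:]).lower()


def build_media_types(media_type_urls):
    """Turn a precomputed list of media-type page URLs into a lookup set."""
    return {page_key(u) for u in media_type_urls}


def filter_tropes(index_urls, media_types):
    """Keep only index entries that are not known media types."""
    return [u for u in index_urls if page_key(u) not in media_types]
```

Storing the media types in a set keeps each membership check constant-time on average, so filtering stays linear in the number of index entries. Because the list is computed once up front, media-type pages are dropped before any trope page is requested.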