TheLethalCode / Artemis-arrow


Generate a list of all anime #10

Closed TheLethalCode closed 5 years ago

TheLethalCode commented 5 years ago

Myanimelist.net regularly updates its listing of all anime released so far. Scrape that information and generate a JSON file that maps each anime name to its popularity, rating, and description. (You can use the indexing feature MyAnimeList offers.)
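For reference, a minimal sketch of what the output file could look like. The field names (`name`, `popularity`, `rating`, `description`), the helper names, and the `anime.json` path are assumptions for illustration, not something fixed by this issue; the records would come from whatever scraper produces them.

```python
import json


def build_anime_index(records):
    """Map each anime name to its popularity, rating, and description."""
    return {
        r["name"]: {
            "popularity": r["popularity"],
            "rating": r["rating"],
            "description": r["description"],
        }
        for r in records
    }


def write_anime_index(records, path="anime.json"):
    """Write the name-keyed index to a JSON file and return the index."""
    index = build_anime_index(records)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII titles readable in the file.
        json.dump(index, f, ensure_ascii=False, indent=2)
    return index
```

Keying by name keeps lookups simple; if two entries ever share a name, the later one would overwrite the earlier one in this sketch.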

prashantramnani commented 5 years ago

I tried this using the Python modules mal-scraper and beautifulsoup. I am able to get all the data I want, but it's taking too much time. Any suggestions on how I should solve this?

TheLethalCode commented 5 years ago

Why do you use mal-scraper if you are using beautifulsoup anyway? Try using requests; it should be a bit faster, and you can speed it up with multiprocessing.

prashantramnani commented 5 years ago

Oh, sorry, I meant that I was only using mal-scraper, which itself uses beautifulsoup.

TheLethalCode commented 5 years ago

Hmm, as I said, you can try multiprocessing, but it would be better to use requests alone. That would make Dockerizing easier, and you can get the same functionality from requests itself.

prashantramnani commented 5 years ago

Even a simple request to myanimelist.net/anime/id using the requests package takes a significant amount of time. Iterating from id = 1 to 30000 would take a very long time.

prashantramnani commented 5 years ago

mal-scraper itself uses requests to make the HTTP request to myanimelist and beautifulsoup to parse the HTML. I have already written a script that uses mal-scraper and gets all the data, but it has the time issue. I suppose even Dockerizing wouldn't be that hard while using mal-scraper.

TheLethalCode commented 5 years ago

I know it would obviously take a lot of time, at least around 10 hours. We can speed this up by up to 10x with multiprocessing, and anyway, this is a one-time script, so don't worry. You work on the multiprocessing part.
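A rough sketch of the idea, under some assumptions. It uses `multiprocessing.pool.ThreadPool`, the thread-backed pool that shares the `map` API of `multiprocessing.Pool`, since the job is dominated by waiting on the network; swapping in `multiprocessing.Pool` would additionally need a top-level `fetch` function and a `__main__` guard. The names `fetch_many` and `estimated_hours` and the `workers=10` default are hypothetical, and `fetch` is whatever function downloads one page.

```python
from multiprocessing.pool import ThreadPool


def fetch_many(anime_ids, fetch, workers=10):
    """Call fetch(anime_id) concurrently; return {anime_id: page} for non-None results.

    `fetch` is supplied by the caller, for example a small wrapper around
    requests.get that returns the page text, or None for a missing id.
    """
    ids = list(anime_ids)
    with ThreadPool(workers) as pool:
        pages = pool.map(fetch, ids)  # results come back in the same order as ids
    return {i: page for i, page in zip(ids, pages) if page is not None}


def estimated_hours(n_ids, seconds_per_request, workers=1):
    """Back-of-the-envelope runtime, assuming perfect scaling across workers."""
    return n_ids * seconds_per_request / workers / 3600
```

With the figures from this thread, 30000 ids in roughly 10 hours works out to about 1.2 seconds per request, so `estimated_hours(30000, 1.2, workers=10)` comes to about one hour if the speedup really is 10x.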

prashantramnani commented 5 years ago

Okay, I'll look into multiprocessing.