Closed TheLethalCode closed 5 years ago
I tried this using python modules mal-scraper and beautifulsoup. I am able to get all the data I want but it's taking too much time any suggestions as to how am I supposed to solve this?
Why do you use mal-scrapper if you are anyway using beautifulsoup? Try using requests, should be a bit faster, and you can speed it up with multiprocessing.
Ohh sorry I meant I was just using mal-scraper which was using beautifulsoup
Hmm, As I said, you can try multiprocessing, but it would be better if you use requests alone. This would make the job of dockering easier, and you can have the same functionality from requests itself
Even a simple request to myanimelist.net/anime/id using the package of requests takes a significant amount of time. Iterating from id =1 to 30000 would seriously take a large amount of time.
mal-scraper itself uses requests to make an http request to myanimelist and beautifulsoup to get a html parser. I have already written a script to use the mal-scraper which gets all the data but it has the time issue, and I suppose even dockering would wouldn't be that hard while using mal-scraper.
I know it would take a lot of time, pbviously, atleast around 10 hours, we can speed this up upto 10x, by multiprocessing, and anyway, this is like a one time script. So don't worry. You work on the multiprocessing part.
Okay I'll look into multi-processing
Myanimelist.net updates its content regularly on what are all the anime released as of that moment. Scrape the information from it and make a json file that maps the anime name to its popularity and rating and description. (You can use the indexing feature myanimelist offers).