rawmean closed this issue 6 months ago
Hi @rawmean, we will add it to the to-do list for feature requests! It would be interesting to create a new graph for this, maybe called CrawlerGraph or DeepScraperGraph.
I'll try to take a stab at it. This is what I'm thinking: Input: URL
Let me know if this looks reasonable or if you have any other plan/better alternative that you can think of
Yeah, please contact me through email (mvincig11@gmail.com).
Sounds really interesting.
I am looking for this feature too. There are two use cases:
1. Loop through several path levels of a website to extract information from all item pages, for example all shop item details, or the prices and locations of all rental houses. In this case, I can specify which paths will be processed using regular expressions.
2. Loop through all pages of a small website. It behaves like a crawler such as Nutch, but I can specify what to get from each page: there is one prompt to match the target page and another prompt to get data/files from that page. Sometimes I need to crawl all videos/images on the website that match a specified condition.
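To make the first use case concrete, here is a minimal sketch of selecting which pages get processed by matching regular expressions against the URL path. The helper name `make_path_filter` and the example pattern are hypothetical, chosen only for illustration; nothing here is part of the library's API.

```python
import re
from urllib.parse import urlparse


def make_path_filter(patterns):
    """Build a predicate that accepts a URL when its path matches
    at least one of the given regular expressions.

    `make_path_filter` is a hypothetical helper, not a library function.
    """
    compiled = [re.compile(p) for p in patterns]

    def allowed(url):
        # Only the path component is matched, so the host and
        # query string do not affect the decision.
        path = urlparse(url).path
        return any(rx.search(path) for rx in compiled)

    return allowed


# Example: only process item pages such as /shop/items/42.
is_item_page = make_path_filter([r"^/shop/items/\d+$"])
```

A crawler could call `is_item_page(url)` before handing a page to the extraction prompt, so that listing and navigation pages are followed but not scraped.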
Is your feature request related to a problem? Please describe. I'd like to scrape a website n levels deep.
Describe the solution you'd like For example, given url = example.com, the scraper should also follow the links in example.com and scrape those too.
Describe alternatives you've considered I can use BeautifulSoup to download the pages and then feed them to this.
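As a rough illustration of what "n levels deep" could mean, the sketch below does a breadth-first crawl that follows same-host links up to a depth limit and returns the HTML of every page it reached. It is only an assumption about how such a feature might work, not the library's implementation: the names `crawl`, `LinkExtractor`, and the caller-supplied `fetch` function are hypothetical, and the standard-library `html.parser` is used in place of BeautifulSoup so the example has no dependencies.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=1):
    """Breadth-first crawl from start_url, following same-host links
    up to max_depth levels below the start page.

    `fetch` is a caller-supplied function (hypothetical) that maps a
    URL to its HTML text, or returns None when the page is unavailable.
    Returns a dict of {url: html}.
    """
    host = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # keep the page, but do not follow its links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

The returned pages could then be fed to the scraper one by one. Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so it can be exercised with an in-memory dict of pages before pointing it at a real site.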