elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.
https://hexdocs.pm/crawly
Apache License 2.0

Q: Can the spider "fan out" on a website? (multiple next items) #263

Closed larshei closed 1 year ago

larshei commented 1 year ago

I was just getting my hands dirty with Crawly (nice work btw) and I was trying to crawl a website that has the following structure:

                 Main Page 
         /           |        \
ProductA         ProductB       ProductC
 |      \        /      \       /     
PartA1 PartA2 PartB1 PartB2 PartC1 ...

Where ProductX are the main products and PartXn are replacement or update parts.

I would like to get a list of all Products with their respective parts.

Now I wonder: can I start on Main Page and open every ProductX page from there, or is there only one next item (in which case I could probably add a :start_urls entry for each ProductX)?

larshei commented 1 year ago

Never mind, I just realized that the variable next_requests is populated with a list 🙈
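
For anyone landing here with the same question, below is a minimal sketch of what that fan-out could look like. It assumes `crawly` and `floki` are listed as deps in `mix.exs`. The module name `ProductSpider`, the `example.com` URLs, and the `a.product` / `a.part` selectors are all made up for illustration, so adjust them to the real site. The point is only that `requests` in the returned `Crawly.ParsedItem` is a list, so one page can queue as many follow-up pages as it links to.

```elixir
# Sketch only: assumes :crawly and :floki are deps in mix.exs.
# Module name, URLs and CSS selectors are hypothetical.
defmodule ProductSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url, do: "https://example.com"

  @impl Crawly.Spider
  def init, do: [start_urls: ["https://example.com/"]]

  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    # Fan out: every matching link on the page becomes its own request.
    # Main Page yields one request per ProductX, each ProductX page
    # yields one request per PartXn.
    next_requests =
      document
      |> Floki.find("a.product, a.part")
      |> Floki.attribute("href")
      |> Enum.uniq()
      |> Enum.map(fn href ->
        href
        |> Crawly.Utils.build_absolute_url(response.request_url)
        |> Crawly.Utils.request_from_url()
      end)

    # One item per visited page; extend with real product/part fields.
    title = document |> Floki.find("title") |> Floki.text()
    item = %{url: response.request_url, title: title}

    %Crawly.ParsedItem{items: [item], requests: next_requests}
  end
end
```

With this shape, only Main Page needs to be in the start URLs. To group parts under their product, one option would be to emit the product URL alongside each part item and join them after the crawl.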