Posting $50 bounty: https://www.bountysource.com/issues/97706372-crawler-isn-t-following-links
Hi @lorensr,
The `start_urls` act more as "a pattern of URLs the crawler should accept" than as "which URL should I start with", which is why other pages are not crawled.
Because the `contents` route has no child routes, nothing else is found. If you try `"start_urls": ["https://graphql.guide/vue"]` instead, it will work.
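A toy model of the behaviour described above might help. It assumes the crawler follows a link only when the link's URL starts with one of the `start_urls`. The real DocSearch scraper builds its own URL patterns, so treat this as an approximation rather than its actual code. The link graph is hypothetical: apart from `/`, `/contents` and `/vue`, the page names are made up for illustration.

```python
from collections import deque

# Hypothetical link graph (illustration only): each page maps to the links found on it.
SITE = {
    "https://graphql.guide/": [
        "https://graphql.guide/contents",
        "https://graphql.guide/vue",
    ],
    "https://graphql.guide/contents": [
        "https://graphql.guide/vue",
        "https://graphql.guide/react",
    ],
    "https://graphql.guide/vue": [
        "https://graphql.guide/vue/setup",
        "https://graphql.guide/react",
    ],
    "https://graphql.guide/vue/setup": [],
    "https://graphql.guide/react": [],
}


def should_follow(url, start_urls):
    """Simplified filter: accept a URL only if it starts with one of the start_urls."""
    return any(url.startswith(start) for start in start_urls)


def crawl(start_urls, site=SITE):
    """Breadth-first crawl seeded from start_urls that only follows accepted links."""
    seen = set()
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        # Skip pages already visited or missing from the toy graph.
        if url in seen or url not in site:
            continue
        seen.add(url)
        for link in site[url]:
            if link not in seen and should_follow(link, start_urls):
                queue.append(link)
    return sorted(seen)
```

With this model, seeding with `https://graphql.guide/contents` returns only that page, because none of its links start with `/contents`. Seeding with `https://graphql.guide/vue` also picks up `/vue/setup`, and seeding with `https://graphql.guide/` accepts every page in the graph.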
One way to solve this issue would be to create a `sitemap.xml` only for the crawler, so it can follow all the pages listed in it (doc).
Or, for example, use a more generic `"start_urls": ["https://graphql.guide/"]`.
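For the crawler-only `sitemap.xml` option, here is a minimal sketch that generates such a file with Python's standard library, following the sitemaps.org protocol (a `<urlset>` of `<url><loc>` entries). The function name `build_sitemap` and the page list are hypothetical. In practice the URLs would come from the site's router or build output, and how the crawler is pointed at the file is covered in the DocSearch documentation linked above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a str) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url_el, "loc")
        loc.text = url  # ElementTree escapes characters such as "&" for us
    # Prepend the declaration by hand so the declared encoding is always UTF-8.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    pages = ["https://graphql.guide/vue", "https://graphql.guide/react"]  # hypothetical
    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(build_sitemap(pages))
```

Listing every page explicitly sidesteps the `start_urls` prefix question, because the crawler no longer has to discover child pages by following links.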
Thank you so much! Generic solution worked great ☺️
No worries, feel free to close the bounty or give it to a charity of your choice :D
Hey @shortcuts, please take a look at my problem here: #571.
It is not generating hits for child routes, for example `/docs`.
And when I enter the complete URL including the `/docs` route, it reports an ignored start URL.
I'm using the Docker container and this config:
https://github.com/GraphQLGuide/book/blob/411bb46629b622a785312f199053e5c55234608d/docsearch.json
When I run the docker command, I get 303 "nb hits", but they all point to different anchors on the `start_url` page. None of them are for the other pages linked on the `start_url` page (https://graphql.guide/contents).