Closed — dhildreth closed this issue 6 years ago
It will be investigated, but is it possible to share a copy of your config? In the meantime, have you tried defining your start URLs using <urlsFile>...</urlsFile> instead of <url>...</url>? It lets you pass the path to a file that contains one URL per line, which is exactly what you want to do. That way you won't need to use the regex link extractor.
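For illustration, a start-URLs fragment using that element might look something like this (a sketch assuming the Norconex HTTP Collector 2.x config layout; the file path is a placeholder):

```xml
<!-- Point the crawler at a plain-text file listing one start URL per line. -->
<startURLs>
  <urlsFile>/path/to/sitemap.txt</urlsFile>
</startURLs>
```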
Found and fixed the issue. I was able to reproduce when the URL file had blank lines in it. The latest snapshot now has this fix.
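In essence, the fix amounts to skipping blank lines when the URLs file is parsed, so an empty line no longer turns into an invalid empty start URL. A minimal sketch of that technique (illustrative only, not the project's actual patch; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse the lines of a urlsFile, ignoring blank
// or whitespace-only lines so they never become empty start URLs.
public class UrlsFileReader {

    public static List<String> parseUrls(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines instead of treating them as URLs.
            if (!trimmed.isEmpty()) {
                urls.add(trimmed);
            }
        }
        return urls;
    }
}
```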
Please confirm.
Thank you so much! Works for me. 👍
I will use the urlsFile option. It's amazing what tools this crawler offers. Just when I think I understand most of the features, I learn something new!
I think I've stumbled upon a bug here. I'm attempting to use a .txt file as a sitemap of sorts. The file has one URL per line. It looks something like this:
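The reporter's actual file contents were not preserved; a file of the shape described (one URL per line) would look something like this, with placeholder URLs:

```
http://www.example.com/page-one
http://www.example.com/page-two
http://www.example.com/page-three
```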
Anyway, I'm using the RegexLinkExtractor like this:
When running the crawler, I get this error for each of the URLs in the sitemap.txt file:
Any suggestions would be greatly appreciated.