salinausd305 · closed 3 days ago
Try using URLStatusCrawlerEventListener instead.
Since you only want the URLs and do not care about site content, pair this with an IDocumentFilter. Simplest might be to use SegmentCountURLFilter with count set to 0 or 1 and onMatch set to exclude. (Not tested.)
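A sketch of what those two suggestions might look like in the crawler config (untested; the element names and class packages are assumed from Norconex HTTP Collector 2.x, and /path/to/report is a placeholder to replace with your own directory):

```xml
<!-- Sketch, not tested. Goes inside your <crawler> section. -->

<!-- Writes each crawled URL and its HTTP status to a report file -->
<crawlerListeners>
  <listener class="com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener">
    <!-- Wide status range so every crawled URL is reported -->
    <statusCodes>100-599</statusCodes>
    <!-- Placeholder path: where the report file is written -->
    <outputDir>/path/to/report</outputDir>
  </listener>
</crawlerListeners>

<!-- Rejects every document so no site content is imported or committed;
     the URL is still recorded by the listener above -->
<documentFilters>
  <filter class="com.norconex.collector.http.filter.impl.SegmentCountURLFilter"
      onMatch="exclude" count="1" />
</documentFilters>
```

If this works as intended, the resulting report is a CSV-style file with one row per URL, so the URL column can be pulled out for review without needing a committer at all.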
I'm trying to use the web crawler to get a list of URLs for our websites. We are moving to a new platform and I'm hoping to get a list of URLs for our redirects.
I have the web crawler running (I used the Config Starter page and tested it), but I'm not sure how to get the data into a CSV file.
I looked at the CSVFileCommitter documentation, but I'm still not sure how to make it work.
The only data I want is a single column list of URLs in a CSV file that I can review.
Is there a way to set it up in the config file?