fatimasadiq closed 2 years ago
Hi, now I'm getting the attached output while running the crawler; nothing is downloaded.
For the first problem, you were probably configuring the docker volume at the wrong directory, but you seem to have already fixed it.
For the second screenshot, the crawler ignores non-English pages by default. You can disable this feature by adding the following to the ache.yml file:
# Store only pages that contain english text using language detector
target_storage.english_language_detection_enabled: false
The sample config file at https://github.com/VIDA-NYU/ache/blob/master/config/sample_config/ache.yml has other configurations that may be useful.
The crawler also ignores non-HTML content by default (e.g., jpg images as seen in the log). To allow other types of content, you need to add the following config to ache.yml (including any other mime-types that you need):
crawler_manager.downloader.valid_mime_types:
- text/xml
- text/html
- text/plain
- application/x-asp
- application/xhtml+xml
- application/vnd.wap.xhtml+xml
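Putting the two suggestions together, an ache.yml could look roughly like the sketch below. It is untested and only combines the settings quoted in this thread; the `image/jpeg` entry is an assumption, added as an example of an extra mime-type for the jpg images mentioned in the log, so swap it for whatever types you actually need.

```yaml
# ache.yml (sketch combining the two settings discussed in this thread)

# Store pages regardless of language; by default only English pages are kept
target_storage.english_language_detection_enabled: false

# Content types the downloader should accept
crawler_manager.downloader.valid_mime_types:
  - text/xml
  - text/html
  - text/plain
  - application/x-asp
  - application/xhtml+xml
  - application/vnd.wap.xhtml+xml
  - image/jpeg  # assumption: example extra type so jpg images are not skipped
```

After editing the file, restart the crawler so that the new configuration is picked up.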
Dear Aecio,
Thank you for the response. Let me try this and I will come back to this thread, so please don't close it.
Closing this issue. Feel free to open another issue if you find other problems.
Hi, I'm new to the ACHE crawler and am trying to run the sample to see how the crawler collects data before running my own, but it's giving me the error below. I'm running on CentOS 7 with Docker.
Please help.