hellsienna closed this 3 months ago
@hellsienna Common Crawl recently introduced a new base URL requirement. I've updated the code to conform to it, so things should be back to normal now. See PR #2 for details; the requirement itself is described at https://commoncrawl.org/get-started:
To access data from outside the Amazon cloud, via HTTP(S), the new URL prefix https://data.commoncrawl.org/ must be used.
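Since the fix amounts to using a different URL prefix, here is a minimal Rust sketch of turning a relative Common Crawl path into a full HTTPS URL under the new prefix. This is not the code from PR #2: the helper name `data_url` is hypothetical, and the example path is only an illustration of a relative path such as those listed in a crawl's `*.paths.gz` files.

```rust
/// Prefix for HTTP(S) access to Common Crawl data from outside the
/// Amazon cloud, as stated in the Common Crawl "Get Started" guide.
const CC_DATA_PREFIX: &str = "https://data.commoncrawl.org/";

/// Hypothetical helper: joins a relative Common Crawl path onto the
/// required prefix. Leading '/' characters are stripped from the path
/// so the result never contains a doubled slash after the host.
fn data_url(path: &str) -> String {
    format!("{}{}", CC_DATA_PREFIX, path.trim_start_matches('/'))
}

fn main() {
    // Illustrative relative path (assumed layout, for demonstration only).
    let url = data_url("crawl-data/CC-MAIN-2024-30/cc-index.paths.gz");
    // Prints https://data.commoncrawl.org/crawl-data/CC-MAIN-2024-30/cc-index.paths.gz
    println!("{url}");
}
```

Keeping the prefix in a single constant means a future change to the base URL only needs to be made in one place.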
I'm no longer actively maintaining this codebase, but feel free to open new issues and I'll do my best to help when possible.
Cheers!
Thanks for the quick response, man.
Do you want to crawl index CC-MAIN-2024-30? yes
Will start crawling CC-MAIN-2024-30 now...
thread 'main' panicked at src\lib.rs:237:5:
assertion `left == right` failed
  left: 1
 right: 5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I don't know if this project is still supported, but I'm very interested in it. Anyway, I get the above error when I try to run it. Any suggestions on how I can fix this?