CAIDA / commoncrawl-host-ip-mapper

Crawler that retrieves commoncrawl's crawled hosts and their corresponding IPs

error running the code #1

Closed hellsienna closed 3 months ago

hellsienna commented 3 months ago

Do you want to crawl index CC-MAIN-2024-30? yes
Will start crawling CC-MAIN-2024-30 now...
thread 'main' panicked at src\lib.rs:237:5:
assertion left == right failed
  left: 1
 right: 5
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

I don't know if this project is still supported, but I am very interested in it. Anyway, I get the above error when I try to run it. Any suggestions on how I can solve this?

digizeph commented 3 months ago

@hellsienna Common Crawl recently introduced a new base URL requirement. I've updated the code to conform to it, and things should be back to normal now. See PR #2 for details. From https://commoncrawl.org/get-started:

To access data from outside the Amazon cloud, via HTTP(S), the new URL prefix https://data.commoncrawl.org/ must be used.
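For anyone hitting the same panic in their own fork, here is a minimal sketch of what conforming to the new prefix might look like. It is not the code from PR #2: `data_url` and `migrate_url` are hypothetical helper names, and the only value taken from the quote above is the prefix itself. Whatever old prefix a given code base used has to be supplied by the caller.

```rust
/// URL prefix quoted from the Common Crawl "get started" page.
const DATA_PREFIX: &str = "https://data.commoncrawl.org/";

/// Hypothetical helper: builds a full HTTPS URL from a data-set path.
/// A leading slash on `path` is dropped so the result never contains "//"
/// right after the host.
fn data_url(path: &str) -> String {
    format!("{}{}", DATA_PREFIX, path.trim_start_matches('/'))
}

/// Hypothetical helper: rewrites a URL that starts with `old_prefix`
/// so that it uses the new prefix instead.
/// Returns `None` when `url` does not start with `old_prefix`.
fn migrate_url(url: &str, old_prefix: &str) -> Option<String> {
    url.strip_prefix(old_prefix).map(data_url)
}
```

Calling `data_url` with a path such as `crawl-data/CC-MAIN-2024-30/cc-index.paths.gz` (used here only as an illustrative path) yields a URL under `https://data.commoncrawl.org/`. Keeping the prefix in a single constant means a future change on Common Crawl's side only needs one edit.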

I'm no longer actively maintaining this code base, but feel free to raise new issues and I'll try my best to help out when possible.

Cheers!

hellsienna commented 3 months ago

Thanks for the quick response, man.