ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Grab-site gets only a single page #188

Open mathuryash5 opened 3 years ago

mathuryash5 commented 3 years ago

grab-site https://react.etvbharat.com/oriya/odisha --no-offsite-links --no-video --no-sitemaps

I ran the command above, but it grabs only a single page.

Please let me know if there is a fix.

mathuryash5 commented 3 years ago

Also, is there a way to switch off prerequisites entirely?

TheTechRobo commented 3 years ago

Also, is there a way to switch off prerequisites entirely?

???

mathuryash5 commented 3 years ago

Sorry, I meant a way to not get the images, the CSS files, etc. Eventually, I just want to get the HTML files.

ivan commented 3 years ago

I recommend deleting this line in your local copy to avoid grabbing any page requisites: https://github.com/ArchiveTeam/grab-site/blob/fe3cc6ab1465f4c8ea1f83255187085c43a95e0c/libgrabsite/main.py#L241
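
If editing the local copy is not convenient, a possible alternative is an ignore pattern that skips common requisite file types. The sketch below is only an illustration, not part of grab-site: the extension list and the helper names `build_ignore_pattern` and `is_ignored` are assumptions made up for this example. It builds one regular expression and checks it against a few sample URLs (the asset URLs are invented) before you use it.

```python
import re

# Hypothetical list of extensions treated as page requisites; adjust as needed.
REQUISITE_EXTENSIONS = [
    "css", "js", "png", "jpg", "jpeg", "gif", "svg",
    "webp", "ico", "woff", "woff2", "ttf",
]


def build_ignore_pattern(extensions):
    """Return a regex matching URLs that end in one of the given extensions,
    optionally followed by a query string."""
    alternation = "|".join(re.escape(ext) for ext in extensions)
    return r"\.(?:" + alternation + r")(?:\?.*)?$"


def is_ignored(url, pattern):
    """Return True if the pattern matches anywhere in the URL."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    pattern = build_ignore_pattern(REQUISITE_EXTENSIONS)
    # Print the pattern so it can be pasted as one line of an ignores file.
    print(pattern)
    samples = [
        "https://react.etvbharat.com/oriya/odisha",
        "https://react.etvbharat.com/static/app.css",
        "https://react.etvbharat.com/img/logo.png?v=2",
    ]
    for url in samples:
        print(url, "->", "ignored" if is_ignored(url, pattern) else "kept")
```

The printed pattern could then go on one line of an ignores file. As far as I recall, the grab-site README describes an `--import-ignores` option for supplying such a file at startup, but check `grab-site --help` for your version. The pattern matches on the extension alone, so it would not catch requisites served from extensionless URLs.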