N0taN3rd / Squidwarc

Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
https://n0tan3rd.github.io/Squidwarc/
Apache License 2.0

Installation fails with missing node-warc submodule #44

peterk closed this issue 5 years ago

peterk commented 5 years ago

Are you submitting a bug report or a feature request?

Bug report.

What is the current behavior?

Installing as per the instructions gives the following error in the bootstrap.sh script (relating to the node-warc submodule):

Submodule 'node-warc' (https://github.com/N0taN3rd/node-warc.git) registered for path 'node-warc'
Cloning into '/Squidwarc/node-warc'...
error: Server does not allow request for unadvertised object 0de56e6628d1e0e8d18cb9e772ae7871bd8cd926
Fetched in submodule path 'node-warc', but it did not contain 0de56e6628d1e0e8d18cb9e772ae7871bd8cd926. Direct fetching of that commit failed.

What is the expected behavior?

Installation proceeds normally.

machawk1 commented 5 years ago

I can replicate this, but Squidwarc still seems to be installed. For some reason, even after updating the node-warc dependency to 0.3.2 in a variety of places in the project, commit 0de56e6628d1e0e8d18cb9e772ae7871bd8cd926 is still used. I am not sure whether rebasing the submodule onto the latest node-warc master would remedy the issue, but it would be worth a try.

EDIT: I am using the latest master. @N0taN3rd looks to have done some work on the next branch recently, so that will require further testing.
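
For anyone debugging this: a git submodule is pinned by a commit SHA recorded in the parent repository's tree, so changing a dependency version elsewhere (assuming those updates were to npm dependency declarations) would not change which commit git tries to fetch. Below is a minimal Python sketch for checking what is pinned. The helper names (`missing_commit`, `parse_gitlink`, `pinned_commit`) are hypothetical and not part of Squidwarc or node-warc; the only git command used is `git ls-tree`.

```python
import re
import subprocess

# Matches the SHA in git's "did not contain <sha>" submodule error line.
_MISSING_RE = re.compile(r"did not contain ([0-9a-f]{40})")


def missing_commit(log: str):
    """Return the commit SHA git reported as missing, or None if absent."""
    match = _MISSING_RE.search(log)
    return match.group(1) if match else None


def parse_gitlink(ls_tree_line: str):
    """Parse one `git ls-tree` line and return (sha, path) for a submodule.

    Expected line format: "<mode> <type> <sha>\t<path>". Submodules are
    stored with mode 160000 and type "commit".
    """
    meta, path = ls_tree_line.rstrip("\n").split("\t", 1)
    mode, obj_type, sha = meta.split()
    if mode != "160000" or obj_type != "commit":
        raise ValueError(f"{path!r} is not a submodule entry")
    return sha, path


def pinned_commit(repo_dir: str, submodule_path: str) -> str:
    """Ask git which commit HEAD of the parent repo records for a submodule."""
    out = subprocess.run(
        ["git", "ls-tree", "HEAD", submodule_path],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    lines = out.splitlines()
    if not lines:
        raise ValueError(f"no entry for {submodule_path!r} at HEAD")
    return parse_gitlink(lines[0])[0]
```

If `pinned_commit("Squidwarc", "node-warc")` returns the same SHA that `missing_commit` extracts from the bootstrap log, the parent repository is still pointing at a commit the node-warc remote no longer serves, and the pin itself has to be moved (for example by committing a newer submodule revision).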

machawk1 commented 5 years ago

The next branch appears to pull node-warc 48cd9aad8d41e04e879377e83a4265d77c0cb06d. The edits on this branch do not produce the error you reported, @peterk, and the default crawl appears to complete successfully. @N0taN3rd, would it be worth merging the next branch into master so that tools that depend on Squidwarc pick up the fix?

N0taN3rd commented 5 years ago

@peterk apologies for the delay in officially closing this issue; it should now be fixed.