ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Windows Subsystem for Linux: gs-dump-urls fails on active crawl #106

Open · ivan opened 6 years ago

ivan commented 6 years ago
```
(gs-venv) at@windows10:~/ludios.org-2017-10-24-568d2f97$ gs-dump-urls wpull.db todo
Traceback (most recent call last):
  File "/home/at/gs-venv/bin/gs-dump-urls", line 4, in <module>
    dump_urls.main()
  File "/home/at/gs-venv/lib/python3.4/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/at/gs-venv/lib/python3.4/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/at/gs-venv/lib/python3.4/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/at/gs-venv/lib/python3.4/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/at/gs-venv/lib/python3.4/site-packages/libgrabsite/dump_urls.py", line 34, in main
    'WHERE status=?;', (status,))
sqlite3.OperationalError: disk I/O error
```

Probably related to https://github.com/Microsoft/BashOnWindows/issues/2395 and https://github.com/Microsoft/BashOnWindows/issues/1927
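If someone wants to check whether WSL's sqlite locking is the culprit rather than grab-site itself, a minimal sketch (assumes the `sqlite3` CLI is installed and that you run it from the crawl directory while the crawl is active; the PRAGMA is just a stand-in for any read):

```sh
# Try any read against the live database under WSL.
# Assumes: sqlite3 CLI available (e.g. apt install sqlite3),
# wpull.db belongs to a crawl that is currently running.
sqlite3 wpull.db "PRAGMA quick_check;"
# Under the WSL locking bugs linked above, this can fail with:
# Error: disk I/O error
```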

ivan commented 6 years ago

Crude workaround: make a copy of wpull.db with cp and run gs-dump-urls on the copy (see the sketch below).

In the rare case that the copy comes out inconsistent and unreadable, try again.
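A sketch of that workaround (the copy's filename is arbitrary; run from inside the crawl directory with the gs-venv activated):

```sh
# Dump URLs from a copy instead of the live database.
cp wpull.db wpull-copy.db        # snapshot the database; name is arbitrary
gs-dump-urls wpull-copy.db todo  # same invocation as above, against the copy
rm wpull-copy.db                 # clean up when done
```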