TalkAboutLocal / local-news-engine

GNU Affero General Public License v3.0

CNJ and IT scrapers don't download all pages #68

Closed Bjwebb closed 7 years ago

Bjwebb commented 7 years ago

Looking through the wget logs, we see many instances of the error pathconf: Not a directory.

Bjwebb commented 7 years ago

Looks like the problem is that wget can't save a file a/b if the file a already exists (and isn't a directory).
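
The conflict can be reproduced outside wget. The following is a minimal sketch, not the scrapers' actual code: `save_page` and `demo_conflict` are hypothetical helpers that mimic a mirror writing each URL path straight to disk, first saving a page as the file `a` and then trying to save `a/b`.

```python
import os
import tempfile


def save_page(root, url_path, body):
    """Hypothetical helper: write body to root/url_path, creating parent dirs."""
    target = os.path.join(root, url_path)
    # Fails if a parent component already exists as a regular file.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(body)


def demo_conflict():
    """Save 'a' as a file, then try to save 'a/b'; return the error name."""
    with tempfile.TemporaryDirectory() as root:
        save_page(root, "a", "listing page")
        try:
            save_page(root, "a/b", "detail page")
        except (FileExistsError, NotADirectoryError) as exc:
            return type(exc).__name__
    return None


if __name__ == "__main__":
    print(demo_conflict())
```

The second save fails because `a` is already a regular file and so cannot also serve as the parent directory of `b`.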

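We might be able to get around this by changing the name those files are saved under. One way of doing this would be to use wget's --adjust-extension option, which appends .html to HTML pages whose URL does not already end in that suffix, so the page a would be saved as a.html and the name a would stay free to become a directory. https://lists.gnu.org/archive/html/bug-wget/2015-06/msg00021.html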
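
To illustrate why renaming helps, here is a rough sketch of the idea. It is a simplification and an assumption, not wget's implementation: wget decides based on the response's content type, whereas the hypothetical `adjusted_name` helper below simply appends `.html` to any path whose last component has no extension.

```python
import os
import tempfile


def adjusted_name(url_path):
    """Hypothetical rule: append .html when the path has no file extension."""
    _, ext = os.path.splitext(url_path)
    return url_path if ext else url_path + ".html"


def save_adjusted(root, url_path, body):
    """Save body under the adjusted name; return the path relative to root."""
    target = os.path.join(root, adjusted_name(url_path))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(body)
    return os.path.relpath(target, root)


def demo_workaround():
    """Save 'a' and then 'a/b'; both succeed because 'a' becomes 'a.html'."""
    with tempfile.TemporaryDirectory() as root:
        first = save_adjusted(root, "a", "listing page")
        second = save_adjusted(root, "a/b", "detail page")
        return first, second


if __name__ == "__main__":
    print(demo_workaround())
```

With the page stored as `a.html`, nothing named `a` exists as a file, so the directory `a/` can be created for the nested page.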