Open carlgieringer opened 6 years ago
This is due to a legacy artifact in models.Article#filename:
    elif ans.startswith('https://'):
        # Terrible hack for backwards compatibility from when https was stored incorrectly,
        # perpetuating the problem
        return 'https:/' + ans[len('https://'):]
When running the scraper from scratch, a directory articles/https/ appears. There are some articles under this directory, and I don't think they are matched up in the browse view with articles that are not under it. E.g. articles/https//www.nytimes.com/ doesn't appear alongside articles/www.nytimes.com.
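To illustrate the split, here is a minimal sketch rather than the project's actual code. The function names `legacy_filename` and `normalized_filename` are hypothetical. The `http://` branch is an assumption, inferred only from the fact that articles/www.nytimes.com exists. The `https://` branch mirrors the snippet quoted above.

```python
def legacy_filename(url):
    """Hypothetical stand-in for the path logic described in this issue."""
    if url.startswith('http://'):
        # Assumption: plain http URLs have their scheme stripped.
        return url[len('http://'):]
    elif url.startswith('https://'):
        # Mirrors the quoted branch: the scheme is kept in mangled form.
        return 'https:/' + url[len('https://'):]
    return url


def normalized_filename(url):
    """One possible fix: strip either scheme so both map to the same path."""
    for scheme in ('https://', 'http://'):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


http_url = 'http://www.nytimes.com/story'
https_url = 'https://www.nytimes.com/story'

# Under the legacy logic the two URLs land in different top-level directories.
print(legacy_filename(http_url))
print(legacy_filename(https_url))

# Under the normalized logic they share one path.
print(normalized_filename(http_url) == normalized_filename(https_url))
```

If something along these lines were adopted, articles already stored under the https directory would presumably need to be migrated or looked up under both paths, which is the backwards-compatibility concern the original comment mentions.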