DistrictDataLabs / baleen

An automated ingestion service for blogs to construct a corpus for NLP research.
MIT License
86 stars 38 forks

NotUniqueError caused by downloading non-changed feed content #52

Closed olgert closed 8 years ago

olgert commented 8 years ago

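The idea is to store a hash of the feed XML and compare the hash of each newly downloaded feed to the previously stored one. If the two match, the feed content has not changed and can be skipped, which avoids the NotUniqueError.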
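A minimal sketch of that comparison, not baleen's actual code: the names `feed_signature` and `has_changed` are hypothetical, SHA-256 is an assumed choice of hash, and a plain dict stands in for wherever the project would persist the stored value.

```python
import hashlib


def feed_signature(xml):
    """Return a SHA-256 hex digest of the raw feed XML (hypothetical helper)."""
    if isinstance(xml, str):
        xml = xml.encode("utf-8")
    return hashlib.sha256(xml).hexdigest()


def has_changed(feed_url, xml, stored):
    """Return True if the XML differs from what was last seen for feed_url.

    `stored` is a dict mapping feed URL to the last digest; it is an
    assumption standing in for a persistent store.
    """
    new_signature = feed_signature(xml)
    if stored.get(feed_url) == new_signature:
        return False  # identical content, skip processing
    stored[feed_url] = new_signature
    return True


signatures = {}
url = "http://example.com/feed"  # placeholder URL
xml = "<rss><channel><title>Example</title></channel></rss>"

first = has_changed(url, xml, signatures)   # nothing stored yet
second = has_changed(url, xml, signatures)  # same XML as before
third = has_changed(url, xml.replace("Example", "Updated"), signatures)
```

Only a feed for which `has_changed` returns True would then be parsed and saved, so unchanged content never reaches the insert that raises the error.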