Story discovery engine for the Counterdata Network. Grabs relevant stories from various APIs, runs them against bespoke classifier models, and posts results to a central server.
Our logging levels aren't being applied correctly. In processor.__init__.py we set the default level to INFO, and there's a lot of code trying to silence things, but it seems a bit haphazard. We should revisit this and clean it up.
For instance, we get DEBUG output from mcmetadata:
14:13:35.389 | DEBUG | mcmetadata.languages - Language mismatch - indicated en but guessed pt
And there is a lot of noisy output from trafilatura:
14:13:11.834 | WARNING | trafilatura.metadata - error in sitename extraction: string index out of range
2023-10-16 14:13:11 [trafilatura.metadata] WARNING: error in sitename extraction: string index out of range
14:13:11.949 | WARNING | trafilatura.metadata - error in sitename extraction: string index out of range
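One possible direction for the cleanup is to set every level in a single place instead of scattering the silencing code. The sketch below uses only the standard library `logging` module. The logger names are taken from the log lines above; the `NOISY_LOGGERS` mapping, the `configure_logging` helper, and the levels chosen are assumptions for illustration, not what the project currently does.

```python
import logging

# Hypothetical mapping: logger names come from the log output above,
# the levels are assumptions and should be tuned to taste.
NOISY_LOGGERS = {
    "mcmetadata": logging.INFO,    # drops the DEBUG "Language mismatch" lines
    "trafilatura": logging.ERROR,  # drops the WARNING "sitename extraction" lines
}


def configure_logging(default_level: int = logging.INFO) -> None:
    """Set the root level once, then override known-noisy libraries."""
    logging.getLogger().setLevel(default_level)
    for name, level in NOISY_LOGGERS.items():
        # Child loggers such as "trafilatura.metadata" inherit this level
        # as long as nothing has set a level on them explicitly.
        logging.getLogger(name).setLevel(level)


configure_logging()
```

Note that this only controls levels. The trafilatura warning above also shows up in a second format, which may mean another handler is attached somewhere; if so, that would need to be tracked down separately.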