counterdata-network / story-processor

Story discovery engine for the Counterdata Network. Grabs relevant stories from various APIs, runs them against bespoke classifier models, and posts results to a central server.
Apache License 2.0

upgrade newscatcher API use to v3? #78

Open rahulbot opened 1 week ago

rahulbot commented 1 week ago

We (perhaps) need to upgrade our newscatcher integration to use v3 of their API. Are we already doing that via the use of their API library?

https://docs.newscatcherapi.com/api-docs/endpoints-1/search-news
https://v3-api.newscatcherapi.com/docs/swagger

Or do we need to upgrade our integration to match those docs? This is a bit time-sensitive so it'd be great to move this up the priority list.

Either way I think we need to modify that fetch script to remove the parallel project fetches because the new integration has a rate limit of 1 call per second. If we do have to redo the API integration to update, maybe see if requests-ratelimiter is a useful solution if we just use one session for all the calls to newscatcher?
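To make the "no parallel fetches, 1 call per second" idea concrete, here is a minimal standard-library sketch of a sequential fetch loop with a shared throttle. The names `MinIntervalThrottle`, `fetch_all_projects`, and `fetch_one` are hypothetical and not taken from the existing fetch script; the 1.0 second interval is the rate limit mentioned above. If requests-ratelimiter turns out to fit, a single shared `LimiterSession(per_second=1)` would likely replace the throttle class.

```python
import time
from typing import Callable, Dict, Iterable, Optional


class MinIntervalThrottle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the throttle can be tested without waiting
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last_call = self._clock()
        return delay


def fetch_all_projects(projects: Iterable[str],
                       fetch_one: Callable[[str], object],
                       throttle: MinIntervalThrottle) -> Dict[str, object]:
    """Fetch projects one at a time (no parallelism), throttling every call.

    `fetch_one` is a hypothetical stand-in for whatever performs the actual
    Newscatcher request for a single project.
    """
    results: Dict[str, object] = {}
    for project in projects:
        throttle.wait()
        results[project] = fetch_one(project)
    return results
```

Sharing one throttle (or one session) across every call matters here: per-project limiters would each allow 1 call per second and could still exceed the limit in aggregate.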

math4humanities commented 1 week ago

We are currently using the NewsCatcher News API V2 SDK for Python, so we do need to update. I'll upgrade us and remove parallelization to conform to the new rate limit.

rahulbot commented 1 week ago

Great. Architecture-wise, I'd suggest isolating it in a module by itself, and seeing if you can make the API of this new module look similar to what we use now.

math4humanities commented 1 week ago

I was able to update without making major changes to our original Newscatcher fetcher. However, when using V3, this error is returned for many projects: Message: {"message":"in [q] \"AND\" and \"OR\" operator not allowed at same level, Please use parentheses to group terms correctly, such as(elon AND musk) OR twitter.","status_code":422,"status":"Validation error"} despite terms being grouped correctly. I am working on resolving this error so that all projects run successfully.
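For debugging, a local pre-check along these lines might help narrow down which project queries trip the validator. This is only a sketch of one plausible reading of the error message (AND and OR appearing directly inside the same parenthesis group); the server's actual rules are not known here. The function name `mixes_and_or` is hypothetical, and treating only uppercase `AND`/`OR` outside double-quoted phrases as operators is an assumption.

```python
import re

# Tokens: a double-quoted phrase, a single parenthesis, or a bare word.
_TOKEN = re.compile(r'"[^"]*"|\(|\)|[^\s()"]+')


def mixes_and_or(query: str) -> bool:
    """Return True if AND and OR both appear directly in the same group.

    Assumption: operators are the uppercase words AND / OR, and text inside
    double quotes is a phrase, not an operator.
    """
    groups = [set()]  # operators seen in each currently open group
    for token in _TOKEN.findall(query):
        if token == "(":
            groups.append(set())
        elif token == ")":
            if len(groups) > 1:  # tolerate a stray closing parenthesis
                groups.pop()
        elif token in ("AND", "OR"):
            groups[-1].add(token)
            if len(groups[-1]) == 2:
                return True
    return False
```

If a failing project's query passes this check, the mismatch is probably somewhere else (for example, in how the query is assembled or encoded before it is sent), which would at least rule out simple grouping mistakes.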