Open schliflo opened 4 years ago
From an API perspective, providing a language filter isn't a big deal. The harder part is determining the language from the headline when you can't be sure that a whole feed is in a single language (if it were, we could simply add a language field in the database).
I'll check whether the feeds contain mixed languages. If they do, it would make sense to discuss whether we want to spend some time evaluating automated language detection, or whether we only use single-language feeds in the future.
Maybe this lib is an easy solution for now: https://pypi.org/project/langdetect/
Just took a short look, but it seems promising to me. Dunno if it's worth investigating if we plan to fully overhaul the current backend implementation though ...?!
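To make the suggestion concrete, here is a rough sketch of how `langdetect` could be used to drop entries that are not in the wanted language. It is only an illustration under assumptions: the entry shape (dicts with `title` and `summary` keys) and the helper names `filter_by_language` and `_langdetect_detect` are hypothetical and not taken from the current backend. The detector is passed in as a parameter so it can be swapped or stubbed.

```python
from typing import Callable, Iterable, List, Optional


def _langdetect_detect(text: str) -> Optional[str]:
    """Detect the language with langdetect; return None if detection fails."""
    # Imported lazily so the module still loads without the dependency.
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is non-deterministic unless a seed is set.
    DetectorFactory.seed = 0
    try:
        return detect(text)  # returns a language code such as "de" or "en"
    except LangDetectException:
        # Raised e.g. for text without any usable features.
        return None


def filter_by_language(
    entries: Iterable[dict],
    language: str = "de",
    detect: Callable[[str], Optional[str]] = _langdetect_detect,
) -> List[dict]:
    """Keep only entries whose title + summary is detected as `language`."""
    kept = []
    for entry in entries:
        # "title" and "summary" are assumed field names, not the real schema.
        parts = (entry.get("title"), entry.get("summary"))
        text = " ".join(filter(None, parts))
        if text and detect(text) == language:
            kept.append(entry)
    return kept
```

Detection on a headline alone is likely to be less reliable than on longer text, so the sketch joins title and summary where both exist. Entries without any text are dropped here; whether that is the right default would need discussion.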
@johanneshiry maybe it's feasible to port the language detection logic used in https://github.com/coverified/platform_crawler - we basically only need to filter out all non-German entries
Any reason why we don't do a full switch to https://github.com/coverified/platform_crawler? Maybe that would make more sense. However, I could also provide a small fix here. What's your preferred solution?
We currently serve all feed entries to users regardless of the article language. This leads to situations where users get served "mixed" content, i.e. entries in different languages within the same feed.
This could be solved by using some kind of language detection. Ideally, the API would provide a language filter argument or language-specific endpoints.
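As a rough illustration of what a language filter argument could look like on the API side, here is a minimal sketch. It assumes each entry already carries a stored language code; the `Entry` class, its `language` field, and `get_entries` are hypothetical names, not the existing API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    title: str
    language: str  # assumed stored field holding a code such as "de"


def get_entries(entries: List[Entry], language: Optional[str] = None) -> List[Entry]:
    """Return all entries, or only those matching the optional language code."""
    if language is None:
        # No filter given: keep the current behaviour and serve everything.
        return list(entries)
    wanted = language.lower()
    return [e for e in entries if e.language.lower() == wanted]
```

Making the argument optional would keep existing clients working unchanged, while a client that only wants German content could call `get_entries(entries, language="de")`.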