flairNLP / fundus

A very simple news crawler with a funny name
MIT License

Require at least one paragraph to evaluate `ArticleBody` to true #500

Closed: MaxDall closed this 2 weeks ago

MaxDall commented 2 weeks ago

This PR requires `ArticleSection` and `ArticleBody` to each contain at least one paragraph in order to evaluate to `True`. The goal is to improve the data quality of retrieved articles.

I had to replace an old WAZ test case that consisted only of a summary.
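
For illustration, the truthiness rule described above could look roughly like the sketch below. This is not the actual fundus implementation: the `Section` and `Body` classes, their fields, and the choice to ignore whitespace-only paragraphs are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """Hypothetical stand-in for ArticleSection."""

    headline: List[str] = field(default_factory=list)
    paragraphs: List[str] = field(default_factory=list)

    def __bool__(self) -> bool:
        # True only if there is at least one paragraph.
        # Assumption: whitespace-only strings do not count as paragraphs.
        return any(p.strip() for p in self.paragraphs)


@dataclass
class Body:
    """Hypothetical stand-in for ArticleBody."""

    summary: List[str] = field(default_factory=list)
    sections: List[Section] = field(default_factory=list)

    def __bool__(self) -> bool:
        # A summary alone is not enough: at least one section
        # must itself contain a paragraph.
        return any(bool(section) for section in self.sections)


# A body with only a summary (like the replaced test case) is falsy.
summary_only = Body(summary=["Only a teaser text."])

# A body with one real paragraph is truthy.
with_paragraph = Body(
    summary=["Teaser."],
    sections=[Section(paragraphs=["First paragraph of the article."])],
)
```

Under these assumptions, `bool(summary_only)` is `False` and `bool(with_paragraph)` is `True`, so a check such as `if article_body:` would filter out articles that carry no paragraph text.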