flairNLP / fundus

A very simple news crawler with a funny name
MIT License

add-publisher-wdr #439

Closed: jannispoltier closed this 3 weeks ago

jannispoltier commented 1 month ago

Hi, I added WDR (Westdeutscher Rundfunk) to the collection of German publishers and ran all tests and commands as instructed. There is one issue I couldn't resolve: for some article summaries, the crawler returns a <p class="stand small"> element, even though I added an XPath that should return <p class="einleitung small">, and that element does exist in the given file. I couldn't come up with a fix for that problem. Maybe you have a suggestion :)
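
To make the selector question concrete, here is a minimal sketch of how a broad XPath and an exact class match can pick different paragraphs. It is an illustration only, not a diagnosis of this PR: the markup in SNIPPET and the helper select_summary are hypothetical, and it uses the standard library's xml.etree.ElementTree (a limited XPath subset) rather than the parser code in this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real WDR page structure may differ.
SNIPPET = """
<article>
  <p class="stand small">Stand: 01.01.2024, 12:00</p>
  <p class="einleitung small">Kurze Zusammenfassung des Artikels.</p>
  <p class="text small">Erster Absatz.</p>
</article>
""".strip()

root = ET.fromstring(SNIPPET)

# A broad selector returns the first <p> in document order,
# which in this snippet is the timestamp paragraph, not the summary.
first_p = root.find(".//p")


def select_summary(tree):
    """Return the text of the paragraph whose class attribute is exactly
    'einleitung small', or None if no such paragraph exists."""
    node = tree.find(".//p[@class='einleitung small']")
    return node.text if node is not None else None


print(first_p.get("class"))
print(select_summary(root))
```

Returning None when the summary paragraph is missing, instead of falling back to another paragraph, keeps a selector problem visible in the test output rather than hiding it behind unrelated text.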

MaxDall commented 1 month ago

Hey @jannispoltier, could you rerun python -m scripts.generate_parser_test_files -p WDR -oj and push the modified JSON file? You don't have to remove the old test case; just run the script with the -oj flag :)

jannispoltier commented 1 month ago

Hi @MaxDall, I pushed the file and on my local machine the tests were now successful ☺️