biolab / text-semantics

A package of scripts for the semantic analyser project
MIT License

Fixes to scraper for CONTRIBUTIONS TO CONTEMPORARY HISTORY #34

Closed PrimozGodec closed 3 years ago

PrimozGodec commented 3 years ago

Issue

Because articles use different HTML styles, some articles had no text scraped and others had only part of their text scraped.

Fix

Fix the scraper so that it scrapes the text of all articles. A spot check of randomly selected articles suggests they are now scraped correctly.
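
For illustration only, here is a minimal sketch of one way to cope with several HTML layouts: try a list of candidate containers in order and return an empty string when none matches, so that missed articles are easy to flag. It is not the code from this PR. The class names in `CANDIDATE_CLASSES` and the helper `extract_text` are hypothetical, since the journal's actual markup is not shown here, and the sketch assumes well-nested tags.

```python
from html.parser import HTMLParser

# Hypothetical container classes; the real journal markup is not shown in this PR.
CANDIDATE_CLASSES = ("article-body", "fulltext", "content")
# Tags without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class _TextCollector(HTMLParser):
    """Collect text inside every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # nesting depth inside a matched container; 0 means outside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return text from the first candidate container that yields any, else ''."""
    for cls in CANDIDATE_CLASSES:
        parser = _TextCollector(cls)
        parser.feed(html)
        parser.close()
        if parser.chunks:
            return " ".join(parser.chunks)
    return ""  # empty result marks an article that needs manual inspection
```

Running `extract_text` over every scraped page and listing the pages that return an empty string would give a more systematic check than random inspection, under the same assumptions.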