Open Wilsbert12 opened 2 months ago
@PeterStieg @thomas-borer does this make sense?
@Wilsbert12, please note that currently we are not...
@PeterStieg I know. But the outliers might be due to "junk" text (very long runs of whitespace, HTML code, repetitions, etc.). I just thought that if we are cleaning up the text, we might also show what we have achieved.
I would like to show before / after effects of text cleaning:
Graph per category type (primary category, sub category)
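To make the before / after idea concrete, here is a rough sketch of how the numbers behind such a graph could be computed. Everything in it is an assumption for illustration: `clean_text`, `length_before_after`, the sample records and the category names are hypothetical, and the regex-based tag stripping is a simplification rather than our actual cleaning pipeline. It only uses the Python standard library.

```python
import html
import re
from collections import defaultdict
from statistics import mean

# Hypothetical, simplified cleaning rules (not the project's real pipeline).
TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher
WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_text(text: str) -> str:
    """Strip HTML tags, decode entities and collapse whitespace runs."""
    text = TAG_RE.sub(" ", text)   # replace tags with a space
    text = html.unescape(text)     # e.g. "&amp;" -> "&"
    return WS_RE.sub(" ", text).strip()


def length_before_after(records):
    """Return {category: (mean_length_before, mean_length_after)}.

    `records` is an iterable of (category, text) pairs.
    """
    before = defaultdict(list)
    after = defaultdict(list)
    for category, text in records:
        before[category].append(len(text))
        after[category].append(len(clean_text(text)))
    return {cat: (mean(before[cat]), mean(after[cat])) for cat in before}


# Made-up sample data; the categories are placeholders.
sample = [
    ("news", "<p>Hello   world</p>"),
    ("news", "Plain text"),
    ("blog", "A&amp;B      C"),
]
summary = length_before_after(sample)
for category, (mean_before, mean_after) in summary.items():
    print(f"{category}: {mean_before} -> {mean_after} characters on average")
```

The resulting per-category pairs could then be fed into a grouped bar chart, once for the primary category and once for the sub category. Mean length is just one possible metric; the median or the maximum might show the effect on outliers more clearly.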
Since the question of cleaning HTML code etc. arises from some of the text analysis, I would suggest the following overall structure: