First evaluation:
Results look promising overall, but there are a few problems:
quantile_cutoff does not work; score_norm is the better quality indicator
raw_score does not reflect how a document is ranked at all, whereas score_norm (doc) gives the best results
There also needs to be a cut_off on score_norm (0.7 would be very liberal; at first glance, IMO it should be around 0.8 to 0.9)
A minimum number of returned documents (around 50) should be considered as well, because score_norm values drop steeply
Some policy fields yield worse results more often than others: entwicklung, soziales (sometimes umwelt). For these, only documents with a score_norm of 1 should be shown
Some policy fields (aeusseres, europa, arbeit, etc.) contain many corona documents, but the walk terms mostly seem fine, and IMO this often makes sense
Found one case in umwelt and soziales where the walk terms could be better
Next step: check whether terms generated from Twitter data (seeds only, or both seeds and walk terms) yield good results for classifying news data
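The score_norm filtering proposed in the findings (a cut_off on score_norm, a stricter threshold for the weaker policy fields, and a minimum number of returned documents) could look roughly like the sketch below. It is a minimal illustration, not the pipeline's actual code: the `Doc` record, `select_documents`, and all threshold constants are hypothetical. 0.8 is taken from the suggested 0.8 to 0.9 range, umwelt is left out of the strict fields because it is only sometimes affected, and it assumes the 50-document minimum is met by padding with the next-best documents, while strict fields are not padded.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical document record; the real pipeline's data model may differ.
@dataclass
class Doc:
    doc_id: str
    policy_field: str
    score_norm: float


# Hypothetical thresholds based on the values discussed in the notes.
DEFAULT_CUTOFF = 0.8                         # 0.7 = very liberal, 0.8-0.9 suggested
STRICT_CUTOFF = 1.0                          # weak fields: only score_norm of 1
STRICT_FIELDS = {"entwicklung", "soziales"}  # umwelt only "sometimes", so left out
MIN_DOCS = 50                                # guard against the steep score_norm drop


def select_documents(
    docs: Iterable[Doc],
    policy_field: str,
    cutoff: float = DEFAULT_CUTOFF,
    min_docs: int = MIN_DOCS,
) -> List[Doc]:
    """Return one policy field's documents, filtered and ranked by score_norm."""
    # Rank by score_norm only; raw_score is deliberately ignored.
    ranked = sorted(
        (d for d in docs if d.policy_field == policy_field),
        key=lambda d: d.score_norm,
        reverse=True,
    )

    # Assumption: weak fields get the strict cutoff and are never padded.
    if policy_field in STRICT_FIELDS:
        return [d for d in ranked if d.score_norm >= STRICT_CUTOFF]

    selected = [d for d in ranked if d.score_norm >= cutoff]
    # Assumption: if too few documents pass, pad with the next-best ones.
    if len(selected) < min_docs:
        selected = ranked[:min_docs]
    return selected


# Usage: only 10 documents pass the cut_off, so the result is padded to 50.
example = [Doc(f"a{i}", "arbeit", 0.95) for i in range(10)]
example += [Doc(f"b{i}", "arbeit", 0.3) for i in range(80)]
print(len(select_documents(example, "arbeit")))  # 50
```

Padding, rather than lowering the cut_off, keeps the threshold fixed while still guaranteeing enough documents to inspect; whether padded low-score documents are acceptable is exactly the open question raised in the notes.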