NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Validate TSV training input #281

Closed annakasprzik closed 5 years ago

annakasprzik commented 5 years ago

Maybe Annif could check the input TSV files for trivial flaws, such as extra blank lines or lines that contain no tab character, and issue a warning. Right now you instead get an error from some Python method, which is not very transparent for users who are not proficient in Python (and even for those who are).

osma commented 5 years ago

Indeed, the TSV parser should be more robust.

See the related discussion on annif-users. There the issue was an empty line, which caused an ugly error.
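
For illustration, the kind of check requested above could look roughly like the sketch below. It is not Annif's actual parser: the function name `iter_tsv_documents` is hypothetical, and it assumes only that each valid line is a document text, a tab character, and then the subjects. Empty lines and lines without a tab are skipped with a warning that names the line number, instead of failing later with an opaque Python error.

```python
import io
import warnings


def iter_tsv_documents(lines):
    """Yield (text, subjects) pairs, warning about malformed lines.

    Hypothetical helper for illustration only, not part of Annif's API.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            # Empty or whitespace-only line: warn and move on
            warnings.warn(f"line {lineno}: empty line skipped")
            continue
        if "\t" not in line:
            # No tab means the text cannot be separated from the subjects
            warnings.warn(f"line {lineno}: no tab character, line skipped")
            continue
        # Split on the first tab only, so further tabs stay in the subjects part
        text, subjects = line.split("\t", 1)
        yield text, subjects


# Made-up sample input with one empty line and one line lacking a tab
sample = io.StringIO(
    "A document about cats\t<http://example.org/cats>\n"
    "\n"
    "a stray line without a tab\n"
    "A document about dogs\t<http://example.org/dogs>\n"
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    docs = list(iter_tsv_documents(sample))
```

Using `warnings.warn` here is only one option; a real tool might prefer its own logger so that the messages show up alongside its other output.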