gkunter / coquery

Coquery is a free corpus query tool for linguists, lexicographers, translators, and anybody who wishes to search and analyse a text corpus.
GNU General Public License v3.0

Installers may fail ungracefully if a source file has the wrong format #295

Open gkunter opened 6 years ago

gkunter commented 6 years ago

Currently, there are no checks that the format of a source file matches the expected format. If the formats do not match, the behavior of the corpus installers is not clearly defined, and in some cases they fail with rather unhelpful SQL error messages.

This can be observed, for example, when the COCA installer picks up the COHA file lexicon.txt instead of the COCA file of the same name. This may happen because of the recursive directory traversal done in BaseCorpusBuilder:get_file_list(). The two files differ in layout: the COHA file has five columns, but the COCA file has only four.
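To illustrate how this can happen, here is a minimal sketch of a recursive lookup by file name. It is not the actual code of get_file_list(); the helpers `find_files` and `column_count` are hypothetical, and the tab separator is an assumption.

```python
import os


def find_files(root, name):
    """Return every path below `root` whose base name equals `name`.

    Hypothetical helper: a recursive walk matches on the file name only,
    so files of the same name from different corpora are all returned.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def column_count(path, sep="\t"):
    """Count the columns in the first line of a delimited text file."""
    with open(path, encoding="utf-8") as f:
        return len(f.readline().rstrip("\n").split(sep))
```

If the COCA and COHA directories share a parent, `find_files(parent, "lexicon.txt")` returns both files, and only something like `column_count()` reveals which layout each one has.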

Probably the best solution to this issue would be to compare the data types that are read from the source files to those specified for the database tables. An "Illegal file format" exception might be raised if there is a mismatch.
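A rough sketch of what such a check might look like, under the assumption that each table can supply one converter per column (e.g. `int`, `str`). The names `IllegalFileFormatError` and `validate_rows` are hypothetical and do not exist in Coquery.

```python
class IllegalFileFormatError(Exception):
    """Raised when a source file does not match the expected table layout."""


def validate_rows(rows, expected_types, filename="<source>"):
    """Yield type-converted rows, or raise IllegalFileFormatError.

    rows           -- iterable of lists of strings (one list per line)
    expected_types -- one converter per table column, e.g. [int, str, str]
    """
    for line_no, row in enumerate(rows, start=1):
        # A wrong column count is the cheapest mismatch to detect.
        if len(row) != len(expected_types):
            raise IllegalFileFormatError(
                f"Illegal file format: {filename}, line {line_no}: "
                f"expected {len(expected_types)} columns, found {len(row)}"
            )
        converted = []
        for col_no, (value, convert) in enumerate(
            zip(row, expected_types), start=1
        ):
            try:
                converted.append(convert(value))
            except ValueError as exc:
                raise IllegalFileFormatError(
                    f"Illegal file format: {filename}, line {line_no}, "
                    f"column {col_no}: cannot read {value!r} as "
                    f"{convert.__name__}"
                ) from exc
        yield converted
```

Because the check runs before any row reaches the database, a five-column file fed to an installer that expects four columns would fail with a message naming the file and line instead of an SQL error.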