DARIAH-DE / TopicsExplorer

Explore your own text collection with a topic model – without prior knowledge.
https://dariah-de.github.io/TopicsExplorer
Apache License 2.0

encoding error not visible in TopicsExplorer #57

Closed: MHuberFaust closed this issue 5 years ago

MHuberFaust commented 6 years ago

Encoding errors produced by read_files() are not shown in TopicsExplorer.

severinsimmler commented 6 years ago

How exactly did the problem show itself? And what do you mean by read_files()? Because TopicsExplorer is a web application, it cannot access the hard disk directly, for security reasons. The user has to select the files, which Flask then receives as binary data and which are decoded as UTF-8 here:

https://github.com/DARIAH-DE/TopicsExplorer/blob/cdefcdc64ef8fafeded59d0c975f03f71a06a14c/application/modeling.py#L70
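A minimal sketch of what decoding an upload as UTF-8 implies, assuming each file arrives as raw bytes. `decode_upload` is a hypothetical helper, not the function from modeling.py. A file saved in another encoding, such as Latin-1, makes `bytes.decode` raise `UnicodeDecodeError` instead of returning text:

```python
def decode_upload(data: bytes) -> str:
    """Decode the raw bytes of an uploaded text file as UTF-8 (hypothetical helper).

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return data.decode("utf-8")


if __name__ == "__main__":
    # Valid UTF-8 decodes without problems.
    print(decode_upload("Müller".encode("utf-8")))
    # In Latin-1, "ü" is the single byte 0xfc, which is not a valid UTF-8 start byte.
    try:
        decode_upload("Müller".encode("latin-1"))
    except UnicodeDecodeError as error:
        print(f"Cannot decode: {error}")
```

The second call fails at byte position 1, so a Latin-1 file is rejected as soon as the first non-ASCII character is reached.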

MHuberFaust commented 6 years ago

It stopped at "Reading files" (or something like that); the flower kept turning, but there was no progress. I tried to work around the problem by using the notebook instead, which is why I figured it had something to do with read_files(). So it probably needs an exception that checks for the correct encoding.
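One way such an encoding check could look, as a sketch only: `EncodingCheckError` and `check_encoding` are hypothetical names, and the files are assumed to be given as a mapping from file name to raw bytes. All files are checked before the error is raised, so the message can list every offending file at once:

```python
class EncodingCheckError(Exception):
    """Hypothetical exception raised when uploaded files are not valid UTF-8."""

    def __init__(self, filenames):
        self.filenames = list(filenames)
        super().__init__(
            "These files are not UTF-8 encoded: " + ", ".join(self.filenames)
        )


def check_encoding(files):
    """Decode every file as UTF-8 and return the texts (hypothetical helper).

    `files` maps file names to raw bytes. Files that fail to decode are
    collected first, then reported together in a single EncodingCheckError.
    """
    texts, failed = {}, []
    for name, data in files.items():
        try:
            texts[name] = data.decode("utf-8")
        except UnicodeDecodeError:
            failed.append(name)
    if failed:
        raise EncodingCheckError(failed)
    return texts
```

For example, `check_encoding({"a.txt": b"ok", "b.txt": b"M\xfcller"})` raises an `EncodingCheckError` whose message names `b.txt`, which the GUI could display instead of a generic failure.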

severinsimmler commented 6 years ago

Do you have any details on the files? In v0.1.6, the GUI displays the name of the file currently being processed. Since the text files are sent to localhost, this can take a while for large files. The entire workflow (including reading the files) is wrapped in one large try block, so if an error occurs, it will always be reported. If the application merely seems to freeze, it is still working correctly.
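The pattern described here (reporting the current file name and wrapping the whole workflow in one try block) can be sketched roughly as follows. `run_workflow` and the `report` callback are hypothetical stand-ins, not TopicsExplorer's actual code; `report` is assumed to forward status messages to the GUI:

```python
from typing import Callable, Dict


def run_workflow(files: Dict[str, bytes], report: Callable[[str], None]) -> bool:
    """Read all files, report progress, and surface any error (hypothetical sketch).

    `report` stands in for whatever pushes status messages to the GUI.
    Returns True on success, False if an error was reported.
    """
    try:
        texts = {}
        for name, data in files.items():
            # Report the file name before decoding, so a failure points at it.
            report(f"Reading {name} ...")
            texts[name] = data.decode("utf-8")
        report("Training topic model ...")
        # ... the actual modeling steps would follow here ...
        return True
    except Exception as error:
        # Any failure, including UnicodeDecodeError, becomes a visible message
        # instead of leaving the loading animation running indefinitely.
        report(f"Error: {error}")
        return False
```

With `messages = []` and `run_workflow({"faust.txt": b"Habe nun, ach!", "bad.txt": b"M\xfcller"}, messages.append)`, the function returns `False`; the second-to-last message is `Reading bad.txt ...` and the last one starts with `Error:`.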

If you have the data (or the contact details of the person affected), I would try to reproduce it with the developer version and can probably tell you more.