WheatonCS / Lexos

Python/Flask-based website for text analysis workflow. Previous (stable) release is live at:
http://lexos.wheatoncollege.edu
MIT License

Browser hangs in Tokenize #228

scottkleinman closed 9 years ago

scottkleinman commented 9 years ago

I uploaded the four Middle English texts (so not a huge amount of data), and it took quite a long time to render the table. When I tried doing Raw Counts instead of Proportional Counts, the browser hung and I had to kill the process. Is anyone else encountering such problems?

czhang03 commented 9 years ago

What about Chinese and Old English? Just to see whether this is caused by encoding.

scottkleinman commented 9 years ago

Old English worked fine (I did set the special characters rule set to DOE SGML). I tried the Modern Chinese collection, and it worked the first time set to 1-grams by characters with proportional counts. I think I tried 1-grams by words next, and my browser hung. I next tried 1-grams by characters with raw counts. The same thing appeared to happen--I got the dreaded "Not responding" from Firefox. But I saw in the Windows Task Manager that the process was continuing to take up RAM, so I left it, and after a few minutes the table did render. So, at least for Chinese, encoding is not the issue--more likely the problem is the size of the corpus (the table had 174 pages). We'll have to see if there is any more optimisation that can be done to speed up the process. As for Middle English, I'll do some more testing. The main issue seems to be that, from the user's point of view, the browser appears to hang (the animation in the blue spinner icon stops as well).

scottkleinman commented 9 years ago

One other weird thing is that in the raw counts, characters like 东 can occur "144.0" times. I'm not sure what the decimal point is for since we're never going to find 东 occurring 144.5 times! :-)
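A value like `144.0` is what you would see if the raw counts are stored as floats (for instance, in the same float matrix used for proportional counts) and written to the table unchanged. That cause is an assumption, not something confirmed in this thread. A minimal sketch of one way to tidy the display; `format_count` is a hypothetical helper, not a Lexos function:

```python
def format_count(value, proportional=False, digits=4):
    """Render one table cell as text.

    Hypothetical helper: raw counts become whole numbers,
    proportional counts keep a fixed number of decimals.
    """
    if proportional:
        return f"{value:.{digits}f}"
    # Raw counts are whole numbers, so drop the spurious ".0".
    return str(int(round(value)))


print(format_count(144.0))                    # 144
print(format_count(0.25, proportional=True))  # 0.2500
```

Formatting at render time leaves the underlying numbers untouched, so the same matrix can still feed both the raw and the proportional views.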

czhang03 commented 9 years ago

I don't think the floating-point display is that serious, though. We have a lot of known issues that we need to trap.

I have been able to recreate the browser hang. The back-end only takes 1 second to reply to the POST, and the browser takes half a second to receive the reply. All the rest is spent applying JS and CSS. I think we can solve this by showing only 25 rows on the first page.
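A rough sketch of the idea of handing the browser one 25-row page at a time rather than the whole table. The `paginate` function and the sample rows are made up for illustration; this is not the Lexos code:

```python
import math


def paginate(rows, page, page_size=25):
    """Return the rows for a 1-based page number and the total page count."""
    total_pages = max(1, math.ceil(len(rows) / page_size))
    page = min(max(page, 1), total_pages)  # clamp out-of-range requests
    start = (page - 1) * page_size
    return rows[start:start + page_size], total_pages


# Made-up data: 1000 (term, count) rows.
rows = [("term%d" % i, i) for i in range(1000)]

page_rows, total_pages = paginate(rows, page=1)
print(len(page_rows), total_pages)  # 25 40
```

With this shape the browser only ever has to style 25 rows per view, whatever the size of the corpus.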

@akuisara

scottkleinman commented 9 years ago

Did we apply the Scroller extension in this table? The 50,000 row client-side example might help and the 500,000 row server-side example would be even better. This is assuming that the js and css Moses detects relate to generating the table.
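For reference, in DataTables' server-side mode each request carries `draw`, `start`, and `length`, and the reply is expected to contain `draw`, `recordsTotal`, `recordsFiltered`, and `data`. A minimal sketch of the server half in plain Python, with no Flask wiring; `build_response` and the sample rows are hypothetical, not taken from Lexos:

```python
def build_response(rows, draw, start, length):
    """Build a DataTables-style server-side reply for one page of rows."""
    total = len(rows)
    if length < 0:
        # DataTables sends a length of -1 to ask for all records.
        page = rows[start:]
    else:
        page = rows[start:start + length]
    return {
        "draw": int(draw),         # echoed back so replies can be matched to requests
        "recordsTotal": total,
        "recordsFiltered": total,  # no search filtering in this sketch
        "data": [list(row) for row in page],
    }


# Made-up data: 500 (term, count) rows.
sample_rows = [("term%d" % i, i) for i in range(500)]
reply = build_response(sample_rows, draw=3, start=50, length=25)
print(len(reply["data"]), reply["recordsTotal"])  # 25 500
```

This only reduces how much data is sent and drawn per view; if the time really goes into styling the rendered rows, the gain would come from the smaller row count rather than from the transport.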

akuisara commented 9 years ago

We did not apply the Scroller. I have tested the DataTable before, and the results turned out to be the same with and without Scroller implemented. Besides that, client-side and server-side processing took the same time as well.

scottkleinman commented 9 years ago

OK, let's hope that the pagination solves this problem. I have made the fix, and it will go out with my next push.