comphist / cora

A web-based, token-level annotation tool for non-standard language data
http://www.linguistics.rub.de/comphist/resources/cora/
MIT License

Speed up retrieval of lemma suggestions #72

Open mbollmann opened 8 years ago

mbollmann commented 8 years ago

Originally reported by: Marcel Bollmann (Bitbucket: mbollmann, GitHub: mbollmann)


Users report that retrieving lemma suggestions can sometimes take up to 7 seconds, which is far too slow.

The testing script introduced in commit 1e1f888 suggests that the step that retrieves lemma suggestions from other texts within the project (those with the same "ascii" token) is a major bottleneck, taking almost a full second per lookup on the production database.

If there is no evidence for other factors, refactoring this lookup should be considered.

One possible solution is to introduce another database table that maps (project_id, mod_ascii) pairs to the tag_ids of lemma annotations. This table would have to be updated whenever lemma annotations are saved, and could then be used for efficient lookup during retrieval.
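To make the proposal concrete, here is a minimal sketch of such a lookup table. It is not taken from the CoRA codebase: the table name `lemma_suggestion` and the functions `create_lookup_table`, `record_lemma`, and `suggest_lemmas` are hypothetical, and an in-memory SQLite database stands in for the production database purely so the example is self-contained. Only the column names `project_id`, `mod_ascii`, and `tag_id` come from the description above.

```python
import sqlite3


def create_lookup_table(conn):
    # Hypothetical table; the composite primary key starts with
    # (project_id, mod_ascii), so lookups on that pair can use its index.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS lemma_suggestion (
            project_id INTEGER NOT NULL,
            mod_ascii  TEXT    NOT NULL,
            tag_id     INTEGER NOT NULL,
            PRIMARY KEY (project_id, mod_ascii, tag_id)
        )
        """
    )


def record_lemma(conn, project_id, mod_ascii, tag_id):
    # Intended to be called whenever a lemma annotation is saved.
    # INSERT OR IGNORE (SQLite syntax) keeps repeated saves idempotent.
    conn.execute(
        "INSERT OR IGNORE INTO lemma_suggestion (project_id, mod_ascii, tag_id) "
        "VALUES (?, ?, ?)",
        (project_id, mod_ascii, tag_id),
    )


def suggest_lemmas(conn, project_id, mod_ascii):
    # Single indexed lookup instead of scanning other texts in the project.
    rows = conn.execute(
        "SELECT tag_id FROM lemma_suggestion "
        "WHERE project_id = ? AND mod_ascii = ? ORDER BY tag_id",
        (project_id, mod_ascii),
    ).fetchall()
    return [row[0] for row in rows]
```

A possible usage, with made-up values: after `record_lemma(conn, 1, "vnd", 42)`, a call to `suggest_lemmas(conn, 1, "vnd")` returns the stored tag ids for that project and token. The sketch does not handle removing entries when a lemma annotation is changed or deleted; whether stale entries are acceptable as suggestions would need to be decided separately.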