Closed by clovis 9 years ago
The problem actually ended up being in how we truncate the right-side concordance. I commented out the truncating regex on line 138: https://github.com/ARTFL-Project/PhiloLogic4/blob/master/www/reports/collocation.py#L138
This does seem to reduce the number of collocates for any given hit, but by maybe 1 or 2%, which I think is fine given that we'll be completely revamping this code soon.
Fixed in new collocation code
The ECCO database has many in-word tags. This breaks tokenization in collocations.
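To make the failure concrete, here is a minimal sketch of how an in-word tag can split a token, and one way a tag-aware regex might rejoin it. This is not the code in collocation.py: the `<x/>` tag, the sample text, and the function names are all hypothetical stand-ins.

```python
import re

# Hypothetical sample: "<x/>" stands in for whatever in-word tag the
# database actually uses; it is not the real ECCO element.
SAMPLE = "the gover<x/>nment of the peo<x/>ple"

TAG = re.compile(r"<[^>]+>")
# A run of tags counts as "in-word" when a word character sits directly
# on both sides of it.
IN_WORD_TAG = re.compile(r"(?<=\w)(?:<[^>]+>)+(?=\w)")
WORD = re.compile(r"\w+")


def tokenize_naive(text):
    # Replacing every tag with a space splits any word that contains a tag.
    return WORD.findall(TAG.sub(" ", text))


def tokenize_tag_aware(text):
    # Drop in-word tags first so the two halves rejoin, then treat the
    # remaining tags as ordinary word boundaries.
    text = IN_WORD_TAG.sub("", text)
    return WORD.findall(TAG.sub(" ", text))


print(tokenize_naive(SAMPLE))
print(tokenize_tag_aware(SAMPLE))
```

One caveat with this approach: it would also merge two separate words that are divided only by a tag with no whitespace, so it assumes the markup never does that.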
We could fix this by removing the element that causes the issue. In the case of ECCO, it's ```
Richard, how do you feel about tweaking the current tokenizing regex we use for collocations to detect in-word tags?