MontrealCorpusTools / PolyglotDB

Language data store and linguistic query API
MIT License

Encoding baseline duration exceeds memory limit #147

Open james-tanner opened 5 years ago

james-tanner commented 5 years ago

I've been trying to use encode_baseline to get baseline durations for words inside a SPADE script. Currently:

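from polyglotdb import CorpusContext

# config is set up earlier in the script
with CorpusContext(config) as c:
    # Skip the encoding step if words already have the property
    if not c.hierarchy.has_token_property('word', 'baseline'):
        print('getting baseline word duration')
        c.encode_baseline('word', 'duration')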

This works fine on smaller corpora (like ICE-Can or Modern RP), but it exceeds the memory limit (even on Roquefort) for corpora of SOTC size and larger.

msonderegger commented 4 years ago

@mmcauliffe any thoughts on this? I know you probably won't have time to fix it before leaving, but any guidance would be appreciated. For example, do you suspect the issue has already been resolved by your recent memory optimizations, or does it seem like an actual bug?