The basic idea here is to use gzipped JSON for disk-efficient storage of simply structured data. In particular, this should come in handy for shrinking the language data in spaCy.
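For readers unfamiliar with the approach, here is a minimal sketch of a gzipped JSON round trip using only the Python standard library. The function names `dump_gzipped_json` and `load_gzipped_json` and the sample data are hypothetical and chosen for illustration; they are not necessarily the names or signatures used in this PR.

```python
import gzip
import json
import tempfile
from pathlib import Path


def dump_gzipped_json(path, data):
    """Serialize `data` to JSON and write it gzip-compressed to `path`.

    Hypothetical helper for illustration, not this PR's API.
    """
    # "wt" opens the gzip stream in text mode so json.dump can write str.
    with gzip.open(Path(path), "wt", encoding="utf8") as f:
        json.dump(data, f)


def load_gzipped_json(path):
    """Read a gzip-compressed JSON file and return the decoded object.

    Hypothetical helper for illustration, not this PR's API.
    """
    with gzip.open(Path(path), "rt", encoding="utf8") as f:
        return json.load(f)


# Round trip through a temporary file; the sample data is made up.
data = {"lemmas": {"was": "be", "mice": "mouse"}, "version": 1}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json.gz"
    dump_gzipped_json(path, data)
    restored = load_gzipped_json(path)
```

Opening the file in text mode keeps the JSON handling identical to the uncompressed case, so only the `open` call differs.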
As of this commit, this works, and there's a basic test. A couple of things to improve:
the code seems redundant and should be cleaned up
add more tests
maybe add support for other functions, such as the JSONL ones