Closed peterjc closed 1 year ago
That's a really interesting observation. I think we're safe here, though, as we are not testing for the presence of missing keys but only incrementing specific keys (see https://github.com/biocore/biom-format/blob/2.1.14/biom/parse.py#L338). In https://github.com/PyCQA/flake8-bugbear/issues/323, I think what's driving the memory bloat is the if threshold < counts[str(x)] check, which forces the creation of a key : default value pair.
Probably OK then - yes, the memory bloat happens when you try to access missing entries (because each lookup then adds a 0 entry under that key), and it is made worse if you use long strings as keys, as I was in my own code.
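For anyone landing here later, a minimal sketch of the difference being discussed. The keys are made up for illustration and this is not the parse_uc code; it only shows that incrementing behaves the same in both containers, while a plain lookup of a missing key stores a new entry in defaultdict(int) but not in Counter().

```python
from collections import Counter, defaultdict

dd = defaultdict(int)
ctr = Counter()

# Increment-only use: both containers end up with exactly
# the keys that were incremented.
for key in ["a", "b", "a"]:
    dd[key] += 1
    ctr[key] += 1

# Read-only lookup of a key that was never incremented
# ("zzz" is a hypothetical key chosen for this example).
missing_dd = dd["zzz"]    # defaultdict calls int() and stores "zzz": 0
missing_ctr = ctr["zzz"]  # Counter returns 0 without storing anything

dd_size, ctr_size = len(dd), len(ctr)
print(dd_size, ctr_size)
```

So as long as the code only ever does counts[key] += 1, as described above, the two should hold the same set of keys.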
While submitting #904, I was looking at this bit of the code in biom/parse.py, function parse_uc (Create a Table object from a uclust/usearch/vsearch uc file): https://github.com/biocore/biom-format/blob/2.1.14/biom/parse.py#L282

I think it would be more memory efficient to replace defaultdict(int) with Counter(), see https://github.com/PyCQA/flake8-bugbear/issues/323

However, that probably deserves a little benchmarking by someone familiar with the code to see if it actually matters for the memory overhead here?
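A rough sketch of how such a benchmark might look, using tracemalloc from the standard library. Everything here is an assumption for illustration: the helper name peak_bytes and the synthetic keys are hypothetical, and a real comparison would need to run parse_uc on actual uc files.

```python
import tracemalloc
from collections import Counter, defaultdict


def peak_bytes(factory, keys):
    """Return (peak traced bytes, number of stored keys) for increment-only counting."""
    tracemalloc.start()
    counts = factory()
    for key in keys:
        counts[key] += 1
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, len(counts)


# Synthetic stand-in for identifiers; not real uc data.
keys = [f"seq_{i % 1000}" for i in range(20_000)]

dd_peak, dd_len = peak_bytes(lambda: defaultdict(int), keys)
ctr_peak, ctr_len = peak_bytes(Counter, keys)

print(f"defaultdict(int): {dd_peak} bytes peak, {dd_len} keys")
print(f"Counter():        {ctr_peak} bytes peak, {ctr_len} keys")
```

The byte counts will vary by Python version and platform, so they are only meaningful relative to each other on the same machine.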