kvikshaug closed this 5 years ago.
You'd have to include the namespace to guarantee uniqueness, and the data structure will be unordered and a bit redundant (e.g. {"chebi:CHEBI:12345": {"id": "CHEBI:12345", "namespace": "chebi"}}), but it will indeed be a tiny bit faster.
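A minimal sketch of the dict approach described above. It assumes each compound arrives as a plain dict with "id" and "namespace" keys. The helper name `dedupe_by_key` and the sample entries are hypothetical. The "namespace:id" string is the key, which is why the structure repeats information already present in the value.

```python
def dedupe_by_key(compounds):
    """Hypothetical helper: drop duplicate compounds keyed on "namespace:id"."""
    unique = {}
    for compound in compounds:
        # The namespace is part of the key so that the same ID in two
        # different namespaces is not treated as a duplicate.
        key = f"{compound['namespace']}:{compound['id']}"
        # setdefault keeps the first occurrence and ignores later ones.
        unique.setdefault(key, compound)
    return list(unique.values())


compounds = [
    {"id": "CHEBI:12345", "namespace": "chebi"},
    {"id": "CHEBI:12345", "namespace": "chebi"},
    {"id": "CHEBI:234324", "namespace": "chebi"},
]
deduped = dedupe_by_key(compounds)
print(deduped)
```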
- Without removing duplicates: 2.95s
- Removing duplicates with json: 0.00104s + 1.55s
- Removing duplicates with dict: 0.000028s + 1.48s

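The thread does not show how the JSON variant was implemented. One plausible version, assumed here, serializes each compound with sorted keys so that equal dicts produce identical strings, collects the strings in a set, and parses them back. `dedupe_via_json` is a hypothetical name.

```python
import json


def dedupe_via_json(compounds):
    """Hypothetical helper: deduplicate dicts by their JSON serialization."""
    # sort_keys=True gives a canonical string, so dicts with the same
    # content serialize identically regardless of key insertion order.
    serialized = {json.dumps(compound, sort_keys=True) for compound in compounds}
    # Parse each unique string back into a dict; set iteration order is arbitrary.
    return [json.loads(text) for text in serialized]


compounds = [
    {"id": "CHEBI:12345", "namespace": "chebi"},
    {"namespace": "chebi", "id": "CHEBI:12345"},
]
deduped = dedupe_via_json(compounds)
print(deduped)
```

In this sketch every compound makes a `dumps`/`loads` round trip. That is plausibly why a JSON-based deduplication step costs more than building a dict directly, although the timings above come from the author's own code, not from this example.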
The ideal case would be to make a set of hashable chemical class instances to avoid dupes, but that seems a bit overkill here, so I'll go with the dict solution.
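For the "set of hashable chemical class instances" idea, a frozen dataclass is one way to get value-based `__eq__` and `__hash__` without writing them by hand. The class name `ChemicalClass` and its two fields are assumptions that mirror the id/namespace pair used in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemicalClass:
    """Hypothetical compound identifier.

    frozen=True together with the default eq=True makes the dataclass
    generate __hash__ from the fields, so instances can go in a set.
    """

    id: str
    namespace: str


classes = {
    ChemicalClass("CHEBI:12345", "chebi"),
    ChemicalClass("CHEBI:12345", "chebi"),  # duplicate, collapsed by the set
    ChemicalClass("CHEBI:234324", "chebi"),
}
print(len(classes))
```

Unlike a namedtuple, a dataclass instance never compares equal to a plain tuple holding the same values. Assigning to a field of a frozen instance raises `dataclasses.FrozenInstanceError`.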
```python
from collections import namedtuple

# namedtuples hash and compare by value, so they can serve as dict keys
Compound = namedtuple("Compound", ["id", "namespace"])

compounds = {
    Compound(id="CHEBI:234324", namespace="chebi"): 1,
}
```
Maybe?
Nice suggestion :+1: Then we can just use a set; no need for a dict.
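Here is a sketch of the set-based variant. Namedtuples hash and compare by value, so duplicates collapse automatically. The `raw` input list of dicts is hypothetical.

```python
from collections import namedtuple

Compound = namedtuple("Compound", ["id", "namespace"])

# Hypothetical raw input containing a duplicate entry.
raw = [
    {"id": "CHEBI:234324", "namespace": "chebi"},
    {"id": "CHEBI:234324", "namespace": "chebi"},
    {"id": "CHEBI:12345", "namespace": "chebi"},
]

# Compound(**entry) turns each dict into a hashable namedtuple; the set
# keeps a single copy of every (id, namespace) pair.
unique_compounds = {Compound(**entry) for entry in raw}
print(len(unique_compounds))
```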
Merging #126 into devel will increase coverage by 0.11%. The diff coverage is 100%.

Coverage Diff | devel | #126 | +/-
---|---|---|---
Coverage | 72% | 72.11% | +0.11%
Files | 20 | 20 |
Lines | 743 | 746 | +3
Hits | 535 | 538 | +3
Misses | 208 | 208 |

Impacted Files | Coverage Δ | |
---|---|---|
src/model/modeling/adapter.py | 68.35% <100%> (+0.61%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f163a72...dc6e9ba.
I realize now that this was a valid concern you raised. This helps a bit (because there are often many duplicate ions/salts applied); however, it still takes >1s to apply 77 metabolites.
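Because the duplicates are mostly repeated ions/salts, the savings come from deduplicating before the expensive application step. The sketch below does this while preserving the original order. `dict.fromkeys` keeps the first occurrence of each key, and dicts keep insertion order since Python 3.7. `apply_metabolite` is a hypothetical stand-in for the real per-metabolite work, which the thread does not show.

```python
from collections import namedtuple

Compound = namedtuple("Compound", ["id", "namespace"])


def apply_metabolite(compound):
    """Hypothetical placeholder for the expensive per-metabolite step."""
    return compound.id


def apply_unique(compounds):
    # dict.fromkeys keeps the first occurrence of each compound, in input order,
    # so each distinct compound is applied exactly once.
    unique = list(dict.fromkeys(compounds))
    return [apply_metabolite(compound) for compound in unique]


# Illustrative batch in which one compound appears twice.
batch = [
    Compound("CHEBI:12345", "chebi"),
    Compound("CHEBI:234324", "chebi"),
    Compound("CHEBI:12345", "chebi"),
]
applied = apply_unique(batch)
print(applied)
```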