Closed zhu0619 closed 9 months ago
Attention: 1 lines
in your changes are missing coverage. Please review.
Comparison is base (
9e94d02
) 91.96% compared to head (e812492
) 91.93%.
Files | Patch % | Lines |
---|---|---|
datamol/isomers/_enumerate.py | 90.90% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Thanks Lu.
It looks good to me after fixing the docstring.
Question: my understanding is that
GetStereoisomerCount
will actually do that exact same asenumerate_stereoisomers
withn_variants=<MAX>
and simply calllen()
on the output. Am I correct here? Maybe check what the rdkit code is doing under the hood. Not really a big deal for me here but I just wanted to flag it in case you thinkcount()
should instead reuseenumerate()
.
[GetStereoisomerCount](https://github.com/rdkit/rdkit/blob/2a68050ed07a3b27cabf33d535f0c46117135209/rdkit/Chem/EnumerateStereoisomers.py#L136C24-L136C24)
computes an estimated number based on the stereo bonds. So in some cases, the counts from GetStereoisomerCount
is larger than the enumerations.
Initially, I was using the output of enumerate_stereoisomers
. But the computational time is too long especially for large dataset even with parallelization.
I will also add an option to count the isomer using enumerate_stereoisomers
if the user needs more accurate counts.
ok, so it seems like GetStereoisomerCount
is doing a slightly different things and also seems faster. All good then, thank you Lu!
Changelogs
datamol.isomers._enumerate.count_stereoisomers
Added unit tests to count stereoisomers for only undefined and all possible stereoisomers.
The step
Chem.FindPotentialStereoBonds(mol, cleanIt=clean_it)
, the information on bond is cleared ifcleanit=True
. Therefore,cleanit
should be disabled when performing enumeration or counting only on undefined stereochemistry when the molecules have defined stereo information on bonds.See example below:
Reproduce the error