Closed Pakman450 closed 1 year ago
Merging #188 (dc04ffa) into main (9bb6258) will increase coverage by 0.02%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## main #188 +/- ##
==========================================
+ Coverage 91.45% 91.47% +0.02%
==========================================
Files 46 46
Lines 3661 3672 +11
==========================================
+ Hits 3348 3359 +11
Misses 313 313
Flag | Coverage Δ | |
---|---|---|
unittests | 91.47% <100.00%> (+0.02%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ | |
---|---|---|
datamol/scaffold/_fuzzy.py | 90.44% <100.00%> (+0.71%) | :arrow_up: |
It seems like I am having trouble dealing with conditional return typing for `fuzzy_scaffolding`. I am currently trying to figure out how to do this. I found an overload functionality for my issue, so I am going to try that. If you guys have any recommendations, that would be great.
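For reference, conditional return typing on a boolean flag is usually expressed with `typing.overload` and `Literal`. The following is only a minimal sketch under stated assumptions: `count_items` is a hypothetical stand-in, not the real `fuzzy_scaffolding` signature, and only the `if_df` flag name is taken from this PR.

```python
from collections import Counter
from typing import Dict, List, Literal, Union, overload

import pandas as pd


# Hypothetical stand-in function; only the `if_df` flag name comes from the PR.
@overload
def count_items(items: List[str], if_df: Literal[True] = ...) -> pd.DataFrame: ...


@overload
def count_items(items: List[str], if_df: Literal[False]) -> Dict[str, int]: ...


def count_items(
    items: List[str], if_df: bool = True
) -> Union[pd.DataFrame, Dict[str, int]]:
    """Return counts as a dataframe (default) or as a plain dictionary."""
    counts = dict(Counter(items))
    if if_df:
        # One row per distinct item.
        return pd.DataFrame({"item": list(counts), "count": list(counts.values())})
    return counts


as_df = count_items(["a", "b", "a"])  # type checkers infer pd.DataFrame
as_dict = count_items(["a", "b", "a"], if_df=False)  # inferred as Dict[str, int]
```

The overloads only affect static type checking; at runtime the last, undecorated definition is the one that executes.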
> It seems like I am having trouble dealing with conditional return typing for `fuzzy_scaffolding`. I am currently trying to figure out how to do this. I found an overload functionality for my issue, so I am going to try that. If you guys have any recommendations, that would be great.
@Pakman450, it's fine to break compatibility and return only the new structure here. This code has not been maintained for a while, and we should likely refactor most of it. See here too: https://github.com/datamol-io/datamol/issues/119
I changed `if_df` to default to `True`, so it will always return dataframes. Should I completely remove that flag and change the return type so that it returns only dataframes and no dictionaries?
@Pakman450 could you rebase this PR to `main` please as well? So we can make sure this PR is compatible with the latest rdkit.
Let's put the refactoring of the function itself in a different PR.
@Pakman450 can you document the column names in the dataframes?
ping @hadim for planning refactoring later.
@maclandrol Where do you want me to document them? In the comments?
> @Pakman450 could you rebase this PR to `main` please as well? So we can make sure this PR is compatible with the latest rdkit.

Sure. Give me a moment.
rebase complete.
> @maclandrol Where do you want me to document them? In the comments?
In the docstring of the function, so it's rendered in the documentation.
LGTM, thanks @Pakman450
Hello! This is for issue #114. This edit is quite opinionated, and I don't know if you guys would like this kind of organized dataframe. But first, I have to discuss where I added this change: I decided to add an optional flag for datamol users in the `fuzzy_scaffolding` function.

The flag is termed `if_df`, which defaults to `True`. Two separate pandas dataframes, one for each of `scf2infos` and `scf2groups`, will be returned. The rationale is not to confuse users about how the `fuzzy_scaffolding` function returned previously.

NOTE: the output below is my best attempt to express the pandas dataframe. The output is just a df that has `3` columns and not `15` rows.

Let's start with `scf2infos`.
The way `scf2infos` is structured is perfect for a pandas transpose. Every scaffold output has its corresponding rdkit `mol` list and its `smarts` pattern. This means every row will represent the `scf`, and every `scf` will have its `mol` list.

Output:
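As an illustration of the transpose described above (not the PR's actual output), here is a minimal sketch. The data is made up: the inner key names `smarts` and `mols` are assumptions, and SMILES strings stand in for the RDKit mol objects.

```python
import pandas as pd

# Hypothetical stand-in for `scf2infos`: each scaffold maps to its SMARTS
# pattern and its member molecules (SMILES strings instead of RDKit mols).
scf2infos = {
    "c1ccccc1": {"smarts": "c1ccccc1", "mols": ["Cc1ccccc1", "CCc1ccccc1"]},
    "C1CCCCC1": {"smarts": "C1CCCCC1", "mols": ["CC1CCCCC1"]},
}

# A dict of dicts puts the outer keys (scaffolds) in the columns, so the
# frame is transposed to get one row per scaffold.
df_infos = pd.DataFrame(scf2infos).T.rename_axis("scf").reset_index()
print(df_infos)
```

The result has three columns (`scf`, `smarts`, `mols`) and one row per scaffold, with the molecule list kept as a single cell.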
What about `scf2groups`?

`scf2groups` is trickier because the core 'groups' are in their individual dictionaries contained in a list. This makes it difficult to create a df because of these multi-valued attributes. Thankfully, pandas can handle the multi-valued attributes by calling the `.from_dict()` method and setting `orient` to `'index'`. Further, this allows the df for each `scf` row to have `None` values if there are no results. So this df will dynamically create `n_core_group`s if there is an output with `n` number of core groups. So in the case below, `scf` index `1` has two core groups but the rest do not.

Other than that, I hope you like the code. I also updated the test cases.
Thanks, Steven Pak
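The `.from_dict()` approach described for `scf2groups` can be sketched roughly as follows. This is an illustration with made-up data, not the code from this PR: the group dictionaries are placeholders, and the `1_core_group`, `2_core_group` column names are one guess at the naming scheme.

```python
import pandas as pd

# Hypothetical stand-in for `scf2groups`: each scaffold maps to a list of
# core-group dictionaries, and the lists differ in length.
scf2groups = {
    "c1ccccc1": [{"R1": "C"}],
    "C1CCCCC1": [{"R1": "C"}, {"R1": "CC"}],
    "c1ccncc1": [],
}

# orient="index" turns each key into a row; shorter lists are padded with
# missing values, so the column count follows the longest list.
df_groups = pd.DataFrame.from_dict(scf2groups, orient="index")

# Assumed naming scheme for the dynamically created columns.
df_groups.columns = [f"{i + 1}_core_group" for i in range(df_groups.shape[1])]
df_groups = df_groups.rename_axis("scf").reset_index()
print(df_groups)
```

Here only the scaffold at index `1` fills both group columns; the other rows hold missing values where no group exists.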
Checklist:
- Add a `news` entry (copy `news/TEMPLATE.rst` to `news/my-feature-or-branch.rst`) and edit it.