Closed Pakman450 closed 1 year ago
Merging #178 (4f96abc) into main (b8531fa) will increase coverage by
0.20%
. The diff coverage is98.93%
.
@@ Coverage Diff @@
## main #178 +/- ##
==========================================
+ Coverage 91.13% 91.33% +0.20%
==========================================
Files 47 50 +3
Lines 3529 3623 +94
==========================================
+ Hits 3216 3309 +93
- Misses 313 314 +1
Flag | Coverage Δ | |
---|---|---|
unittests | 91.33% <98.93%> (+0.20%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
datamol/descriptors/descriptors.py | 90.90% <ø> (ø) |
|
datamol/filter/matches.py | 98.21% <98.21%> (ø) |
|
datamol/__init__.py | 100.00% <100.00%> (ø) |
|
datamol/descriptors/__init__.py | 100.00% <100.00%> (ø) |
|
datamol/descriptors/compute.py | 100.00% <100.00%> (ø) |
|
datamol/filter/__init__.py | 100.00% <100.00%> (ø) |
|
datamol/filter/filter.py | 100.00% <100.00%> (ø) |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
@hadim, apologies, I should have brought this up in the gh issues for this repo according to your changes in the PULL_REQUEST_TEMPLATE.md
. Do you want me to close this PR and submit an issue about this feature?
Hello @Pakman450 and thank you for spending time on this! Unfortunately, we have already planned to release such filtering features in the future. We don't know yet whether it will be in datamol
or in a separate package under the datamol organization.
I apologize that we will likely not be able to review and merge that PR. It wasn't explicitly stated, but in general, it's a good practice to open an issue before investing time into a PR. I have now modified the PR template to add a comment related to that: https://github.com/datamol-io/datamol/commit/d4f5b4c45351a1d25fcd4fa2d8e22abecb932668
Thanks again for your interest in datamol and I hope we'll see you again soon!
@hadim, apologies, I should have brought this up in the gh issues for this repo according to your changes in the
PULL_REQUEST_TEMPLATE.md
. Do you want me to close this PR and submit an issue about this feature?
It looks like we were writing at the same time :-) Thank you for your understanding!
@hadim I understand. I am really sorry about that. I will do that next time!
Hello,
I was shocked to learn that Datamol doesn't have filtering feature to find molecules that have PAINS hits (unless I am REALLY blind and I can't find it. If you do please direct me). So I added a filtering mechanism to discover any hits in a list of SMILES. The user can additionally specify what kind of RDKit filtering catalogs to use, which is what the
catalog_specifiers
is in the new code. Below is available RDKit catalogs also shown in this doc page:Now,
ALL
is all of them in that list. So to make sure there are no duplicates in a user-definedcatalog_specifiers
, I made sure to uniquify a that list inset_filter_params
. Further, the reason whyset_filter_params
is not in__init__.py
is because I didn't want the user to use that fn to simplify data processing, as per the philosophy of datamol. If users were to specify the catalogs, it will automatically do it for you.All simple fns are in
matches.py
. Thefilter.py
has the fns that uses various fns inmatches.py
to either return a python dictionary of the results, or a pandas dataframe. Also, I added number of filter catalog hits in thecompute.py
because I think the user would want to know how many catalog hits there are for each of their molecule, which can be debated let me know what your thoughts are.Thanks, Steven Pak
Checklist:
news
entry.news/TEMPLATE.rst
tonews/my-feature-or-branch.rst
) and edit it.