cthoyt / chembl-downloader

Write reproducible code for getting and processing ChEMBL
https://chembl-downloader.readthedocs.io
MIT License

Update README.md #5

Closed YojanaGadiya closed 1 year ago

YojanaGadiya commented 1 year ago

Closes #4

This PR adds total compound counts via the SQLite dump, which implicitly documents for which versions SQLite is available.

cthoyt commented 1 year ago

Not sure a note like this is super valuable. Here's how I'd go about this:

  1. Write a simple SQL query that extracts one or more high-level statistics and that is likely to work even if the ChEMBL database schema changed
  2. Write a Python script that iterates from the latest version of ChEMBL back to 1
  3. Run the query with chembl_downloader.query() (https://github.com/cthoyt/chembl-downloader#run-a-query-and-get-a-pandas-dataframe)
  4. Build up a table using tabulate and export it in the GitHub format, showing for each version of ChEMBL whether the query was possible, and if not, where it failed, and if so, the results
  5. Include the script in the repo
  6. Optional: have this script automatically re-write the README. Otherwise, manually copy-paste the resulting table

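The steps above can be sketched roughly as follows. This is a minimal illustration, not the script that was eventually merged. The SQL statement (counting rows in `molecule_dictionary`), the helper names `versions_descending`, `collect_rows`, and `to_github_table`, and the `version=` keyword passed to `chembl_downloader.query()` are assumptions made for the example. The table is assembled with the standard library so the sketch runs without extra dependencies; `tabulate(rows, headers=..., tablefmt="github")` would produce an equivalent layout.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Assumption: a statistic that should survive schema changes. The table name
# is taken from the public ChEMBL schema, not from this thread.
COUNT_SQL = "SELECT COUNT(*) FROM molecule_dictionary"

Row = Tuple[str, str]


def chembl_count(version: str) -> int:
    """Count molecules in one ChEMBL release (downloads that SQLite dump)."""
    import chembl_downloader  # third-party, imported lazily

    # Assumption: query() accepts a ``version`` keyword and returns a DataFrame.
    df = chembl_downloader.query(COUNT_SQL, version=version)
    return int(df.iloc[0, 0])


def versions_descending(latest: int) -> List[str]:
    """List version labels from ``latest`` back to 1 (hypothetical helper)."""
    return [str(v) for v in range(latest, 0, -1)]


def collect_rows(versions: Iterable[str], run_query: Callable[[str], int]) -> List[Row]:
    """Run the query per version, recording either the count or the failure."""
    rows: List[Row] = []
    for version in versions:
        try:
            rows.append((version, f"{run_query(version):,}"))
        except Exception as exc:  # record where it failed instead of stopping
            rows.append((version, f"failed: {type(exc).__name__}"))
    return rows


def to_github_table(
    rows: Sequence[Row],
    headers: Sequence[str] = ("ChEMBL version", "Total compounds"),
) -> str:
    """Render the rows as a GitHub-flavored Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join(" --- " for _ in headers) + "|",
    ]
    lines.extend("| " + " | ".join(row) + " |" for row in rows)
    return "\n".join(lines)
```

Passing the query function in as an argument keeps the loop testable without downloading anything; in real use, `collect_rows(versions_descending(latest), chembl_count)` would be the call, with `latest` supplied by the caller.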
YojanaGadiya commented 1 year ago

I see. Okay. It is going to take some time, but I will try and send a PR soon.

YojanaGadiya commented 1 year ago

@cthoyt for the 1st point with regard to statistics, do you have anything specific in mind, or should I just compute the total number of chemicals and add it to the tabulate table?

cthoyt commented 1 year ago

Total number of chemicals sounds good

YojanaGadiya commented 1 year ago

@cthoyt check out the commit. It writes to the README directly, but I am sure you would want the error handling to be cleaner. Any pointers for me?

cthoyt commented 1 year ago

Please run tox -e lint,flake8 and clean up the code, then I can take another look.

cthoyt commented 1 year ago

In fffd13c, I added the following functionality:

  1. Handle old archived versions of ChEMBL
  2. Create a chart of all version release dates in chembl_downloader.history

cthoyt commented 1 year ago

Please don't worry about automatically re-writing the README; this is not so important.

cthoyt commented 1 year ago

@YojanaGadiya all you need to do now is re-run the script

YojanaGadiya commented 1 year ago

@cthoyt I think the PR is ready to be merged. Please check if this is according to what you expected.

YojanaGadiya commented 1 year ago

@cthoyt the table is now updated. Please check if this is how you planned.

codecov-commenter commented 1 year ago

Codecov Report

Merging #5 (388a6c9) into main (ceda5e6) will decrease coverage by 1.62%. The diff coverage is 3.57%.

@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
- Coverage   22.95%   21.32%   -1.63%     
==========================================
  Files           6        7       +1     
  Lines         305      333      +28     
  Branches       65       68       +3     
==========================================
+ Hits           70       71       +1     
- Misses        232      259      +27     
  Partials        3        3              

Impacted Files                                 Coverage Δ
src/chembl_downloader/downloader_checker.py    0.00% <0.00%> (ø)
src/chembl_downloader/queries.py               60.86% <100.00%> (+1.77%)


YojanaGadiya commented 1 year ago

Does anything need to be done with respect to coverage?

cthoyt commented 1 year ago

No, don't worry about coverage, but the rest are required.

YojanaGadiya commented 1 year ago

Please check the latest commits. Is this similar to what you expected?

cthoyt commented 1 year ago

@YojanaGadiya better, but there's no need for a separate yes/no column at this point, since each row already has either a number or a dash that conveys the same information.
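As a small illustration of that point, the availability flag can be folded into the count column. The row layout `(version, flag, count)` and the helper names `format_count_cell` and `drop_redundant_flag` are hypothetical, chosen only for this sketch.

```python
from typing import List, Optional, Tuple


def format_count_cell(count: Optional[int]) -> str:
    """Show the count when the query worked, otherwise a dash."""
    return "-" if count is None else f"{count:,}"


def drop_redundant_flag(
    rows: List[Tuple[str, str, Optional[int]]],
) -> List[Tuple[str, str]]:
    """Collapse (version, yes/no, count) rows into (version, count-or-dash)."""
    return [(version, format_count_cell(count)) for version, _flag, count in rows]
```

A reader can still tell at a glance which versions had a usable SQLite dump, because a dash appears exactly where the flag would have said "no".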

cthoyt commented 1 year ago

If you can fix that and pass tox, this is ready to merge.

YojanaGadiya commented 1 year ago

@cthoyt I fixed the README and flake8 on my end. Please can you check/approve the workflow to confirm. Thank you.

YojanaGadiya commented 1 year ago

Apologies for spamming. Hopefully the last commit to the PR. Please approve @cthoyt

cthoyt commented 1 year ago

Note that before merging, I deduplicated a lot of code and combined the new table with the existing one. Thanks @YojanaGadiya.