Closed cadeduckworth closed 1 year ago
Merging #243 (4af6de6) into develop (3b93aad) will increase coverage by
0.38%
. The diff coverage is99.02%
.
@@ Coverage Diff @@
## develop #243 +/- ##
===========================================
+ Coverage 80.73% 81.12% +0.38%
===========================================
Files 15 15
Lines 1905 1960 +55
Branches 294 296 +2
===========================================
+ Hits 1538 1590 +52
- Misses 276 278 +2
- Partials 91 92 +1
Impacted Files | Coverage Ξ | |
---|---|---|
mdpow/workflows/dihedrals.py | 95.93% <99.01%> (+1.05%) |
:arrow_up: |
mdpow/workflows/base.py | 77.55% <100.00%> (-4.09%) |
:arrow_down: |
:mega: Weβre building smart automated test selection to slash your CI/CD build times. Learn more
@orbeckst I think I fixed the bond highlighting issue and image size issue. I incorporated the new algorithm/methods with the 'low res' plotting functionality so that there is a functioning base to improve upon.
Currently going back and looking at our previous conversations and the new links you sent to improve upon the existing methods..
That looks better. Very good. Did you check other examples, too?
The other comments were along the lines
prettyfy()
I think)I checked the plots for that molecule that I have been testing on, but no others yet.
I changed a lot of things that will be helpful moving forward, especially for testing.
The information now used for Mol objects almost all come directly from atom_indices
, which uses the RDKit conversion method that is tested and verified from PR#217.
For each dihedral group in a molecule the atom_indices
and bond_indices
are saved in a dictionary, where the key is in the format of 'C1-C2-C3-C4'
to correspond to MDAnalysis selection style, so that the results DataFrame can be the source that calls the associated plotting graphics info. That way, the DF determines which atom and bond indices are highlighted, rather than an iterative method.
EDIT (reminder): @orbeckst - relevant to discussion from our meeting on 6/29/23.. I did not remember because it was several months ago, but there was a method to the madness all along!
@orbeckst
prettyfy()
, will add changesatom_indices
added to label Mol object on plots
To-Do:
cleaner plot titles
This is where I have landed with the combined RDKIT/matplotlib to SVG composition. (SVG converted to PDF attached below)
With the number of atoms in the SAMPL9 molecules, and the small text size of the atom indices, the SVG plots will look much better for those purposes. It should also maintain a consistent box size for all of the molecules, so the orientation or overall size of the molecule should not cause the image to zoom in or out like the MolToImage
RDKit PDF output method.
Regarding the PDF output method, I figured out that, because the default size of the MolToImage
conversion is 300x300, 'square' shaped molecules end up appearing bigger under increased extent
values, which means the 'scale' cannot really afford to be 'zoomed in' for horizontally oriented molecules that assume more of a 'linear' conformation. There exists a noticeable difference between the aspect options, auto
and equal
, but changing those without changing the frame size and other parameters causes distortion. All of this of course can be changed, but would be difficult to detect how to do so for each case. For the current methods that output PDF, the best combination I have come across is the most recently pushed commit (see also, plot directly beneath it), and likely suits most sizes/shapes (needs further testing to be sure).
Regarding the SVG output method, the svgutil
code, for me, has been tricky for size, scale, and placement. There is a way to get the sizes after the original Mol or MPL objects are saved to SVG files which could make things easier, but it seems like minor changes in different steps of processing can cause big downstream changes for placement/proportions/scales. There is also a method that converts units between px, cm, pt, etc., but I have not had a chance to get into it yet.
It might be best to use what I have developed for SVGs as an option that can be specified, for now, and use the pdf method as default, unless conda has something similar to inkscape that we can include as a dependency in order to facilitate viewing, conversion, and notebook embedding?
The PDF looks sharp and can be zoomed in β
The next thing is to make sure that we don't need to scale it to out it into a paper. To make it publication-ready we would want it to be "at final size", i.e., typical two-column wide figure (about 190 mm, just randomly used https://writing.stackexchange.com/questions/21658/what-is-the-image-size-in-scientific-paper-if-indicated-as-a-single-1-5-or-2-c ) and font size should not be smaller than 9 pt. I think the matplotlib default is 10 so if you make figures at final size, everything should look good.
@orbeckst
Here are some plotting results after implementing existing method with svgutils
and new method with cairosvg
into dihedrals.py
module.
svgutils
is being used to create the SVG files/graphics and layer them.
cairosvg
is being used to convert to pdf and scale/save in the desired size.
I added a kwarg to adjust the final exported pdf plot width, with the default set to 190mm.
I also added conversion to pixels for compatibility with cairosvg.svg2pdf
.
Example with a fairly cluttered molecule from SAMPL9 set (step=1000), before I added a little white space to the right border to allow for longer solvent names: S07_C21-C22-C25-F27_violins.pdf
SM25 & SM26 (testing resources) after increasing whitespace on right border: SM25_C2-N3-S4-C5_violins.pdf SM26_O1-S2-C3-C4_violins.pdf
I have not been able to figure out how to suppress matplotlib output, which does not contain the Mol image.
Moving forward I am working on fixing the tests to accommodate what was changed during development. After I achieve full coverage again I will request a review and start making changes to the documentation to reflect changes and new functionality.
EDIT: see parentheses
@orbeckst
The codecov report of 98.80% is misleading; some of my current tests are weak at best, and some are possibly not testing anything at all. Once the new SVG methods are finalized, more specific tests need to be written, existing tests need to be revisited, and fixture scopes need to be optimized.
Logger/error info needs to be added or updated. Once PR229 is merged the test_workflows_base.py
module needs to test iterative usage of new plotting functionality for multiple projects. Docs will need a thorough rewrite. Tutorials and examples will need to be revisited.
The 'plotting only one solvent issue' needs to be revisited (see most recent comments, SM25 & SM26 pdfs). The current method almost works, but results in an unnecessary solvent label in the legend. I have already already come up with a way to probably fix this but have not tested it yet.
The plots look nice, they seem scalable (a few examples showed signs of adequate scaling, see also most recent comment for S07, SM255, SM26), and I am pleased with the combination of svgutils
and cairosvg
. I am not sure if the problem exists outside of Jupyter or IDEs, but I tried my hand at suppressing the matplotlib
output with no luck.. and the output is a plot which does not contain the Mol object image, so it is annoying. I will look more into this.
EDIT: As a forewarning the changed files here are littered with comments and bad practices, so I will try to clean those up before you get a chance to review. This one required changes on more levels, so PR229 and PR242 are more complete and take precedence.
@orbeckst
For the plots, SVG to PDF no longer uses temporary files, and the one/two solvent plotting issue has been resolved. If everything looks good I will start cleaning things up and finishing the docs and tests.
EDIT: due to some of the new additions, it might be beneficial to split up some of the new or existing functions into several smaller functions
@orbeckst All requested changes addressed (except for some docs) and new tests developed. Please do a quick review and if everything looks good, then I will do a final cleanup and complete all new documentation.
EDIT: trouble with 3.7 again, see most recent CI
@orbeckst There is some documentation that is lacking or needs attention, but I am hesitant to commit to too much of a docs overhaul since much is anticipated to change in #244.
Is it possible that we can decide what should be minimally documented to avoid too many rewrites?
EDIT: there is much cleaning up and many kwarg consistency/dependency issues I need to fix, so let's reassess the docs situation once I have that sorted out.. sorry for any confusion.
@orbeckst
I did a first pass on updating the docs and kwargs, but I will come back to it with fresh eyes later. I had to put a bandaid on the figdir
situation (commit - 60636b4) so the base
module did not break, but as it is written, figdir
really should be required as a positional argument for this PR. The current fix works pending issue #244.
Docs problems worth mentioning are intersphinx references for MDA select_atoms
and the svgutils
module.. haven't figured those out yet after including in mdpow/doc/sphinx/source/conf.py
.. however the svgutils
docs do not look like they support it, or I am looking in the wrong place. Also, I have not yet mentioned cairosvg
so I need to test that one out in the docs too.
I am still having issues with tests for Python 3.7, so I will come back to this after the weekend.
I would like to finish up this PR as soon as possible and take care of #244 so we can move forward with SAMPL9. If you get a chance to review this over the weekend that would be great, otherwise we can catch up on Monday. Thanks, and have a great weekend!
reminder for myself, write test for the following:
if molname is None:
molname = resname
test_resname = 'foo'
run_analysis(resname=test_resname)
assert molname == test_resname
EDIT: not necessary, expected behavior is explicitly defined and cannot fail
mdpow/doc/sphinx/source/conf.py.. however the svgutils docs do not look like they support it, or I am looking in the wrong place.
See https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html for the intersphinx dict.
Ping me if you have specific questions or if the PR is ready for review.
reminder: check if intersphinx mapping works for cairosvg
, svgutils
, and pypdf
@orbeckst It looks like some of the failed tests are related to RDKit. There may have been an update, so I will look into it.
@orbeckst
I think this might be the cause of four tests that fail starting with Python>=3.9
: RDKit issue
The environment used for the COW analyses includes RDKit 2022.09.1, and the issue was addressed in a build of RDKit version 2023.03.1. The builds appear to be specific to Python versions, and it looks like some cases of index ordering (SMARTS) is the only relevant change causing those tests to fail.
Python=3.7
CI uses the 2022 version, and Python=3.8
uses a build from the 2023 version, but neither fail, so I am unsure how to solve this problem for all possible version combinations. At first glance, based on the atoms/bonds highlighted in our plots for COW analysis, it seems like there are no errors with the 2022 version (just different indices than some 2023 builds I guess).
EDIT: Python=3.10
with RDKit=2022.09.1
, the version combo in the environment I am using, seems to be working for our purposes.
I looked at the test failures but I don't know what is the correct output. Is it possible that our current reference values are incorrect?
We could just require rdkit >=2023 (which is available from 3.8 up) (and then axe 3.7).
I looked at the test failures but I don't know what is the correct output. Is it possible that our current reference values are incorrect?
We could just require rdkit >=2023 (which is available from 3.8 up) (and then axe 3.7).
@orbeckst The issue with this solution is that the 2023 build for 3.8 passes (the builds are version specific), with the same results as the 2022 build for 3.7. The change that causes test failure begins with RDKit 2023, Python >= 3.9.
Another failed test unique to Python=3.7:
______________ TestAutomatedDihedralAnalysis.test_DataFrame_input ______________
self = <mdpow.tests.test_automated_dihedral_analysis.TestAutomatedDihedralAnalysis object at 0x7f854ad789d0>
SM25_tmp_dir = PosixPath('/tmp/pytest-of-runner/pytest-0/test_DataFrame_input0/SM25')
dihedral_data = ( selection solvent interaction lambda time dihedral
0 O1-S4-C6-C10 water Coulomb 0...1000 50000.0 211.129909
15415 S4-O1-C5-C9 water VDW 1000 50000.0 207.238198
[15416 rows x 6 columns])
def test_DataFrame_input(self, SM25_tmp_dir, dihedral_data):
df, _ = dihedral_data
dihedrals.automated_dihedral_analysis(dirname=SM25_tmp_dir, figdir=SM25_tmp_dir,
resname=resname, molname=molname,
solvents=('water',), dataframe=df)
> assert (SM25_tmp_dir / 'SM25' / 'SM25_C10-C5-S4-O11_violins.pdf').exists(), 'PDF file not generated'
E AssertionError: PDF file not generated
E assert False
E + where False = <bound method Path.exists of PosixPath('/tmp/pytest-of-runner/pytest-0/test_DataFrame_input0/SM25/SM25/SM25_C10-C5-S4-O11_violins.pdf')>()
E + where <bound method Path.exists of PosixPath('/tmp/pytest-of-runner/pytest-0/test_DataFrame_input0/SM25/SM25/SM25_C10-C5-S4-O11_violins.pdf')> = ((PosixPath('/tmp/pytest-of-runner/pytest-0/test_DataFrame_input0/SM25') / 'SM25') / 'SM25_C10-C5-S4-O11_violins.pdf').exists
mdpow/tests/test_automated_dihedral_analysis.py:310: AssertionError
I am going to remove 3.7 #248 in a new PR, that's one thing less to worry about.
@cadeduckworth , the MDAnalysis RDKit converter docs https://docs.mdanalysis.org/stable/documentation_pages/converters/RDKit.html#MDAnalysis.converters.RDKit.RDKitConverter state
Changed in version 2.2.0: Improved the accuracy of the converter. Atoms in the resulting molecule now follow the same order as in the AtomGroup.
That's relevant because the Notes stated
The
_MDAnalysis_index
property of the resulting molecule corresponds to the index of the specific AtomGroup that was converted, which may not always match the index property.
I think this says that for MDA >= 2.2, we can assume that MDA atom indices and RDKIT mol indices agree.
@cadeduckworth I merged #249 β please go ahead with your PR! π
reminder: check if intersphinx mapping works for
cairosvg
,svgutils
, andpypdf
To-Do
@orbeckst
The codecov report of 98.80% is misleading; some of my current tests are weak at best, and some are possibly not testing anything at all. Once the new SVG methods are finalized, more specific tests need to be written, existing tests need to be revisited, and fixture scopes need to be optimized. Logger/error info needs to be added or updated. Once PR229 is merged the
test_workflows_base.py
module needs to test iterative usage of new plotting functionality for multiple projects. Docs will need a thorough rewrite. Tutorials and examples will need to be revisited.The 'plotting only one solvent issue' needs to be revisited (see most recent comments, SM25 & SM26 pdfs). The current method almost works, but results in an unnecessary solvent label in the legend. I have already already come up with a way to probably fix this but have not tested it yet.
The plots look nice, they seem scalable (a few examples showed signs of adequate scaling, see also most recent comment for S07, SM255, SM26), and I am pleased with the combination of
svgutils
andcairosvg
. I am not sure if the problem exists outside of Jupyter or IDEs, but I tried my hand at suppressing thematplotlib
output with no luck.. and the output is a plot which does not contain the Mol object image, so it is annoying. I will look more into this.EDIT: As a forewarning the changed files here are littered with comments and bad practices, so I will try to clean those up before you get a chance to review. This one required changes on more levels, so PR229 and PR242 are more complete and take precedence.
To-Do
To-Do
solute.unwrap
if not neededNew software requirements:
@orbeckst
Please see lines 143 and 144 (dihedrals.build_universe
). If MDPOW supports single solvent Pow calculations (does it?), should we not always assume that results will be available for water by default?
EDIT: Maybe it is best to default to using the topology and trajectory from the first solvent specified?
It supports single solvent solvation free calculations, yes. You're right, the water directory does not necessarily exist.
However, just raise an issue that we improve build_universe()
to be smarter about finding the topology. Right now it's working for everything that we need.
Let's not use the ITP file because that might not be present, e.g., when someone else ran just the FEP on a cluster.
Actually, your idea to use the first solvent seems pretty good and easy enough to implement. You could even make solvent='water'
a kwarg and then the default is the current behavior.
It supports single solvent solvation free calculations, yes. You're right, the water directory does not necessarily exist.
However, just raise an issue that we improve
build_universe()
to be smarter about finding the topology. Right now it's working for everything that we need.Let's not use the ITP file because that might not be present, e.g., when someone else ran just the FEP on a cluster.
ITP file?
ITP file?
I was thinking along what I used for the dihedral sampler: use the initial files in the top level directories, including the topology file, but on second thought it makes no sense ebcause that files does not contain the solvent and is not useful for using with the trajectory. The TPR is really what we want.
Sorry to have confused you.
Docs say hue_order
takes a list of strings, so we will do the list conversion as part of the kwarg.
EDIT: I think hue_order=["water", "octanol"]
was the original kwarg from the dihedral violins code you provided me with to start developing the figure functions a long while ago.
@orbeckst Ready for review, all requested changes resolved or responded to. I raised several new issues and commented or updated some existing issues to document what needs to be done soon, but is not immediately necessary for merging this PR. Minor reviewing left to do on my part tomorrow as a sanity check, but I hope to get this merged and resubmit the SAMPL9 jobs Monday afternoon.
A momentous occasion: PR #243 has been merged. π π
Congratulations!