Open RiesBen opened 2 months ago
Hello @RiesBen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
src/kartograf/atom_mapper.py
:Line 857:80: E501 line too long (103 > 79 characters) Line 859:80: E501 line too long (92 > 79 characters) Line 860:80: E501 line too long (93 > 79 characters) Line 863:80: E501 line too long (98 > 79 characters) Line 877:80: E501 line too long (82 > 79 characters) Line 883:80: E501 line too long (86 > 79 characters) Line 884:80: E501 line too long (86 > 79 characters) Line 917:80: E501 line too long (82 > 79 characters) Line 919:80: E501 line too long (95 > 79 characters) Line 927:80: E501 line too long (92 > 79 characters) Line 928:80: E501 line too long (104 > 79 characters) Line 929:80: E501 line too long (88 > 79 characters) Line 932:80: E501 line too long (85 > 79 characters) Line 937:80: E501 line too long (83 > 79 characters)
src/kartograf/tests/conftest.py
:Line 116:80: E501 line too long (89 > 79 characters) Line 123:80: E501 line too long (84 > 79 characters) Line 125:80: E501 line too long (90 > 79 characters)
src/kartograf/tests/test_atom_mapper.py
:Line 285:80: E501 line too long (94 > 79 characters) Line 290:80: E501 line too long (100 > 79 characters) Line 300:80: E501 line too long (97 > 79 characters) Line 303:80: E501 line too long (84 > 79 characters) Line 307:80: E501 line too long (95 > 79 characters) Line 308:1: W293 blank line contains whitespace Line 315:80: E501 line too long (93 > 79 characters) Line 327:80: E501 line too long (96 > 79 characters) Line 335:80: E501 line too long (97 > 79 characters)
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 97.53%. Comparing base (
8ebfea7
) to head (00709d5
).
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
🚨 Try these New Features:
I've also realized that the way we are splitting by chains might not doing what we want, I think we would like to go with a similar approach to what the gufe.ProteinComponent._from_openMMPDBFile
is doing, but instead using the chain atoms instead of the topology atoms.
For example, the structure of TYK2 in PLB repo has two waters (6 atoms in total), and if you check this approach you have something like the following:
In [4]: tyk2_comp = ProteinComponent.from_pdb_file(f"{tyk2_basepath}/protein.pdb")
In [5]: tyk2_rdmol = tyk2_comp.to_rdkit()
In [6]: tyk2_rdmol.GetNumAtoms()
Out[6]: 4658
In [7]: mapper = KartografAtomMapper(atom_map_hydrogens=True)
In [8]: chains = mapper._split_protein_component_chains(tyk2_comp)
In [9]: chains
Out[9]: [ProteinComponent(name=0_A), ProteinComponent(name=1_A)]
In [10]: chains[1].to_rdkit().GetNumAtoms()
Out[10]: 4
In [11]: chains[0].to_rdkit().GetNumAtoms()
Out[11]: 4652
So there are some missing atoms in the waters when using this function to split the components by chain.
It seems that there is a different behavior for importlib.resources.file
in python 3.9. That's why the tests are failing. I couldn't spot anything about this in the changelog for 3.10, though.
@RiesBen - having @hannahbaumann @jthorton take over the review of this PR might be a good handover exercise. This seems like the type of thing that would expose folks to most of the Kartograf functionality.
One last thing to check is whether we want to enforce that both components are of the same type when we try to create the mapping as users have probably made a mistake if they want to map a SMC to a PC as they should not by mutating that many atoms?
One last thing to check is whether we want to enforce that both components are of the same type when we try to create the mapping as users have probably made a mistake if they want to map a SMC to a PC as they should not by mutating that many atoms?
Yeah I think it's reasonable to expect the two input molecules to be of the same type.
This PR tries to solve the raised issue with multi chain components. see #46