Mapping multi chain components

RiesBen commented 2 months ago

This PR tries to solve the raised issue with multi chain components. see #46

pep8speaks commented 2 months ago

Hello @RiesBen! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file src/kartograf/atom_mapper.py:

Line 857:80: E501 line too long (103 > 79 characters) Line 859:80: E501 line too long (92 > 79 characters) Line 860:80: E501 line too long (93 > 79 characters) Line 863:80: E501 line too long (98 > 79 characters) Line 877:80: E501 line too long (82 > 79 characters) Line 883:80: E501 line too long (86 > 79 characters) Line 884:80: E501 line too long (86 > 79 characters) Line 917:80: E501 line too long (82 > 79 characters) Line 919:80: E501 line too long (95 > 79 characters) Line 927:80: E501 line too long (92 > 79 characters) Line 928:80: E501 line too long (104 > 79 characters) Line 929:80: E501 line too long (88 > 79 characters) Line 932:80: E501 line too long (85 > 79 characters) Line 937:80: E501 line too long (83 > 79 characters)

In the file src/kartograf/tests/conftest.py:

Line 116:80: E501 line too long (89 > 79 characters) Line 123:80: E501 line too long (84 > 79 characters) Line 125:80: E501 line too long (90 > 79 characters)

In the file src/kartograf/tests/test_atom_mapper.py:

Line 285:80: E501 line too long (94 > 79 characters) Line 290:80: E501 line too long (100 > 79 characters) Line 300:80: E501 line too long (97 > 79 characters) Line 303:80: E501 line too long (84 > 79 characters) Line 307:80: E501 line too long (95 > 79 characters) Line 308:1: W293 blank line contains whitespace Line 315:80: E501 line too long (93 > 79 characters) Line 327:80: E501 line too long (96 > 79 characters) Line 335:80: E501 line too long (97 > 79 characters)

Comment last updated at 2024-09-26 02:53:50 UTC

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.53%. Comparing base (8ebfea7) to head (00709d5).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #47 +/- ## ========================================== + Coverage 96.60% 97.53% +0.93% ========================================== Files 13 13 Lines 618 649 +31 ========================================== + Hits 597 633 +36 + Misses 21 16 -5 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests

ijpulidos commented 2 months ago

I've also realized that the way we are splitting by chains might not doing what we want, I think we would like to go with a similar approach to what the gufe.ProteinComponent._from_openMMPDBFile is doing, but instead using the chain atoms instead of the topology atoms.

For example, the structure of TYK2 in PLB repo has two waters (6 atoms in total), and if you check this approach you have something like the following:

In [4]: tyk2_comp = ProteinComponent.from_pdb_file(f"{tyk2_basepath}/protein.pdb")

In [5]: tyk2_rdmol = tyk2_comp.to_rdkit()

In [6]: tyk2_rdmol.GetNumAtoms()
Out[6]: 4658

In [7]: mapper = KartografAtomMapper(atom_map_hydrogens=True)

In [8]: chains = mapper._split_protein_component_chains(tyk2_comp)

In [9]: chains
Out[9]: [ProteinComponent(name=0_A), ProteinComponent(name=1_A)]

In [10]: chains[1].to_rdkit().GetNumAtoms()
Out[10]: 4

In [11]: chains[0].to_rdkit().GetNumAtoms()
Out[11]: 4652

So there are some missing atoms in the waters when using this function to split the components by chain.

ijpulidos commented 1 month ago

It seems that there is a different behavior for importlib.resources.file in python 3.9. That's why the tests are failing. I couldn't spot anything about this in the changelog for 3.10, though.

IAlibay commented 1 month ago

@RiesBen - having @hannahbaumann @jthorton take over the review of this PR might be a good handover exercise. This seems like the type of thing that would expose folks to most of the Kartograf functionality.

jthorton commented 1 week ago

One last thing to check is whether we want to enforce that both components are of the same type when we try to create the mapping as users have probably made a mistake if they want to map a SMC to a PC as they should not by mutating that many atoms?

IAlibay commented 1 week ago

One last thing to check is whether we want to enforce that both components are of the same type when we try to create the mapping as users have probably made a mistake if they want to map a SMC to a PC as they should not by mutating that many atoms?

Yeah I think it's reasonable to expect the two input molecules to be of the same type.

OpenFreeEnergy / kartograf

Mapping multi chain components #47

Comment last updated at 2024-09-26 02:53:50 UTC

Codecov Report