a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Improve altloc handling #263

Closed anton-bushuiev closed 1 year ago

anton-bushuiev commented 1 year ago

Reference Issues/PRs

What does this implement/fix? Explain your changes

Hi, @a-r-j 👋! I love the package, so I'm coming with another PR.

Currently, insertions and altlocs are always handled together. This PR separates them, which I believe makes more sense. Additionally, altloc removal is improved by keeping the locations with the highest occupancies.
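The max-occupancy rule can be sketched without any library. This is an illustration only, not the code in this PR: the `Atom` record and the `keep_max_occupancy` helper are hypothetical names, and Graphein itself works on a DataFrame of atom records rather than on a list of objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical stand-in for one PDB ATOM record."""
    chain: str
    res_num: int
    icode: str        # insertion code, "" when absent
    name: str         # atom name, e.g. "CA"
    alt_loc: str      # altloc identifier, "" when absent
    occupancy: float


def keep_max_occupancy(atoms):
    """Keep one record per atom: the alternate location with the highest occupancy."""
    best = {}
    for atom in atoms:
        # The insertion code is part of the key, so residues 52 and 52A stay distinct.
        key = (atom.chain, atom.res_num, atom.icode, atom.name)
        # Strict '>' keeps the earlier record on ties; dicts preserve insertion order.
        if key not in best or atom.occupancy > best[key].occupancy:
            best[key] = atom
    return list(best.values())


atoms = [
    Atom("A", 10, "", "CA", "A", 0.4),
    Atom("A", 10, "", "CA", "B", 0.6),
    Atom("A", 11, "", "CA", "", 1.0),
]
kept = keep_max_occupancy(atoms)
print([(a.res_num, a.alt_loc) for a in kept])
```

Note that insertions are untouched here: only records that share the same chain, residue number, insertion code and atom name compete with each other.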

I also think that in ProteinGraphConfig it should be

insertions: bool = True
alt_locs: bool = False

by default but for now I left

insertions: bool = False
alt_locs: bool = False

to be consistent with your design. My understanding is that insertion codes are a valid part of the structure and that removing them corrupts it (see https://github.com/a-r-j/graphein/issues/255). Altlocs, conversely, should be dropped because keeping them would lead to overlapping atoms. What do you think?
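To make the distinction concrete, here is a toy illustration. The residues are made up and the tuple layout is hypothetical; it is not Graphein code. Insertion codes label additional residues, so filtering them out shortens the chain, whereas alternate locations are repeated records for the same residue.

```python
# Toy records: (residue_number, insertion_code, alt_loc). All values are made up.
records = [
    (52, "", ""),
    (52, "A", ""),   # inserted residue 52A: a distinct residue in the chain
    (53, "", "A"),   # residue 53, alternate location A
    (53, "", "B"),   # residue 53, alternate location B: same residue, second position
]

# Residue identity is (number, insertion code); the altloc is not part of it.
residues = sorted({(num, icode) for num, icode, _ in records})

# Dropping every record that carries an insertion code loses residue 52A entirely.
without_insertions = sorted({(num, icode) for num, icode, _ in records if icode == ""})

# Keeping all altlocs leaves two records for the single residue 53.
records_for_53 = [r for r in records if r[:2] == (53, "")]

print(len(residues), len(without_insertions), len(records_for_53))
```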

What testing did you do to verify the changes in this PR?

Pull Request Checklist

a-r-j commented 1 year ago

Hey @anton-bushuiev, thanks for this! 😁

Completely agree these should be separated & nice job on implementing the max occupancy selection strategy. That had been on my mind.

And, yes, I think setting insertions to True by default makes sense.

One comment: we could make the altloc selection strategy configurable. What do you think? This would change the type of the alt_locs param to Union[str, bool].

anton-bushuiev commented 1 year ago

Sounds good, @a-r-j. What options do you have in mind? I can only come up with max_occupancy, min_occupancy, and first and last, as in remove_insertions.

a-r-j commented 1 year ago

Yep, I think those will cover it. Plus "exclude"?
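The options listed so far could be dispatched along these lines. This is a sketch under assumptions, not the merged implementation: `select_altlocs` and the dict-based records are hypothetical, and "exclude" is read here as dropping every atom that has alternate locations, which the thread does not spell out.

```python
def select_altlocs(atoms, strategy="max_occupancy"):
    """Resolve alternate locations in a list of atom dicts (hypothetical layout).

    Each dict has the keys "atom_id" (identifies the atom irrespective of its
    altloc), "alt_loc" ("" when absent) and "occupancy".
    """
    pickers = {
        "max_occupancy": lambda group: max(group, key=lambda a: a["occupancy"]),
        "min_occupancy": lambda group: min(group, key=lambda a: a["occupancy"]),
        "first": lambda group: group[0],
        "last": lambda group: group[-1],
    }
    if strategy != "exclude" and strategy not in pickers:
        raise ValueError(f"Unknown altloc strategy: {strategy!r}")

    # Group the records of each atom, preserving the order of first appearance.
    groups = {}
    for atom in atoms:
        groups.setdefault(atom["atom_id"], []).append(atom)

    selected = []
    for group in groups.values():
        if len(group) == 1:
            # Only one record for this atom: nothing to resolve, always kept.
            selected.append(group[0])
        elif strategy != "exclude":
            selected.append(pickers[strategy](group))
        # Assumption: "exclude" drops atoms that have alternate locations.
    return selected


atoms = [
    {"atom_id": "A:10:CA", "alt_loc": "A", "occupancy": 0.4},
    {"atom_id": "A:10:CA", "alt_loc": "B", "occupancy": 0.6},
    {"atom_id": "A:11:CA", "alt_loc": "", "occupancy": 1.0},
]
for name in ("max_occupancy", "min_occupancy", "first", "last", "exclude"):
    print(name, [a["alt_loc"] for a in select_altlocs(atoms, name)])
```

Keeping the strategies in a single lookup table means a new option only needs one more entry.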

anton-bushuiev commented 1 year ago

@a-r-j, what do we expect when alt_locs=True? Should all of them be kept? Then we need some node naming convention to distinguish them as separate nodes.

a-r-j commented 1 year ago

Hmm. Good point. I think pure literals are the way to go: "all", "none", "first", "max_occupancy", etc.

For the naming scheme, good point. What about appending :altX to node names, where X is the altloc identifier?

I think it's also worth adding an is_insertion and is_altloc property to the node metadata.
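The proposed naming scheme and metadata flags might look as follows. The `chain:residue_name:residue_number` base format and the way the insertion code is appended are assumptions made for this sketch, and `make_node_id` and `node_metadata` are hypothetical helpers rather than Graphein functions.

```python
def make_node_id(chain, res_name, res_num, icode="", alt_loc=""):
    """Build a node name, appending ':alt<X>' when an altloc identifier X is present."""
    node_id = f"{chain}:{res_name}:{res_num}"
    if icode:
        # Assumption: the insertion code is appended as an extra ':'-separated field.
        node_id += f":{icode}"
    if alt_loc:
        node_id += f":alt{alt_loc}"
    return node_id


def node_metadata(icode="", alt_loc=""):
    """Flags suggested in the thread for the node attribute dictionary."""
    return {"is_insertion": bool(icode), "is_altloc": bool(alt_loc)}


print(make_node_id("A", "SER", 53, alt_loc="B"), node_metadata(alt_loc="B"))
print(make_node_id("A", "GLY", 52, icode="A"), node_metadata(icode="A"))
```

With the suffix in place, two alternate locations of the same residue map to different node names, so both can coexist in the graph when all altlocs are kept.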

anton-bushuiev commented 1 year ago

I actually did it in a similar "pure literals" way, but with aliases for bool values. The naming sounds good.
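A literal-based option with bool aliases could be normalised roughly like this. It is a minimal sketch, not the code in this PR: `normalize_alt_locs` is a hypothetical name, the option list simply collects the literals mentioned in this thread, and the alias mapping (True to "all", False to "max_occupancy") is an assumption.

```python
from typing import Union

# Literals mentioned in the thread; the final set in the PR may differ.
ALT_LOC_OPTIONS = (
    "all", "none", "first", "last", "max_occupancy", "min_occupancy", "exclude",
)

# Assumed aliases: True keeps every altloc, False keeps the highest-occupancy one.
BOOL_ALIASES = {True: "all", False: "max_occupancy"}


def normalize_alt_locs(value: Union[str, bool]) -> str:
    """Map a bool or string setting to one of the string literals."""
    if isinstance(value, bool):
        return BOOL_ALIASES[value]
    if value not in ALT_LOC_OPTIONS:
        raise ValueError(
            f"alt_locs must be a bool or one of {ALT_LOC_OPTIONS}, got {value!r}"
        )
    return value


print(normalize_alt_locs(True), normalize_alt_locs(False), normalize_alt_locs("first"))
```

Normalising once at the configuration boundary lets the rest of the code branch on strings only.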

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 49.86% and project coverage change: +7.15 :tada:

Comparison is base (8123f42) 40.27% compared to head (d29ac48) 47.42%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #263      +/-   ##
==========================================
+ Coverage   40.27%   47.42%   +7.15%
==========================================
  Files          48      110      +62
  Lines        2811     6882    +4071
==========================================
+ Hits         1132     3264    +2132
- Misses       1679     3618    +1939
```

| Impacted Files | Coverage Δ |
|---|---|
| graphein/ml/diffusion.py | `0.00% <0.00%> (ø)` |
| graphein/ml/metrics/\_\_init\_\_.py | `0.00% <0.00%> (ø)` |
| graphein/ml/metrics/gdt.py | `0.00% <0.00%> (ø)` |
| graphein/ml/metrics/tm\_score.py | `0.00% <0.00%> (ø)` |
| graphein/ppi/graph\_metadata.py | `0.00% <0.00%> (ø)` |
| graphein/ppi/visualisation.py | `0.00% <0.00%> (ø)` |
| graphein/protein/analysis.py | `0.00% <0.00%> (ø)` |
| graphein/protein/features/utils.py | `27.77% <0.00%> (ø)` |
| graphein/protein/folding\_utils.py | `0.00% <0.00%> (ø)` |
| graphein/protein/tensor/data.py | `30.37% <ø> (ø)` |
| ... and 92 more | |

... and 2 files with indirect coverage changes

a-r-j commented 1 year ago

@anton-bushuiev Looks like this is good to go. Was there anything else you wanted to add to this PR?

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

anton-bushuiev commented 1 year ago

@a-r-j No, thanks for finishing this!

a-r-j commented 1 year ago

Great, thanks for the contribution! Much appreciated 😁