Closed anton-bushuiev closed 1 year ago
Hey @anton-bushuiev, thanks for this! :grin:
Completely agree these should be separated & nice job on implementing the max occupancy selection strategy. That had been on my mind.
And, yes, I think setting insertions to True
by default makes sense.
One comment is that we can make the altloc selection param configurable. What do you think? This could change the alt_loc param to Union[str, bool]
.
Sounds good, @a-r-j. What options do you have on your mind? I can only come up with max_occupancy
, min_occupancy
and first
, last
as in remove_insertions
.
Sounds good, @a-r-j. What options do you have on your mind? I can only come up with
max_occupancy
,min_occupancy
andfirst
,last
as inremove_insertions
.
Yep, I think those will cover it. Plus "exclude"?
@a-r-j , what do we expect when alt_locs=True
? To leave all of them? Then, we need some node naming convention to distinguish them as separate nodes.
Hmm. Good point. I think pure literals is the way to go "all"
, "none"
,"first"
,"max_occupancy"
,... etc.
For naming scheme, good point. What about appending :altX
to node names where X
is the altloc identifier?
I think it's also worth adding an is_insertion
and is_altloc
property to the node metadata.
I actually did it in the similar "pure literals" way but with aliases for bool values. The naming sounds good.
Patch coverage: 49.86
% and project coverage change: +7.15
:tada:
Comparison is base (
8123f42
) 40.27% compared to head (d29ac48
) 47.42%.
:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
@anton-bushuiev Looks like this is good to go. Was there anything else you wanted to add to this PR?
Kudos, SonarCloud Quality Gate passed!
@a-r-j No, thanks for finishing this!
Great, thanks for the contribution! Much appreciated 😁
Reference Issues/PRs
What does this implement/fix? Explain your changes
Hi, @a-r-j 👋! I love the package, so I'm coming with another PR.
Currently, insertions and altlocs are always handled together. This PR separates them, as I believe it makes more sense. Additionally, the removal of altlocs is improved by leaving the locations with the highest occupnacies.
I also think that in ProteinGraphConfig it should be
by default but for now I left
to be consistent with your design. My understanding is that insertion codes are a valid part of structure and their removal leads to a corrupted structure (see https://github.com/a-r-j/graphein/issues/255). Alt_locs, vice versa, should be dropped because they would lead to overlapping atoms. What do you think?
What testing did you do to verify the changes in this PR?
Pull Request Checklist
./CHANGELOG.md
file (if applicable)./graphein/tests/*
directories (if applicable)./notebooks/
(if applicable)python -m py.test tests/
and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g.,python -m py.test tests/protein/test_graphs.py
)black .
andisort .