a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Correct the coordinates computation for small residue graphs, and when granularity: centroids #220

manonreau closed 1 year ago

manonreau commented 1 year ago

Reference Issues/PRs

Fixes #219

What does this implement/fix? Explain your changes

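    from functools import partial

    from graphein.protein.config import ProteinGraphConfig
    from graphein.protein.edges.distance import (
        add_aromatic_interactions, add_aromatic_sulphur_interactions,
        add_cation_pi_interactions, add_disulfide_interactions,
        add_distance_threshold, add_hydrogen_bond_interactions,
        add_ionic_interactions, add_peptide_bonds,
    )
    from graphein.protein.graphs import construct_graph

    new_edge_funcs = {
        "edge_construction_functions": [
            add_peptide_bonds,
            add_aromatic_interactions,
            add_hydrogen_bond_interactions,
            add_disulfide_interactions,
            add_ionic_interactions,
            add_aromatic_sulphur_interactions,
            add_cation_pi_interactions,
            partial(add_distance_threshold, long_interaction_threshold=2, threshold=20.),
        ]
    }

    # Place each node at its residue centroid.
    params_to_change = {"granularity": "centroids"}
    config = ProteinGraphConfig(**new_edge_funcs, **params_to_change)

    pdb_name = 'xxxx.pdb'
    g = construct_graph(config=config, pdb_path=pdb_name)
    g.graph["coords"]  # node coordinates are stored as a graph-level attribute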



#### Pull Request Checklist
- [x] Added a note about the modification or contribution to the `./CHANGELOG.md` file (if applicable)
- [x] Added appropriate unit test functions in the `./graphein/tests/*` directories (if applicable)
- [x] Modify documentation in the corresponding Jupyter Notebook under `./notebooks/` (if applicable)
- [x] Ran `python -m py.test tests/` and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., `python -m py.test tests/protein/test_graphs.py`)
- [x] Checked for style issues by running `black .` and `isort .`

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

- Bugs: 0 (rating A)
- Vulnerabilities: 0 (rating A)
- Security Hotspots: 0 (rating A)
- Code Smells: 2 (rating A)
- Coverage: no coverage information
- Duplication: 12.5%

codecov-commenter commented 1 year ago

Codecov Report

Base: 40.27% // Head: 47.76% // Increases project coverage by +7.49% :tada:

Coverage data is based on head (3eb36cf) compared to base (8123f42). Patch coverage: 51.86% of modified lines in pull request are covered.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #220      +/-   ##
==========================================
+ Coverage   40.27%   47.76%   +7.49%
==========================================
  Files          48       85      +37
  Lines        2811     5435    +2624
==========================================
+ Hits         1132     2596    +1464
- Misses       1679     2839    +1160
```

| [Impacted Files](https://codecov.io/gh/a-r-j/graphein/pull/220?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb) | Coverage Δ | |
|---|---|---|
| [graphein/grn/parse\_trrust.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vZ3JuL3BhcnNlX3RycnVzdC5weQ==) | `37.77% <ø> (ø)` | |
| [graphein/ml/diffusion.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vbWwvZGlmZnVzaW9uLnB5) | `0.00% <0.00%> (ø)` | |
| [graphein/ppi/edges.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHBpL2VkZ2VzLnB5) | `100.00% <ø> (ø)` | |
| [graphein/ppi/graph\_metadata.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHBpL2dyYXBoX21ldGFkYXRhLnB5) | `0.00% <ø> (ø)` | |
| [graphein/ppi/graphs.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHBpL2dyYXBocy5weQ==) | `54.34% <ø> (ø)` | |
| [graphein/ppi/parse\_biogrid.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHBpL3BhcnNlX2Jpb2dyaWQucHk=) | `75.00% <ø> (ø)` | |
| [graphein/ppi/visualisation.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHBpL3Zpc3VhbGlzYXRpb24ucHk=) | `0.00% <0.00%> (ø)` | |
| [graphein/protein/analysis.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHJvdGVpbi9hbmFseXNpcy5weQ==) | `0.00% <0.00%> (ø)` | |
| [graphein/protein/edges/intramolecular.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHJvdGVpbi9lZGdlcy9pbnRyYW1vbGVjdWxhci5weQ==) | `22.68% <0.00%> (ø)` | |
| [graphein/protein/features/sequence/utils.py](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb#diff-Z3JhcGhlaW4vcHJvdGVpbi9mZWF0dXJlcy9zZXF1ZW5jZS91dGlscy5weQ==) | `28.00% <0.00%> (+3.00%)` | :arrow_up: |
| ... and [77 more](https://codecov.io/gh/a-r-j/graphein/pull/220/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Arian+Jamasb) | | |
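
As a sanity check, the headline percentages in the report can be recomputed from the hit and line counts in the coverage diff. This is a minimal Python sketch that uses only the figures quoted in the report; the variable names are illustrative.

```python
# Hit and line counts copied from the Codecov coverage diff in this report.
base_hits, base_lines = 1132, 2811   # master (8123f42)
head_hits, head_lines = 2596, 5435   # PR head (3eb36cf)

# Coverage is the share of lines that were executed by the tests.
base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines
delta = head_cov - base_cov

print(f"base {base_cov:.2f}%  head {head_cov:.2f}%  delta {delta:+.2f}%")
```

The counts are also internally consistent: hits plus misses equals the number of lines for both base (1132 + 1679 = 2811) and head (2596 + 2839 = 5435).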


helder-ribeiro commented 4 months ago

Hi all! Just a comment on the updated `atoms.groupby` call inside the centroid calculation. The list of grouping keys starts with `residue_number`, so the resulting centroid DataFrame is ordered by `residue_number`. When working with two protein chains, its row order then differs from that of `G.graph["pdb_df"]`, which is used, for instance, inside `add_distance_threshold`, so edges are created between the wrong nodes. Changing the order of the keys passed to `atoms.groupby` (for instance, starting with `chain_id`) seems to correct this, but I am not sure about the downstream impacts of that change.
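
The ordering effect described in this comment can be illustrated without Graphein. The sketch below is pure Python and only mimics a grouped aggregation whose result is sorted by group key, which is what pandas `groupby` does by default (`sort=True`). The atom records, the `centroids` helper, and the two-key grouping are hypothetical and are not Graphein's actual code.

```python
from collections import defaultdict

# Hypothetical atoms of a two-chain structure: (chain_id, residue_number, x, y, z).
atoms = [
    ("A", 1, 0.0, 0.0, 0.0),
    ("A", 1, 2.0, 0.0, 0.0),
    ("A", 2, 4.0, 0.0, 0.0),
    ("B", 1, 0.0, 10.0, 0.0),
    ("B", 1, 0.0, 12.0, 0.0),
    ("B", 2, 0.0, 14.0, 0.0),
]


def centroids(atoms, key_order):
    """Average coordinates per group; rows come back sorted by group key,
    mimicking a pandas groupby with the default sort=True."""
    groups = defaultdict(list)
    for chain_id, residue_number, x, y, z in atoms:
        fields = {"chain_id": chain_id, "residue_number": residue_number}
        groups[tuple(fields[name] for name in key_order)].append((x, y, z))
    rows = []
    for key in sorted(groups):
        coords = groups[key]
        centroid = tuple(sum(axis) / len(coords) for axis in zip(*coords))
        rows.append((dict(zip(key_order, key)), centroid))
    return rows


def residue_ids(rows):
    """Reduce centroid rows to (chain_id, residue_number) pairs."""
    return [(keys["chain_id"], keys["residue_number"]) for keys, _ in rows]


# Node order as residues first appear in the atom records (chain A, then chain B).
node_order = list(dict.fromkeys((chain, res) for chain, res, *_ in atoms))

by_residue = residue_ids(centroids(atoms, ["residue_number", "chain_id"]))
by_chain = residue_ids(centroids(atoms, ["chain_id", "residue_number"]))

print(node_order)  # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
print(by_residue)  # [('A', 1), ('B', 1), ('A', 2), ('B', 2)]
print(by_chain)    # [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
```

In this toy case, grouping by `residue_number` first interleaves the two chains, so any code that matches centroid rows to nodes by position pairs them incorrectly. Grouping by `chain_id` first restores the match here, but only because the chains appear in sorted order in the input; matching rows to nodes by an explicit identifier rather than by position would not depend on that assumption.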