a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

DSSP Calculation always downloads new PDB file #188

Closed: kamurani closed this issue 2 years ago

kamurani commented 2 years ago

Not sure if I'm missing something, but if I want to create an amino acid graph and store RSA (relative solvent accessibility) values in the nodes, I can do the following:

config = ProteinGraphConfig(edge_construction_functions=edge_fns, graph_metadata_functions=[rsa], dssp_config=DSSPConfig())
graph_with_rsa = construct_graph(pdb_path=STRUCTURE_PATH+"/Q9Y2X7.pdb", config=config)

This loads the graph from the local PDB file fine, but when DSSP is calculated I get an error. I think a function is trying to download the structure from the Protein Data Bank, treating the identifier as a PDB code even though it is a UniProt accession?

Is there some way to run DSSP on the local file instead of re-downloading it?

Hope this makes sense!

a-r-j commented 2 years ago

Yep, if you set the pdb_dir argument in the ProteinGraphConfig, it should check for the existence of the file:

https://github.com/a-r-j/graphein/pull/172/files
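The behaviour described above comes down to "look in the local directory first and download only on a miss". Below is a minimal sketch of that pattern. It assumes the local file is named after the structure identifier (Q9Y2X7.pdb here). resolve_pdb_file and the caller-supplied download function are hypothetical names used for illustration. They are not part of graphein's API.

```python
from pathlib import Path
from typing import Callable, Union


def resolve_pdb_file(
    pdb_dir: Union[str, Path],
    pdb_id: str,
    download: Callable[[str, Path], Path],
) -> Path:
    """Return a local PDB path, downloading only if the file is missing.

    Hypothetical helper illustrating the "check pdb_dir first" idea;
    this is not graphein's actual implementation.
    """
    pdb_dir = Path(pdb_dir)
    # Assumption: files are stored as <pdb_dir>/<identifier>.pdb
    local_file = pdb_dir / f"{pdb_id}.pdb"
    if local_file.is_file():
        # Reuse the structure already on disk; no network access needed.
        return local_file
    # Cache miss: make sure the directory exists, then fetch into it.
    pdb_dir.mkdir(parents=True, exist_ok=True)
    return download(pdb_id, pdb_dir)
```

With the setup in the question, this would correspond to pointing pdb_dir at STRUCTURE_PATH, so the existing Q9Y2X7.pdb is found and no download is attempted. Passing download in as a parameter keeps the sketch testable without network access.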

I think there is potential for an even cleaner solution now based on streaming the structure data frames in PDB format. I may pick that up this weekend.
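One reading of the "streaming" idea is to serialise the atom table that is already in memory back into PDB text. DSSP could then be fed from that text instead of a file on disk. Below is a minimal standard-library sketch of the serialisation step only. The Atom record is a hypothetical stand-in for a row of the structure dataframe. It does not show how graphein implemented this, if it did, and handing the text to DSSP is left out.

```python
from dataclasses import dataclass
from io import StringIO
from typing import Iterable


@dataclass
class Atom:
    """Hypothetical stand-in for one row of a structure dataframe."""
    serial: int
    name: str        # e.g. "CA"
    res_name: str    # e.g. "ALA"
    chain_id: str    # e.g. "A"
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float = 1.0
    b_factor: float = 0.0
    element: str = ""


def format_atom(a: Atom) -> str:
    """Format one ATOM record using the fixed-width PDB column layout."""
    # Atom names shorter than 4 characters conventionally start in column 14.
    name = a.name if len(a.name) == 4 else f" {a.name:<3}"
    return (
        # cols 1-6 record, 7-11 serial, 13-16 name, 18-20 resName, 22 chain
        f"ATOM  {a.serial:>5} {name:<4} {a.res_name:>3} {a.chain_id:1}"
        # cols 23-26 resSeq, 31-54 x/y/z (altLoc and iCode left blank)
        f"{a.res_seq:>4}    {a.x:>8.3f}{a.y:>8.3f}{a.z:>8.3f}"
        # cols 55-60 occupancy, 61-66 B-factor, 77-78 element
        f"{a.occupancy:>6.2f}{a.b_factor:>6.2f}          {a.element:>2}"
    )


def atoms_to_pdb_string(atoms: Iterable[Atom]) -> str:
    """Serialise atoms to an in-memory PDB block terminated by END."""
    buf = StringIO()
    for atom in atoms:
        buf.write(format_atom(atom) + "\n")
    buf.write("END\n")
    return buf.getvalue()
```

The PDB format is fixed-width, so each field is padded to its column range. For simplicity the sketch leaves the alternate location, insertion code and charge fields blank.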

The UniProt error should be resolved in #187.

a-r-j commented 2 years ago

How did you get on, @cimranm? Can I close this issue?

kamurani commented 2 years ago

@a-r-j yep, all good! Thanks for your help.