a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Keep all hetatms for all-atom graphs #415

Open EvanKomp opened 2 weeks ago

EvanKomp commented 2 weeks ago

Is your feature request related to a problem? Please describe.
Sort of. My use case involves loading structures at all-atom resolution, including non-protein atoms, and eventually passing them to PyG. As it stands, I effectively cannot use Graphein, because I do not know the names of all the hetatms in my dataset ahead of time.

Describe the solution you'd like
In addition to a list of strings, accept the boolean True, meaning "keep every hetatm". Add logic to check that the granularity is compatible, if necessary.
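Below is a minimal sketch of how the requested option could be resolved. It uses a hypothetical helper, resolve_keep_hets, which is not part of Graphein, and it assumes "atom" is the granularity string for all-atom graphs. With True, every hetatm found in a structure is kept. With an explicit list, only the named ones are kept. With False, none are kept.

```python
from typing import Iterable, List, Sequence, Union

# Hypothetical type for the requested option (not Graphein's API):
# an explicit list of hetatm residue names, or a bool.
KeepHets = Union[bool, Sequence[str]]


def resolve_keep_hets(
    keep_hets: KeepHets,
    het_names_in_structure: Iterable[str],
    granularity: str,
) -> List[str]:
    """Return the hetatm residue names to retain for one structure.

    Hypothetical helper illustrating the feature request only.
    """
    present = set(het_names_in_structure)
    if keep_hets is True:
        # Assumption: arbitrary ligands/ions only map onto nodes when every
        # atom is a node, so True is only accepted for atom-level graphs.
        if granularity != "atom":
            raise ValueError(
                f"keep_hets=True requires granularity='atom', got {granularity!r}"
            )
        # Keep everything this structure contains, de-duplicated and sorted.
        return sorted(present)
    if keep_hets is False:
        return []
    if isinstance(keep_hets, str):
        # Guard against a bare string being iterated character by character.
        keep_hets = [keep_hets]
    # Explicit list: keep the named hetatms that are actually present,
    # preserving the caller's order and dropping duplicates.
    return [name for name in dict.fromkeys(keep_hets) if name in present]
```

Restricting True to atom granularity is only one possible reading of "check that granularity is compatible". A residue-level graph (e.g. CA nodes) has no obvious node to represent an arbitrary ligand.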

Describe alternatives you've considered
Parsing my whole dataset and sending in an absolutely massive list of hetatms.
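Below is a minimal sketch of that workaround. It uses only the standard library and assumes the dataset is stored as PDB-format files; mmCIF is not handled. It scans HETATM records and collects residue names from the fixed-width columns 18-20. The function names are illustrative only.

```python
from pathlib import Path
from typing import Iterable, Set, Union


def het_names_from_pdb_text(pdb_text: str) -> Set[str]:
    """Collect unique HETATM residue names from PDB-format text."""
    names: Set[str] = set()
    for line in pdb_text.splitlines():
        # Fixed-width PDB format: record name in columns 1-6,
        # residue name in columns 18-20 (Python slice 17:20).
        if line.startswith("HETATM"):
            name = line[17:20].strip()
            if name:
                names.add(name)
    return names


def het_names_from_files(paths: Iterable[Union[str, Path]]) -> Set[str]:
    """Union of HETATM residue names across a dataset of .pdb files."""
    all_names: Set[str] = set()
    for path in paths:
        all_names |= het_names_from_pdb_text(Path(path).read_text())
    return all_names
```

The resulting set could then be passed as the hetatm list, for example `sorted(het_names_from_files(Path("my_dataset").glob("*.pdb")))`, where `my_dataset` is a hypothetical directory. For a large, diverse dataset, this produces exactly the unwieldy list described above.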

Thanks for your work.

a-r-j commented 2 weeks ago

Hi @EvanKomp have you seen this API? https://colab.research.google.com/github/a-r-j/graphein/blob/master/notebooks/protein_tensors.ipynb#scrollTo=mpbEZJ4WmyQZ

It's a bit more suited for your use case. In particular, see protein_to_pyg and the keep_hets arg.

EvanKomp commented 2 weeks ago

@a-r-j It looks like that function also requires keep_hets as a list. Is there no way to keep all hetatms without having to specify the names?

a-r-j commented 2 weeks ago

Apologies, my bad. I think the store_het arg will keep everything, while keep_hets is for specifying a subset.