McMasterAI / projectx-2021

Protein sequence data #3

Open dufaultc opened 2 years ago

dufaultc commented 2 years ago

Protein sequences from uniprot.org will be used to initialize the node vectors of our graph neural network.

Information Needed

DanDiCesare commented 2 years ago

The current implementation in graph.py queries UniProt based on the StringDB ID. Validation/compatibility with the co-expression data still needs to be tested.

Source of data: uniprot.org, using bioservices

Where data is located: currently generated during runtime of graph creation

Description of data: FASTA sequence is pulled from UniProt and further processed into a tokenizable form for ProtBert.
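
For reference, the "tokenizable form" step could look roughly like the sketch below. This is not the code in graph.py; the function name `fasta_to_protbert_input` is hypothetical. It assumes a single FASTA record as text and follows the preprocessing described on the ProtBert model card: residues separated by spaces, with the rare residues U, Z, O and B mapped to X.

```python
import re


def fasta_to_protbert_input(fasta_text: str) -> str:
    """Turn one FASTA record into a space-separated residue string.

    Hypothetical helper for illustration; assumes exactly one record.
    """
    lines = [line.strip() for line in fasta_text.strip().splitlines()]
    # Skip the '>' header line and join the wrapped sequence lines.
    sequence = "".join(
        line for line in lines if line and not line.startswith(">")
    ).upper()
    # Map rare residues (U, Z, O, B) to X, as the ProtBert model card does.
    sequence = re.sub(r"[UZOB]", "X", sequence)
    # ProtBert's tokenizer expects one residue per whitespace-separated token.
    return " ".join(sequence)


if __name__ == "__main__":
    record = ">sp|EXAMPLE|made-up header\nMKTU\nAZ\n"
    print(fasta_to_protbert_input(record))  # M K T X A X
```

The returned string can then be handed to the ProtBert tokenizer (the model card loads it with `do_lower_case=False`). Whether graph.py handles multi-record responses or empty results from UniProt is not covered here.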