Open dufaultc opened 2 years ago
The current implementation in graph.py queries UniProt based on the StringDB ID. Validation and compatibility with the co-expression data still need to be tested.
Source of data: uniprot.org, using bioservices
Where data is located: currently generated at runtime during graph creation
Description of data: the FASTA sequence is pulled from UniProt and further processed into a tokenizable form for ProtBert.
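For reference, a minimal sketch of what the "tokenizable form" step could look like. It assumes the preprocessing described on the ProtBert model card: uppercase residues separated by spaces, with the rare amino acids U, Z, O and B mapped to X. The function names and the FASTA record are hypothetical and are not taken from graph.py.

```python
import re


def fasta_to_sequence(fasta_text: str) -> str:
    """Drop FASTA header lines (starting with '>') and join the residue lines."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">"))


def to_protbert_input(sequence: str) -> str:
    """Format a raw amino-acid string the way ProtBert's tokenizer expects."""
    seq = sequence.upper()
    # Map rare/ambiguous residues to X, as in the ProtBert model card.
    seq = re.sub(r"[UZOB]", "X", seq)
    # ProtBert tokenizes on whitespace, so put one residue per token.
    return " ".join(seq)


# Hypothetical FASTA record, for illustration only.
example_fasta = """>sp|P00000|EXAMPLE_HUMAN Hypothetical example protein
MKTUAZ
LLB
"""

print(to_protbert_input(fasta_to_sequence(example_fasta)))
# M K T X A X L L X
```

The resulting string can then be passed to the ProtBert tokenizer. Keeping this step separate from the UniProt query would make it easier to test without network access.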
Sequence data from uniprot.org, which will be used to initialize the node vectors of our graph neural network.
Information Needed