PyEED / pyeed

🧬 Toolkit to create, annotate, and analyze specialized sequence databases
https://pyeed.github.io/pyeed/
MIT License

Extension of data model - References and Substrates #27

Open gfeuerriegel opened 1 year ago

gfeuerriegel commented 1 year ago

I think we should add information on the publications and substrates of each enzyme to the data model. I suggest adding a publications table with columns such as "doi", "author", and "year", and linking it to the ProteinInfo table via a junction table (e.g. "id", "protein_id", "publication_id"). The same applies to substrates: a substrate table plus a junction table. So far, I personally only need the name and abbreviation of each substrate.
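To make the proposal concrete, here is a minimal sketch of the tables using Python's built-in `sqlite3` module. It is not pyeed's actual schema: the table names (`protein_info`, `publication`, `protein_publication`, `substrate`, `protein_substrate`), the `name` column on `protein_info`, the helper `substrates_for`, and the inserted rows are all assumptions made for illustration. Only the column names quoted above come from the proposal.

```python
import sqlite3

# Hypothetical schema. Only "doi", "author", "year", "id", "protein_id",
# "publication_id", and the substrate name/abbreviation come from the
# proposal; every other name here is an assumption.
SCHEMA = """
CREATE TABLE protein_info (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE publication (
    id INTEGER PRIMARY KEY,
    doi TEXT UNIQUE NOT NULL,
    author TEXT,
    year INTEGER
);
-- Junction table: one row per (protein, publication) link.
CREATE TABLE protein_publication (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein_info(id),
    publication_id INTEGER NOT NULL REFERENCES publication(id),
    UNIQUE (protein_id, publication_id)
);
CREATE TABLE substrate (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    abbreviation TEXT
);
-- Junction table: one row per (protein, substrate) link.
CREATE TABLE protein_substrate (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein_info(id),
    substrate_id INTEGER NOT NULL REFERENCES substrate(id),
    UNIQUE (protein_id, substrate_id)
);
"""


def substrates_for(conn, protein_id):
    """Return (name, abbreviation) pairs linked to one protein, sorted by name."""
    return conn.execute(
        "SELECT s.name, s.abbreviation FROM substrate AS s "
        "JOIN protein_substrate AS ps ON ps.substrate_id = s.id "
        "WHERE ps.protein_id = ? ORDER BY s.name",
        (protein_id,),
    ).fetchall()


# Placeholder rows, only to show how the junction table is queried.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO protein_info (id, name) VALUES (1, 'example enzyme')")
conn.execute(
    "INSERT INTO substrate (id, name, abbreviation) "
    "VALUES (1, 'polyethylene terephthalate', 'PET')"
)
conn.execute("INSERT INTO protein_substrate (protein_id, substrate_id) VALUES (1, 1)")
```

The junction tables keep both relations many-to-many, so one publication can describe several enzymes and one enzyme can have several substrates without duplicating rows.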

haeussma commented 1 year ago

Do you think we should add a property called substrate to the ProteinInfo object? Where does the information on an enzyme's substrate come from? Could it come either from a publication or from an unpublished experiment?

gfeuerriegel commented 1 year ago

I think that would be an option, but what would this property look like? Would it return a list of substrates for promiscuous enzymes? In my case, I specifically need the substrates because I am working with different plastics; so far I have not included any non-plastic substrates. To date, the substrate information has come from publications only, since unpublished data is not included in the database.
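One possible shape for such a property, sketched with plain dataclasses: a list of substrate entries, each optionally carrying the DOI of the publication it came from. This is only an illustration, not pyeed's ProteinInfo class. The names `Substrate`, `ProteinInfoSketch`, `source_doi`, `add_substrate`, and `substrate_names` are hypothetical, and the example values (including the DOI) are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Substrate:
    """Hypothetical substrate record: name, abbreviation, optional source."""
    name: str
    abbreviation: Optional[str] = None
    # DOI of the supporting publication; None would mark unpublished data.
    source_doi: Optional[str] = None


@dataclass
class ProteinInfoSketch:
    """Stand-in for ProteinInfo; only the substrate handling is sketched."""
    name: str
    substrates: List[Substrate] = field(default_factory=list)

    def add_substrate(self, substrate: Substrate) -> None:
        # Skip exact duplicates so repeated imports do not add copies.
        if substrate not in self.substrates:
            self.substrates.append(substrate)

    def substrate_names(self) -> List[str]:
        return [s.name for s in self.substrates]


# Placeholder data: a promiscuous enzyme simply holds more than one entry.
enzyme = ProteinInfoSketch(name="example enzyme")
enzyme.add_substrate(
    Substrate("polyethylene terephthalate", "PET", "10.0000/placeholder")
)
enzyme.add_substrate(Substrate("polyurethane", "PUR", "10.0000/placeholder"))
enzyme.add_substrate(Substrate("polyurethane", "PUR", "10.0000/placeholder"))
```

A list covers both the single-substrate and the promiscuous case with the same type, and the optional `source_doi` leaves room for unpublished data later without changing the structure.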