FlyBase / GO-curation

For projects related to GO curation in FlyBase

TM and SP info/searches #107

Open hattrill opened 4 months ago

hattrill commented 4 months ago

Two helpmails asked how to find protein sequences for secreted and transmembrane (TM) proteins: look into how we can make this available via FB.

FB Help Mailer: 2927 I want to find a list of all secreted factors

I want to find a list of all secreted factors. When I search on the main page, I cannot find any list. Do you have any suggestions on how to look for these genes?

Thank you for contacting FlyBase. Currently, it is not possible to retrieve a list of all secreted factors directly from FlyBase. We are trying to find a way to generate such lists. In the meantime, you could use UniProt to retrieve secreted proteins from Drosophila using the following query:

(proteome:UP000000803) AND (ft_signal:*) NOT (ft_transmem:*)

This will retrieve all the proteins that have a signal peptide, either manually annotated or predicted using the SignalP prediction tool, and no annotated transmembrane region. This list could be used as a starting point. It is likely not exhaustive, as signal peptide prediction fails for some proteins. From the UniProt results page, you can download a list of UniProt IDs and use it to recover the corresponding genes in FlyBase. Just bear in mind that UniProt can provide multiple IDs for one gene, because isoforms have their own entries while unreviewed, which leads to duplicate FlyBase IDs.

This link shows the results of the query above (I have customised the columns to show the FlyBase ID and subcellular location):

https://www.uniprot.org/uniprotkb?fields=accession%2Creviewed%2Cid%2Cgene_names%2Cxref_flybase%2Ccc_subcellular_location&query=%28proteome%3AUP000000803%29+AND+%28ft_signal%3A*%29+NOT+%28ft_transmem%3A*%29&view=table
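The duplicate-ID caveat above can be handled with a few lines of scripting once the results are downloaded as TSV. The following is only a minimal sketch: it assumes the FlyBase cross-reference column is headed `FlyBase` and that each cell holds semicolon-separated IDs, which should be checked against the actual download. The function name `unique_flybase_ids` and the sample rows are made up for illustration and are not real records.

```python
import csv
import io


def unique_flybase_ids(tsv_text, column="FlyBase"):
    """Return the sorted, de-duplicated FlyBase IDs found in one TSV column.

    Assumption: each cell holds zero or more IDs separated by ';'
    (for example 'FBgn0000001;'). Adjust `column` to match the download.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ids = set()
    for row in reader:
        cell = row.get(column) or ""
        for token in cell.split(";"):
            token = token.strip()
            if token:  # skip empty cells and the trailing separator
                ids.add(token)
    return sorted(ids)


# Made-up rows: two UniProt entries pointing at the same gene, one with no gene.
sample = (
    "Entry\tFlyBase\n"
    "ENTRY_A\tFBgn0000001;\n"
    "ENTRY_B\tFBgn0000001;\n"
    "ENTRY_C\tFBgn0000002;\n"
    "ENTRY_D\t\n"
)
print(unique_flybase_ids(sample))  # ['FBgn0000001', 'FBgn0000002']
```

The resulting list has one entry per gene, however many UniProt entries point at it, which is the form needed for looking the genes up in FlyBase.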

Re: FB Help Mailer: 2931 genes encoding transmembrane proteins

Thank you for contacting FlyBase. Currently, it is not possible to retrieve a list of all the transmembrane proteins directly from FlyBase. We are trying to find a way to generate such lists. In the meantime, you could use UniProt to retrieve them from Drosophila using the following query:

(xref:proteomes-UP000000803) AND (ft_transmem:*)

This will retrieve all the proteins that have at least one transmembrane domain, either manually annotated or predicted using the TMHMM prediction tool. This list could be used as a starting point. It is likely not exhaustive, as prediction fails for some proteins. From the UniProt results page, you can download a list of UniProt IDs and use it to recover the corresponding genes in FlyBase. Just bear in mind that UniProt can provide multiple IDs for one gene, because isoforms have their own entries while unreviewed, which leads to duplicate FlyBase IDs.

This link shows the results of the query above (I have customised the columns to show the FlyBase ID and the transmembrane domain position(s)):

https://www.uniprot.org/uniprotkb?fields=accession%2Creviewed%2Cid%2Cgene_names%2Cxref_flybase%2Cft_transmem&query=%28xref%3Aproteomes-UP000000803%29+AND+%28ft_transmem%3A*%29&view=table
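If these lists are to be regenerated periodically rather than downloaded by hand, the same queries could be sent to UniProt's REST service. The sketch below only builds the request URL. It assumes the `https://rest.uniprot.org/uniprotkb/stream` endpoint with `query`, `fields` and `format` parameters; the helper names `build_tsv_url` and `fetch_tsv` are hypothetical. The query strings and field names are taken from the links in the two replies.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for bulk TSV download; verify against the UniProt API docs.
STREAM_ENDPOINT = "https://rest.uniprot.org/uniprotkb/stream"

# Queries as they appear in the result links quoted in the replies.
SECRETED_QUERY = "(proteome:UP000000803) AND (ft_signal:*) NOT (ft_transmem:*)"
TRANSMEMBRANE_QUERY = "(xref:proteomes-UP000000803) AND (ft_transmem:*)"


def build_tsv_url(query, fields):
    """Build a URL requesting the given query and columns as TSV."""
    params = {"query": query, "fields": ",".join(fields), "format": "tsv"}
    return STREAM_ENDPOINT + "?" + urlencode(params)


def fetch_tsv(url):
    """Download the TSV text (needs network access; not called in this sketch)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


url = build_tsv_url(TRANSMEMBRANE_QUERY, ["accession", "xref_flybase", "ft_transmem"])
print(url)
```

Keeping URL construction separate from the download makes it easy to inspect the request before anything is fetched, and the same helper serves both the secreted and the transmembrane query.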