FlyBase / GO-curation

For projects related to GO curation in FlyBase
MIT License

Testis-specific subunit calls #95

Closed hattrill closed 3 months ago

hattrill commented 5 months ago

For making complexes, we can sometimes make a call, based on HTP expression studies, that one may be the "ubiquitously" expressed complex and the other the testis-specific one.

Can we formalize this to make an FB analysis ref?

Example: these potential complex groups:
TIM23 LATERAL SORTING COMPLEX
TIM23-PAM TRANSLOCATION COMPLEX
Both are detailed on the mitochondrial transporter tab: https://docs.google.com/spreadsheets/d/1E9wnCajmdMum0X9qCqED74-mRmQS0sOM/edit#gid=1880240961

sjm41 commented 5 months ago

I agree this would be useful. I think we'll need the same thing for curating testis-specific paralogs to testis-specific (metabolic) pathway groups. I think we just need one FB analysis reference that we can use for either use case. OK? Can you make a draft proforma for this and post it here? Will just need to get the wording right for the P18 entry.

hattrill commented 5 months ago

Look at FBrf0240104

hattrill commented 5 months ago

@sjm41 suggested text for FB analysis:

! P16. Title u :FlyBase classification of testis-specific gene groups.
! P18. Miscellaneous comments G :The assignment of genes which exhibit a strong testis-biased expression to 'testis-specific' gene groups based on high-throughput expression data and 'testis tissue specificity index' given in FBrf0240104 (Additional file 2). Testis-specific gene groups may contain a mixture of ubiquitously expressed and testis-specific genes, for example for macromolecular complexes which have alternative paralogous subunits.

hattrill commented 5 months ago

I've made a folder with the file from FBrf0240104 which has a column for a specificity index https://drive.google.com/drive/u/0/folders/1aFbryRa65GrFwlXnBxuLtMlwKhIRMzHL

It could be used to help with calls for testis bias, though it is still worth looking at the HTP histograms as well.

sjm41 commented 5 months ago

Looks good. Though I'll suggest a couple of edits:

! P16. Title u :Testis-specific gene groups.
! P18. Miscellaneous comments G :This reference is used to assign genes exhibiting a strong testis-biased expression to 'testis-specific' gene groups (protein complexes or pathways), based on high-throughput expression data in FlyBase and the 'testis tissue specificity index' given in FBrf0240104 (Additional file 2). Testis-specific gene groups may contain a mixture of ubiquitously expressed and testis-specific genes, for example for macromolecular complexes which have alternative paralogous subunits.

I wasn't aware of FBrf0240104! We should do some QC of those indices, and if they look good we could propose a new field in the Gene Report -> Expression section that reports this index. We discussed such a field before, perhaps based on an FB-internal assessment of our HT expression data, but maybe this ready-made index will suffice, at least for the time being? We'd have to request some HarvDev work (e.g. checking whether the gene IDs used in the paper still exist and updating/reporting them as necessary), but it shouldn't be too bad.
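A rough sketch of what that ID check might look like, assuming we already have the set of current gene IDs and a lookup from secondary (retired) IDs to their current replacements. The function name, the lookup structure and the placeholder IDs below are all invented for illustration; this is not existing HarvDev code.

```python
def check_gene_ids(paper_ids, current_ids, secondary_to_current):
    """Sort the paper's gene IDs into still-valid, updatable and unresolved.

    current_ids: set of IDs that are current (assumed input).
    secondary_to_current: dict mapping a retired ID to a list of current IDs
    (a list, because a split gene could map to more than one) (assumed input).
    """
    valid, updated, unresolved = [], {}, []
    for gene_id in paper_ids:
        if gene_id in current_ids:
            valid.append(gene_id)
        elif gene_id in secondary_to_current:
            updated[gene_id] = secondary_to_current[gene_id]
        else:
            unresolved.append(gene_id)
    return valid, updated, unresolved


# Placeholder IDs only; none of these refer to real genes.
paper_ids = ["FBgn_A", "FBgn_B_old", "FBgn_C_gone"]
current_ids = {"FBgn_A", "FBgn_B_new"}
secondary_to_current = {"FBgn_B_old": ["FBgn_B_new"]}

valid, updated, unresolved = check_gene_ids(paper_ids, current_ids, secondary_to_current)
print("valid:", valid)
print("updated:", updated)
print("unresolved (need reporting):", unresolved)
```

The unresolved list is the part that would need a curator to look at by hand.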

hattrill commented 5 months ago

I only really had a proper deep dive into FBrf0240104 today and realised that they had done this calculation. I had the same thought that it would be good to put on the gene report. I spot-checked a few and they looked quite good. I think they are a useful indicator and as good as, or better than, anything we can compute ourselves. We could talk about it on Weds.

FBrf0240104 also used gene group data in their analysis!

sjm41 commented 5 months ago

As a sanity check, I added a new spreadsheet here: https://docs.google.com/spreadsheets/d/1yCO6ZrEMMsfGzH5V5ibiTA0PTJ6rZS8UHPBIylNRqQc/edit#gid=0

It shows all the genes mentioned as 'testis-specific' (or similar) in our current gene groups and compares their 'testis-specific index' from FBrf0240104 with that of their paralog(s). The column showing whether we have said a gene is 'testis specific' in an FBgg (yes/no) has conditional formatting. The same formatting is then used for the 'testis-specific index', using a cut-off of 4 (since the paper says "Genes with an index higher than 4 are highly enriched in testis"). As expected, there's an almost exact correspondence, indicating that we could implement the 'testis-specific index' scores in FB with confidence.
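For anyone wanting to repeat the comparison outside the spreadsheet, here is a minimal sketch of the same check: flag any gene where the curated yes/no call and the index-based call (index higher than 4) disagree. The row structure, gene symbols and index values are made up for illustration and are not taken from the sheet or the paper.

```python
from typing import NamedTuple

# Cut-off taken from the paper's statement quoted above (index higher than 4).
TESTIS_INDEX_CUTOFF = 4.0


class GeneRow(NamedTuple):
    symbol: str
    curated_testis_specific: bool  # what we say in the FBgg (yes/no)
    testis_index: float            # 'testis-specific index' from the paper's file


def find_discordant(rows):
    """Return rows where the curated call and the index-based call disagree."""
    return [
        r for r in rows
        if r.curated_testis_specific != (r.testis_index > TESTIS_INDEX_CUTOFF)
    ]


# Placeholder rows; symbols and values are invented.
rows = [
    GeneRow("geneA", True, 9.2),
    GeneRow("geneA-paralog", False, 0.3),
    GeneRow("geneB", True, 1.1),
]

discordant = find_discordant(rows)
print([r.symbol for r in discordant])
```

Anything returned by `find_discordant` is only a candidate for review; a gene can legitimately sit in a testis-specific group without being testis-specific itself.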

Note this analysis revealed one gene that needs to move FBgg - see the comment on CG40472 on the sheet.

hattrill commented 5 months ago

@sjm41 CG40472 is a particularly difficult one. It is not testis-specific, but neither is its paralog ND-AGGG. So, when making this group, I assigned both to MITOCHONDRIAL COMPLEX I and to MITOCHONDRIAL COMPLEX I - TESTIS-SPECIFIC VARIANT, as there is no evidence that the complex in testis lacks this subunit.

sjm41 commented 5 months ago

Ah, sorry. I missed that both paralogs were in the TESTIS group!

hattrill commented 3 months ago

All done and loaded now.