UCL / HHyeast-server


Front matter for website #48

Open ilectra opened 6 years ago

timlevine commented 6 years ago

I have done a draft. Various numbers that I plan to include here are not yet available.

Some aspects of the way the new version on your computer works are not clear to me. For example, will it still not display a panel if there are no hits, or will it instead display an empty panel? In that case, where a user resets the thresholds lower, can the "missing" panel then reappear? (I hope I've made this at all clear.)


From: Ilektra Christidi notifications@github.com Sent: Thursday, July 19, 2018 1:37:04 PM To: UCL/HHyeast-server Cc: Levine, Tim; Assign Subject: Re: [UCL/HHyeast-server] Front matter for website (#48)

Assigned #48 (https://github.com/UCL/HHyeast-server/issues/48) to @timlevine (https://github.com/timlevine).


timlevine commented 6 years ago

Welcome to the HHyeast website.

This website offers the results of remote homology searches of the entire genome of the model budding yeast Saccharomyces cerevisiae. The searches were carried out using the HHsearch package developed by Johannes Söding and colleagues [1], a tool also known as "HHpred" through its online server [2].

The results show visualisations of the strongest homologies for all 6,713 verified yeast open reading frames (ORFs) against three databases:

  1. PDB (solved structures)
  2. Pfam (curated protein domain families from the European Bioinformatics Institute)
  3. the yeast proteome itself.

For each ORF, a summary of the hits in all three databases is shown, from which you can investigate each set of hits in more detail, re-setting the thresholds etc. to display more or fewer hits.

Data Download
As well as saving images of the domain displays through the controls in each window, you can download, for each ORF, the file from which the data for the visualisation were extracted (suffix ".hhr", opened as a text file).

Gaps between the domains in the visualisations
HHyeast has discovered YYY additional hits in the gaps between the domains displayed here [3]. Although these are not in the visualisations, it is clear which ORFs have such hits from the download buttons, which ... insert text here

Job submission
Type a gene identifier, either the systematic name or the standard name if available. The server will offer valid options. Typically three panels will be displayed, one for each of the databases (PDB, Pfam, yeast), but a panel will not be displayed if no hits reach the default threshold for display. Each panel can then be examined in detail, and the display thresholds re-set there can also be applied to the other two panels.

NOTES

  1. "Protein homology detection by HMM-HMM comparison". Söding J. Bioinformatics. 2005, updated most recently in "A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core." Zimmermann et al., J Mol Biol. 2018
  2. https://toolkit.tuebingen.mpg.de/#/tools/hhpred
  3. "HHyeast reveals XXX new domains in the yeast proteome". Christidi et al., Manuscript in Preparation
ilectra commented 5 years ago

Currently, if there are no hits above the probability threshold, no panel appears in the summary view. My plan is that if some hits do appear in the detail view with a new probability threshold, the "missing" panel would reappear in the summary view, as you described. Is this the functionality you'd like?

timlevine commented 5 years ago

I'd like something to appear for every protein even if there are no hits above 50%, maybe a message saying you can look in more detail, as you suggest.

How many proteins have zero hits that meet the length criteria?

ilectra commented 5 years ago

ok, I'll try to implement that.

And I've no idea how many proteins have zero hits that meet the length criteria...

ilectra commented 5 years ago

Unfortunately it's not at all obvious how to show an empty plot when there are no hits and then fill it up when a lower threshold is provided (see #53, noted for future development). I can show a message that no hits are available for this database, but lower-probability hits will not be available for those ORFs. I suspect there are not many ORFs like that.

ilectra commented 5 years ago

@timlevine, @tamuri, I'm trying to finalise this. Can I please have some numbers for the XXXs and YYYs, as well as some text for the "Gaps between the domains in the visualisations" section?

timlevine commented 5 years ago

@tamuri, @ilectra,

So far I've had no time to look for XXX and YYY.

I can re-write this without XXX and YYY if that's going to be better than nothing!

I must admit that this is a problem of my own making. I have not found the time to look into the overall discovery rate of new domains. I have a results file that Asif sent me ("hhrpy_hits_20171010" and similar). It needs some work to reveal what's new in there.

Then there are the domains in the gaps. I have mislaid the file/data I was sent on that.

ilectra commented 5 years ago

That's fine, @timlevine, we can come back and revisit the data and their visualisation when there's more funding/time. Shall we say, then, that I'll skip any reference to gap analysis, as well as Note 3, this time? And restrict the file downloads to the original (whole-genome search) .hhr files?

timlevine commented 5 years ago

Is there a reason we cannot offer the gap downloads? I'd hope we can do that.

So the thing that I need to do is re-write the text to explain where we've got to. Also, would it be OK to have a special download explaining how to unpack the information within the usual download files as well as in the gap files?

It's been a long while and I have forgotten all about where the gap files are and how to download them. If you could give a simpleton's guide to downloading all of them (in a batch), that would be helpful.

timlevine commented 5 years ago

@ilectra First stab: this needs more work, either to produce a download or to add more text below this to help users understand the HHR files.

I have pasted the text back in here and it's lost the formatting; can you extract the changes anyway from the elongated paragraphs? I think that the "Gaps between the domains in the visualisations" section should be left in, so it needs some more text, but only when we know what the button will look like! T

Welcome to the HHyeast website.

This website offers the results of remote homology searches of the entire genome of the model budding yeast Saccharomyces cerevisiae. The searches were carried out using the HHsearch package developed by Johannes Söding and colleagues [1], a tool also known as "HHpred" through its online server [2].

The results show visualisations of the strongest homologies for all 6,713 verified yeast open reading frames (ORFs) against three databases:

  1. PDB (solved structures)
  2. Pfam (curated protein domain families from the European Bioinformatics Institute)
  3. the yeast proteome itself.

Entering the name of an ORF allows you either to “Download file”, which retrieves the HHpred results (see below), or to choose “Display plot”, which leads to a visualisation summarising the strong hits to the ORF in all three databases (PDB, Pfam and yeast, minus the ORF itself) in three separate boxes. Note that for these strong hits, similar hits are clustered together and only one hit per cluster is displayed. To see more detail, you can delve deeper within “Display plot” by pressing one of the three buttons at the bottom (“Detailed PDB hits” etc.). These show every single hit (multiple per cluster) and let you visualise more or fewer hits by re-setting the threshold and the degree of accepted overlap. Once you have chosen new settings for one database, you can apply them to all three by choosing “Go to summary view”.

Data Download
As well as saving images of the domain displays through the controls in each window, you can use “Download file” to download, for each ORF, the file from which the data for the visualisation were extracted (suffix ".hhr", opened as a text file).

Gaps between the domains in the visualisations
 HHyeast has discovered many additional hits in the gaps between the domains displayed here [3]. Although these are not in the visualisations, it is clear which ORFs have such hits from the download buttons, which ... XXXX

Job submission
Type a gene identifier, either the systematic name or the standard name if available. The server will offer valid options. Typically three panels will be displayed, one for each of the databases (PDB, Pfam, yeast), but a panel will not be displayed if no hits reach the default threshold for display. Each panel can then be examined in detail, and the display thresholds re-set there can also be applied to the other two panels.

NOTES

  1. "Protein homology detection by HMM-HMM comparison". Söding J. Bioinformatics. 2005, updated most recently in "A Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at its Core." Zimmermann et al., J Mol Biol. 2018
  2. https://toolkit.tuebingen.mpg.de/#/tools/hhpred
  3. "HHyeast reveals hundreds of new domains in the yeast proteome". Christidi et al., Manuscript in Preparation
timlevine commented 5 years ago

@ilectra here's a further edit, this time leaving the formatting in place by keeping the text in MS Word: Welcome to the HHyeast website.docx

ilectra commented 5 years ago

@timlevine, I decided to split the information between the different views, to display the instructions that are relevant to each specific view instead of explaining everything on the start page. Can you please have a look and let me know what you think? Note: the explanation about the data download and format is what I want to implement, but it's not there yet. It's the next (and final!) item on my list.