UCL / HHyeast-server


ORFs file "does not exist" in HHyeast, even though the file exists - may overlap with previous issue #64

Open timlevine opened 5 years ago

timlevine commented 5 years ago

I found that the file of all HHyeast hits (HHpry_hits_171010) had nothing from a particular ORF in it. That ORF is also shown by HHyeast as "does not exist", yet I have the file that was made by HHyeast (see attached). It has 34 Pfam hits with pSS≥50%.


Maybe this issue overlaps with the previous one!

Was this excluded because of the 2nd hit (No 2)?

PF16507.4 ; BLM10_mid ; Proteasome-substrate-size regulator, mid region Probab=100.00 E-value=6e-73 Score=745.17 Aligned_cols=512 Identities=46% Similarity=0.855 Sum_probs=459.9 Template_Neff=8.400

This is very strong (pSS=100%), full length and contains the name of the protein.
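For anyone checking hits like this in bulk, the fields in the line above can be pulled apart with a short script. This is only a sketch: `parse_hit` and `HIT_LINE` are hypothetical names, and it assumes every hit line follows the same `accession ; name ; description key=value ...` layout as the one quoted here.

```python
import re

# The hit line quoted above, split only for readability.
HIT_LINE = (
    "PF16507.4 ; BLM10_mid ; Proteasome-substrate-size regulator, mid region "
    "Probab=100.00 E-value=6e-73 Score=745.17 Aligned_cols=512 Identities=46% "
    "Similarity=0.855 Sum_probs=459.9 Template_Neff=8.400"
)

# A key is a word that may contain hyphens (e.g. E-value); a value is any run
# of non-space characters.
KEY_VALUE = re.compile(r"([A-Za-z_][\w-]*)=(\S+)")


def parse_hit(line):
    """Split one hit line into accession, name, description and numeric stats."""
    # Everything before the first key=value pair is the ';'-separated header.
    head = re.split(r"\s+[A-Za-z_][\w-]*=", line, maxsplit=1)[0]
    accession, name, description = [part.strip() for part in head.split(";", 2)]
    # Percent signs (Identities=46%) are dropped so every value parses as a float.
    stats = {key: float(value.rstrip("%")) for key, value in KEY_VALUE.findall(line)}
    return {
        "accession": accession,
        "name": name,
        "description": description,
        "stats": stats,
    }


hit = parse_hit(HIT_LINE)
print(hit["name"], hit["stats"]["Aligned_cols"], hit["stats"]["Probab"])
```

The parsed `Aligned_cols` and `Probab` values are the ones that matter for the length and pSS arguments made here.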

timlevine commented 5 years ago

Further to this: I looked for ORFs that have Pfam hits in the InterPro list used by SGD. There are 4964. Of these, 87 have no Pfam hit in HHyeast.

These fall into three classes:

1) hits too short (n=26): all hits less than 30 aligned columns

2) pseudogenes (n=3): SGD has no confidence that these genes are real. For one of these we have a file, but it's OK not to show any of these results.

3) file dropped out for no clear reason (n=58, list below): the files have been made for all of these in the HHyeast dataset, and the website knows the ORFs exist, i.e. when I enter the systematic name (e.g. YAL008W), the website uses the non-systematic name (e.g. Fun14) to tell me that the files do not exist.

This final 1% of files (some of which have gaps/inserts files) should be put back in the HHyeast data set! List here:

Q0050 YAL008W YAL026C YAR019C YBL088C YBL099W YBR084W YBR276C YCL011C YDL040C YDR145W YDR359C YDR406W YDR422C YDR484W YEL060C YER132C YER164W YFL007W YGL062W YGL086W YGL097W YGL100W YGL197W YGR003W YGR061C YGR199W YGR204W YGR240C YGR245C YGR281W YIL147C YJR066W YKL164C YKL188C YKL203C YKL209C YLL001W YLR096W YLR163C YLR398C YMR176W YMR284W YMR296C YNL163C YNR033W YNR047W YOL021C YOL123W YOL152W YOR116C YOR162C YOR251C YOR343W-B YOR393W YPL226W YPL249C YPR032W
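The three-way split above can be written down as a small triage function, together with a check of the arithmetic. This is a rough sketch, not how the lists were actually produced: `classify`, its inputs, and the placeholder ORF names in the usage note are all hypothetical, and only the counts and the 30-column threshold come from this comment.

```python
MIN_ALIGNED_COLS = 30  # class 1 threshold: all hits shorter than this

# Counts reported in this comment.
CLASS_COUNTS = {"hits too short": 26, "pseudogene": 3, "file dropped out": 58}
ORFS_WITH_PFAM_HITS = 4964


def classify(orf, aligned_cols, pseudogenes):
    """Assign one ORF that shows no Pfam hit in HHyeast to a class.

    aligned_cols: Aligned_cols values of the ORF's hits (hypothetical input).
    pseudogenes:  set of ORF names treated as pseudogenes (hypothetical input).
    """
    if orf in pseudogenes:
        return "pseudogene"
    if aligned_cols and all(n < MIN_ALIGNED_COLS for n in aligned_cols):
        return "hits too short"
    # Anything else has at least one long enough hit (or no hit data at all),
    # so its absence is unexplained.
    return "file dropped out"


total_missing = sum(CLASS_COUNTS.values())
dropped_share = 100 * CLASS_COUNTS["file dropped out"] / ORFS_WITH_PFAM_HITS
print(total_missing)            # 87
print(round(dropped_share, 2))  # 1.17
```

For example, with made-up inputs, `classify("ORF_A", [12, 25], set())` returns `"hits too short"`, while `classify("ORF_B", [12, 512], set())` returns `"file dropped out"`.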

tamuri commented 5 years ago

We have results for all in that list. I think this is related to #6. Will fix now.

tamuri commented 5 years ago

I've uploaded those missing data files - you should be able to see them on the website now.

tamuri commented 5 years ago

> PF16507.4 ; BLM10_mid ; Proteasome-substrate-size regulator, mid region Probab=100.00 E-value=6e-73 Score=745.17 Aligned_cols=512 Identities=46% Similarity=0.855 Sum_probs=459.9 Template_Neff=8.400
>
> This is very strong (pSS=100%), full length and contains the name of the protein.

Any Pfam hits that contain the name of the protein are not shown.
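As an illustration of that rule, a filter along these lines would hide the BLM10_mid hit. This is a sketch only: the function names are hypothetical, the case-insensitive substring match is an assumption (the exact matching rule is not stated in this thread), the protein name "Blm10" is inferred from the hit name, and the second hit in the example is invented.

```python
def mentions_protein(hit_name, description, protein_name):
    """True if the protein's name appears in the Pfam hit name or description.

    Case-insensitive substring matching is an assumption for this sketch.
    """
    needle = protein_name.lower()
    return needle in hit_name.lower() or needle in description.lower()


def visible_hits(hits, protein_name):
    """Keep only the hits that do not mention the protein's own name."""
    return [
        (name, description)
        for name, description in hits
        if not mentions_protein(name, description, protein_name)
    ]


hits = [
    # The hit quoted above.
    ("BLM10_mid", "Proteasome-substrate-size regulator, mid region"),
    # Invented placeholder hit, for contrast only.
    ("Example_dom", "Hypothetical unrelated domain"),
]

shown = visible_hits(hits, "Blm10")
print(shown)
```

Under these assumptions only the placeholder hit is shown, which matches the behaviour described: a strong, full-length hit is still hidden when it carries the protein's name.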

timlevine commented 5 years ago

Great - thanks for putting up those 58. T