Open timlevine opened 5 years ago
Further to this: I looked for ORFs that have Pfam hits in the InterPro list used by SGD. There are 4964. Of these, 87 have no Pfam hit in HHyeast.
These fall into three classes:
1) hits too short (n=26): all hits less than 30 aligned columns
2) pseudogenes (n=3) - no confidence in gene's reality at SGD. For one of these we have a file, but it's OK to not show any of these results.
3) file dropped out for no clear reason (n=58 - list below): the files have been made for all of these in the HHyeast dataset, and the website knows the ORFs exist, i.e. when I put in the systematic name (e.g. YAL008W) the website uses the non-systematic name (e.g. Fun14) to tell me that files do not exist.
This final 1% of files (some of which have gaps/inserts files) should be put back in the HHyeast data set! List here:
Q0050 YAL008W YAL026C YAR019C YBL088C YBL099W YBR084W YBR276C YCL011C YDL040C YDR145W YDR359C YDR406W YDR422C YDR484W YEL060C YER132C YER164W YFL007W YGL062W YGL086W YGL097W YGL100W YGL197W YGR003W YGR061C YGR199W YGR204W YGR240C YGR245C YGR281W YIL147C YJR066W YKL164C YKL188C YKL203C YKL209C YLL001W YLR096W YLR163C YLR398C YMR176W YMR284W YMR296C YNL163C YNR033W YNR047W YOL021C YOL123W YOL152W YOR116C YOR162C YOR251C YOR343W-B YOR393W YPL226W YPL249C YPR032W
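For anyone wanting to repeat this comparison, the three-way split above can be sketched roughly as follows. This is only an illustration: the function name, its arguments and the toy ORF labels `ORF_A` and `ORF_B` are hypothetical, and it assumes you already have the SGD/InterPro ORF set, the HHyeast ORF set, the longest aligned-column count per ORF, and a set of pseudogene names.

```python
def classify_missing(sgd_pfam_orfs, hhyeast_pfam_orfs, max_aligned_cols,
                     pseudogenes, min_cols=30):
    """Classify ORFs that have a Pfam hit at SGD but none in HHyeast.

    max_aligned_cols maps an ORF to the longest Aligned_cols value among its
    HHyeast Pfam hits (hypothetical input; ORFs with no file are absent).
    """
    missing = set(sgd_pfam_orfs) - set(hhyeast_pfam_orfs)
    classes = {"too_short": [], "pseudogene": [], "dropped": []}
    for orf in sorted(missing):
        if orf in pseudogenes:
            classes["pseudogene"].append(orf)
        elif orf in max_aligned_cols and max_aligned_cols[orf] < min_cols:
            # every hit for this ORF is shorter than the 30-column cutoff
            classes["too_short"].append(orf)
        else:
            # no obvious reason for the missing file
            classes["dropped"].append(orf)
    return classes


# Toy input: YAL008W is from the list above, ORF_A and ORF_B are placeholders.
result = classify_missing(
    sgd_pfam_orfs={"YAL008W", "ORF_A", "ORF_B", "YBR084W"},
    hhyeast_pfam_orfs={"YBR084W"},
    max_aligned_cols={"ORF_A": 22},
    pseudogenes={"ORF_B"},
)
print(result)
```

With real inputs, the `dropped` list is the one that should match the 58 names above.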
We have results for all in that list. I think this is related to #6. Will fix now.
I've uploaded those missing data files - you should be able to see them on the website now.
PF16507.4 ; BLM10_mid ; Proteasome-substrate-size regulator, mid region Probab=100.00 E-value=6e-73 Score=745.17 Aligned_cols=512 Identities=46% Similarity=0.855 Sum_probs=459.9 Template_Neff=8.400
This is very strong (pSS=100%), full length and contains the name of the protein.
Any Pfam hits that contain the name of the protein are not shown.
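The actual filter code is not shown in this thread, so the following is only a guess at how such a rule could look: parse the hit line quoted above and test whether the Pfam short name or description contains the protein's own name. The helper names are hypothetical, and passing `BLM10` as the protein name is an assumption inferred from the hit's short name `BLM10_mid`.

```python
import re


def parse_hit(line):
    """Parse one flattened hit line of the form shown above into a dict."""
    header, _, stats = line.partition(" Probab=")
    accession, short_name, description = [p.strip() for p in header.split(" ; ", 2)]
    # key=value pairs; keys may contain a hyphen (e.g. E-value)
    fields = dict(re.findall(r"([\w-]+)=(\S+)", "Probab=" + stats))
    return {
        "accession": accession,
        "name": short_name,
        "description": description,
        "probab": float(fields["Probab"]),
        "aligned_cols": int(fields["Aligned_cols"]),
    }


def is_self_named(hit, protein_names):
    """True if the hit's short name or description contains any protein name."""
    text = (hit["name"] + " " + hit["description"]).lower()
    return any(name.lower() in text for name in protein_names)


line = ("PF16507.4 ; BLM10_mid ; Proteasome-substrate-size regulator, mid region "
        "Probab=100.00 E-value=6e-73 Score=745.17 Aligned_cols=512 Identities=46% "
        "Similarity=0.855 Sum_probs=459.9 Template_Neff=8.400")
hit = parse_hit(line)
print(hit["accession"], hit["aligned_cols"], is_self_named(hit, ["BLM10"]))
```

A rule like this should hide only the matching hit, not the whole ORF, which may be worth checking against the "does not exist" behaviour reported here.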
Great - thanks for putting up those 58 T
I found that the file of all HHyeast hits (HHpry_hits_171010) had nothing from a particular ORF in it. Also, that ORF is shown by HHyeast as "does not exist". Yet I have the file which was made by HHyeast - see attached. This has 34 Pfam hits with pSS≥50%.
YFL007W.0.ssw11.hhr.txt
Maybe this issue overlaps with the previous one!
Was this excluded because of the 2nd hit: No 2
This is very strong (pSS=100%), full length and contains the name of the protein.