UCL / HHyeast-server


Which strong hits are excluded from HHyeast? #63

Open timlevine opened 5 years ago

timlevine commented 5 years ago

Many strong Pfam hits are excluded from the display. I need to know the exclusion criteria so that I can replicate them.

I am guessing (from a vague memory) that one criterion is whether the name of the protein is in the name of the domain (i.e. we had an algorithm to determine if the domain adds no additional information beyond the existence of a fungal family). However, I know this cannot be the whole story from looking at the domains on HHyeast pages.

I looked at 20 of the ~1400 Pfam domains from the SGD database which have the gene's name in their own descriptions. Some are included in HHyeast and some are not. I could see no pattern in the few examples I looked at (attachment: some domains named after their protein.xlsx).

If this makes no sense at all, I'm happy to talk about it!

Tim

tamuri commented 5 years ago

Had a look this morning. Hits are excluded when the ORF name appears anywhere in the Pfam description (provided by the HHsearch DB), but the check is case-sensitive. For example, PF07792 is not hidden in AFI1 because the Pfam description is "Afi1".

If you're still happy for these hits to be excluded, we can fix this bug.
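
As a rough illustration of the behaviour described above (not the actual HHyeast-server code), here is a minimal Python sketch of a case-sensitive substring check next to a case-insensitive one. The function names are hypothetical, and the description string is reduced to the "Afi1" fragment quoted in this comment.

```python
def is_excluded_case_sensitive(orf_name: str, pfam_description: str) -> bool:
    """Hypothetical version of the current check: plain substring match."""
    return orf_name in pfam_description


def is_excluded_case_insensitive(orf_name: str, pfam_description: str) -> bool:
    """Hypothetical fixed check: compare both strings after case folding."""
    return orf_name.casefold() in pfam_description.casefold()


# The example from this thread: ORF name "AFI1", description containing "Afi1".
print(is_excluded_case_sensitive("AFI1", "Afi1"))    # False, so the hit is shown
print(is_excluded_case_insensitive("AFI1", "Afi1"))  # True, so the hit is hidden
```

With the case-folded comparison, a description that spells the gene name as "Afi1" would be treated the same as "AFI1", which is the rule being proposed here.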

timlevine commented 5 years ago

It would be good to have a rule that is unaffected by case. So the answer here is yes.