UCLOrengoGroup / cath-tools

Protein structure comparison tools such as SSAP and SNAP
http://cath-tools.readthedocs.io
GNU General Public License v3.0

Can cath-resolve-hits be used to merge InterProScan results? #70

Open xunsheng opened 5 years ago

xunsheng commented 5 years ago

Thanks for providing this awesome tool! InterProScan results contain evalues from multiple domain identification programs. Can we use cath-resolve-hits to merge those results based on their evalues? Thanks.

tonyelewis commented 5 years ago

Thanks for using CRH and for getting in touch. We're glad to hear that you're pleased with it.

At present, CRH expects input data in either HMMER or "raw" format with either scores or evalues (https://cath-tools.readthedocs.io/en/latest/tools/cath-resolve-hits/#getting-started). If you want to combine scores from different programs, it's probably best to convert your data into the raw format.
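As a rough illustration of that conversion, here is a minimal sketch that turns InterProScan TSV rows into whitespace-separated lines of the form `query_id match_id evalue start-stop`. Several things here are assumptions rather than anything confirmed in this thread: the InterProScan TSV column order (protein accession, MD5, length, analysis, signature accession, description, start, stop, score, ...), the exact raw-with-evalues layout (please check it against the CRH documentation linked above), and the function name `interproscan_tsv_to_crh_raw`, which is made up for this example. Note also that for some InterProScan analyses the ninth column holds a score rather than an evalue, so you may want to filter by analysis before converting.

```python
def interproscan_tsv_to_crh_raw(tsv_text):
    """Convert InterProScan TSV text into 'query match evalue start-stop' lines.

    Assumed TSV columns (0-based): 0 = protein accession, 3 = analysis,
    4 = signature accession, 6 = start, 7 = stop, 8 = score/evalue.
    The output layout is an assumption about CRH's raw evalue format.
    """
    raw_lines = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip blank or truncated rows
        query_id, analysis, signature = fields[0], fields[3], fields[4]
        start, stop, evalue = fields[6], fields[7], fields[8]
        try:
            float(evalue)  # rows without a numeric value (e.g. "-") are dropped
        except ValueError:
            continue
        # Prefix the match ID with the analysis name so the source stays visible.
        match_id = f"{analysis}|{signature}"
        raw_lines.append(f"{query_id} {match_id} {evalue} {start}-{stop}")
    return raw_lines


if __name__ == "__main__":
    example = "P12345\tabc\t300\tPfam\tPF00001\tExample domain\t10\t120\t1.2E-20\tT\t01-01-2020"
    print("\n".join(interproscan_tsv_to_crh_raw(example)))
```

The resulting lines could then be written to a file and passed to cath-resolve-hits with whichever input-format option the documentation specifies for raw evalue data.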

In principle, there should be no problems with such data containing hits from different sources (such as different programs). Some possible issues…

Have I understood your query correctly? Does this address your query?

We've previously considered adding better support through an optional input field that allows the user to specify a source/category for each entry and then:

Would a potential feature like that map closely to what you want here?

xunsheng commented 5 years ago

Thank you so much for the prompt response. Yes, your answer clears everything up.
InterProScan integrates more than 10 HMM-based protein domain prediction programs, and each result carries a unique domain ID, so it's easy to infer the source from the ID. We sometimes use multiple sources because no single database covers all the conserved domains, and not all domains have an annotated name or a description of their function.
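For illustration, inferring the source from the domain ID could look something like the sketch below. The prefix table is an assumption based on commonly seen accession styles (for example `PF` for Pfam and `G3DSA:` for CATH-Gene3D); it is not exhaustive, and `guess_source` is a hypothetical helper name, not part of InterProScan or cath-tools.

```python
# Illustrative, non-exhaustive mapping from accession prefix to member database.
SOURCE_PREFIXES = {
    "G3DSA:": "CATH-Gene3D",
    "PIRSF": "PIRSF",
    "PTHR": "PANTHER",
    "TIGR": "TIGRFAMs",
    "SFLD": "SFLD",
    "SSF": "SUPERFAMILY",
    "MF_": "HAMAP",
    "PF": "Pfam",
    "SM": "SMART",
    "PR": "PRINTS",
    "PS": "PROSITE",
    "cd": "CDD",
}


def guess_source(domain_id):
    """Return the likely source database for a domain ID, or 'unknown'."""
    # Try longer prefixes first so a short prefix never shadows a longer one.
    for prefix in sorted(SOURCE_PREFIXES, key=len, reverse=True):
        if domain_id.startswith(prefix):
            return SOURCE_PREFIXES[prefix]
    return "unknown"


if __name__ == "__main__":
    for domain_id in ["PF00001", "G3DSA:1.10.10.10", "SSF53098", "XYZ123"]:
        print(domain_id, guess_source(domain_id))
```

In practice the InterProScan TSV output also names the analysis in its own column, so a lookup like this is only needed when the IDs are all you have.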

Yes, that potential feature would be awesome! CRH is the best tool I have found so far for performing domain reduction based on scores/evalues and overlaps, which is much better than a script that only resolves overlaps. A further thought: experts with the right background could look into the 14 tools integrated by InterProScan and suggest weightings based on their algorithms. Thanks again for the nice work!