cboursnell / crb-blast

Conditional Reciprocal Best Blast

Using the length-evalue function for blast best hits (BH) #13

Open 000generic opened 6 years ago

000generic commented 6 years ago

Hi!

crb-blast seems to work really well. I'm finding a 17-39% improvement over blastp (both with the default e-value cutoff of 1e-5) in identifying RBHs between de novo squid and pygmy squid transcriptomes and a set of animal genomes (see attached).

To annotate my transcriptomes I would like to use, in order of preference:

1. RBH (reciprocal best hit) to human, fish, fly, worm, oyster, snail, octopus, anemone
2. BH (best hit) to human, fish, fly, worm, oyster, snail, octopus, anemone
3. NH (no hit) to human, fish, fly, worm, oyster, snail, octopus, anemone

with human preferred as the annotation species over fish over fly over worm etc.
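To make the ordering concrete, here is a minimal sketch of the selection logic in Python. The `rbh` and `bh` dictionaries (species -> {query ID: target ID}) and the function name `annotate` are hypothetical and chosen only for illustration; they are not something crb-blast produces in this form.

```python
# Species in order of annotation preference, as listed above.
SPECIES = ["human", "fish", "fly", "worm", "oyster", "snail", "octopus", "anemone"]


def annotate(query, rbh, bh):
    """Return (tier, species, target) for one query transcript.

    rbh and bh are hypothetical dicts of the form
    species -> {query_id: target_id}; they would have to be built
    from the per-species search results beforehand.
    """
    # Try every species for RBH first, then every species for BH,
    # so any RBH outranks any BH regardless of species.
    for tier, hits in (("RBH", rbh), ("BH", bh)):
        for species in SPECIES:
            target = hits.get(species, {}).get(query)
            if target is not None:
                return tier, species, target
    # Nothing found in any species.
    return "NH", None, None
```

With this ordering, an RBH to fish would be chosen over a plain BH to human; swapping the two loops would instead prefer species over hit type.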

Given that the length-dependent e-value function fitted by crb-blast provides greater sensitivity than a flat cutoff, I was wondering if it's possible to extract the e-value cutoff for a given length from the files already produced. Alternatively, could a future version of crb-blast produce additional BH files for (query -> target) and (target -> query)? This would expand the number of genes that could be annotated, and I imagine it could have other uses.

Thank you, Eric

RBH methods - squid and pygmy squid.pdf

blahah commented 4 years ago

Hey @000generic, the information you want should be in the two files evalues_data (output here) and fitting_data (output here) in the output directory. Let me know if you need more help using that information.
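As a rough sketch of how that information could be applied to best hits: the layout of those files is not described in this thread, so the code below assumes the fitted values have already been parsed into a list of (alignment length, e-value cutoff) pairs sorted by length. That `table` structure and the function names are assumptions for illustration, not crb-blast's API.

```python
from bisect import bisect_right


def cutoff_for_length(table, length):
    """Look up the e-value cutoff for an alignment length.

    table is a hypothetical list of (alignment_length, evalue_cutoff)
    pairs sorted by length. The cutoff of the nearest fitted length
    at or below `length` is returned, or None if `length` is shorter
    than every fitted length.
    """
    lengths = [fitted_length for fitted_length, _ in table]
    i = bisect_right(lengths, length) - 1
    if i < 0:
        return None
    return table[i][1]


def passes(hit_length, hit_evalue, table):
    """True if a best hit is at least as significant as the cutoff for its length."""
    cutoff = cutoff_for_length(table, hit_length)
    return cutoff is not None and hit_evalue <= cutoff
```

A best hit from either search direction could then be kept when `passes(...)` is true, which would give a BH set filtered by the same length-dependent threshold instead of a flat one.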