empty result for big query scaffold

bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.

GNU General Public License v3.0


empty result for big query scaffold #375

Open XClaws opened 4 years ago

XClaws commented 4 years ago

Dear colleague,

I tried to use Diamond v2.0.1 to search for protein alignments in my genome assembly. I ran it on each scaffold separately, because it is a huge plant genome. Several scaffolds are hundreds of millions of base pairs long (hundreds of Mb). The search worked well for smaller scaffolds of around 10 Mb. However, for the big scaffolds (>100 Mb), the outputs are all empty. Is there a limit on the size of a query sequence? Can I still use DIAMOND on my genome by changing any options or parameters? Thank you in advance. I look forward to hearing from you.

Best regards

bbuchfink commented 4 years ago

You should use the --long-reads option for long input sequences. Still, this has only been tested for sequences up to the length of bacterial chromosomes. I don't think Diamond will currently work with sequences that long. You should either run gene calling first or, if you must align the DNA, chop it into small, overlapping pieces (10 kb, for example).
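For readers who want to try the chopping approach, here is a minimal Python sketch of one way to split each scaffold into overlapping windows. The 10 kb window comes from the suggestion above; the 1 kb overlap, the function names, the chunk naming scheme (`name_start_end`, 1-based), and the file names in the usage note are all assumptions made for illustration, not anything prescribed by Diamond.

```python
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA file."""
    name, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(parts)
                # Keep only the first word of the header as the name.
                name, parts = line[1:].split()[0], []
            else:
                parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chop(seq: str, window: int = 10_000, overlap: int = 1_000) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, chunk) pairs covering seq with overlapping windows.

    The overlap value is an assumption; pick one larger than the
    features you do not want to lose at chunk boundaries.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    if not seq:
        return
    step = window - overlap
    start = 0
    while True:
        end = min(start + window, len(seq))
        yield start, seq[start:end]
        if end >= len(seq):
            break
        start += step


def write_chunks(in_path: str, out_path: str, window: int = 10_000, overlap: int = 1_000) -> int:
    """Write all chunks to out_path; return the number of chunks written."""
    count = 0
    with open(out_path, "w") as out:
        for name, seq in read_fasta(in_path):
            for start, chunk in chop(seq, window, overlap):
                # Header encodes 1-based start and end so hits can be mapped back.
                out.write(f">{name}_{start + 1}_{start + len(chunk)}\n{chunk}\n")
                count += 1
    return count
```

Calling `write_chunks("scaffolds.fna", "scaffolds.chunks.fna")` (hypothetical file names) would produce a FASTA file of pieces that can be passed to `diamond blastx -q`. Because the chunk headers carry the original coordinates, hit positions can be translated back to scaffold coordinates afterwards.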

javiercnav commented 4 years ago

Adding to this thread: is it "--long-reads" or "-long-reads"?

I tried:

diamond blastx -d nr_dm.dmnd -q contigs.fna --outfmt 100 -o diamond_1contigs.daa --long-reads

Error: Invalid option: long-reads

Then I tried:

diamond blastx -d nr_dm.dmnd -q contigs.fna --outfmt 100 -o diamond_1contigs.daa -long-reads

This seems to run, but I am wary that it is not actually doing what it is supposed to.

Before that, I tried:

diamond blastx -d nr_dm.dmnd -q contigs.fna -o diamond_contigs -F 15 --range-culling --top 10 -f 100

Error: Invalid option: range-culling

diamond blastx -d nr_dm.dmnd -q contigs.fna -o diamond_contigs -F 15 -range-culling --top 10 -f 100

Error: Invalid option: r

javiercnav commented 4 years ago

I see now that the version of Diamond installed from conda is 0.9.14, and this is what is causing the problem. Is the conda repository updated regularly?
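Since the errors above came down to an old installation, a quick version check before a long run can save time. The sketch below is only an illustration: it assumes that running `diamond version` prints a line containing a dotted version number such as `0.9.14`, and the helper names are hypothetical. It does not encode which release first accepted `--long-reads`; the thread does not state that, so the threshold is left to the caller.

```python
import re
import subprocess
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted X.Y.Z version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_diamond_version(executable: str = "diamond") -> Tuple[int, ...]:
    """Ask the installed binary for its version.

    Assumption: the `version` command prints the version on stdout.
    """
    result = subprocess.run(
        [executable, "version"], capture_output=True, text=True, check=True
    )
    return parse_version(result.stdout)


def is_older(installed: Tuple[int, ...], required: Tuple[int, ...]) -> bool:
    """Return True if the installed version is older than the required one."""
    return installed < required
```

For example, `is_older(parse_version("diamond version 0.9.14"), (2, 0, 1))` compares the two versions mentioned in this thread and reports that the conda-installed one is the older of the two.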

bbuchfink commented 3 years ago

Sorry, it looks like this has escaped me. Yes, conda has the latest version of Diamond.