bcgsc / transabyss

de novo assembly of RNA-seq data using ABySS

[1.5.2] fails with blat 35 #3

Closed wookietreiber closed 9 years ago

wookietreiber commented 9 years ago

I just installed it and tried to run the sample assembly, but blat 35 fails with an internal error:

$ bash sample_dataset/assemble.sh
...
CHECKPOINT: De Bruijn graph assembly completed.
Iteration 1 of graph simplification ...
ADJ: 42 vertices, 48 edges
Walked 3 paths and marked 8 vertices for removal.
CMD: bash -euo pipefail -c 'MergeContigs --kmer=32 --out=/tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-unitigs.r1.ref.fa /tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-1.fa /tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-1.adj /tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-unitigs.r1.ref.path'
The minimum coverage of single-end contigs is 1.
The minimum coverage of merged contigs is 1.
Internal error genoFind.c 2250Internal error genoFind.c 2250

CMD: bash -euo pipefail -c 'blat -noHead -t=dna -q=dna -out=psl -tileSize=18 -maxGap=1 -maxIntron=1 -minScore=638 /tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-unitigs.r1.ref.fa.1 /tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-unitigs.r1.ref.fa.1 >(/tmp/transabyss-1.5.2/bin/skip_psl_self.awk > /tmp/transabyss-1.5.2/sample_dataset/test/assembly/test-unitigs.r1.ref.fa.selfalign.psl.0) >&2'
ERROR: CMD ended with status code 255
Internal error genoFind.c 2250Internal error genoFind.c 2250

wookietreiber commented 9 years ago

Given the line number and C source file name, I took a look at it. The function where the internal error gets raised is:

int gfDefaultRepMatch(int tileSize, int stepSize, boolean protTiles);

It mainly checks the tileSize argument. Not sure if that helps.

wookietreiber commented 9 years ago

Given your blat command line (from the output above), the source code, and this thread, it seems clear that the -tileSize=18 argument passed to blat exceeds the allowed value.

wookietreiber commented 9 years ago

Searching the source code for -tileSize=18, I found that this value is hard-coded. This renders transabyss 1.5.2 completely incompatible with blat 35, which is the most recent blat release.

The blat command line help says:

   -tileSize=N sets the size of match that triggers an alignment.  
               Usually between 8 and 12
               Default is 11 for DNA and 5 for protein.

Since we are working with DNA (-t=dna -q=dna), is the default tile size of 11 for DNA sensible to you?

I guess the options are:

  1. Pick another hard-coded value (instead of 18) that is within the bounds of the most recent blat, or, to stay compatible with older blat releases, make the hard-coded value depend on the blat version (printed in the first output line when you just type blat at the command line prompt), e.g.:

    $ blat | head -1
    blat - Standalone BLAT v. 35 fast sequence search command line tool
  2. Make the tile size an option to transabyss itself, so users can set it themselves however they want to. As for the default value for this command line option, see 1.
  3. Leave it at the default, i.e. remove the hard-coded -tileSize=18.

wookietreiber commented 9 years ago

By the way, just by removing the -tileSize=18 option the sample runs fine without complaints.
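
For illustration, here is a rough Python sketch of how the three options above could fit together. It is not the actual transabyss code: the function names (parse_blat_version, build_blat_args) and their parameters are made up for this example, and the only blat flags used are the ones visible in the command line from the log above. Which tile sizes a given blat release accepts is not encoded here, since this thread does not state the bounds.

```python
import re
from typing import List, Optional

# Matches the version in blat's banner line, e.g. "... BLAT v. 35 fast ..."
_BLAT_VERSION_RE = re.compile(r"BLAT v\.\s*(\d+)")


def parse_blat_version(banner: str) -> Optional[int]:
    """Return the leading version number from blat's banner line, or None."""
    match = _BLAT_VERSION_RE.search(banner)
    return int(match.group(1)) if match else None


def build_blat_args(target: str, query: str, output: str, min_score: int,
                    tile_size: Optional[int] = None) -> List[str]:
    """Assemble a blat argument list; -tileSize is added only on request.

    Hypothetical helper: tile_size stands in for a user-facing option
    (option 2). With tile_size=None the flag is omitted and blat uses
    its own default (option 3).
    """
    args = ["blat", "-noHead", "-t=dna", "-q=dna", "-out=psl"]
    if tile_size is not None:
        args.append(f"-tileSize={tile_size}")
    args += ["-maxGap=1", "-maxIntron=1", f"-minScore={min_score}"]
    args += [target, query, output]
    return args


banner = "blat - Standalone BLAT v. 35 fast sequence search command line tool"
version = parse_blat_version(banner)
default_args = build_blat_args("ref.fa", "ref.fa", "out.psl", min_score=638)
```

A caller could branch on the parsed version to pick a version-specific tile size (option 1), or pass nothing and let blat fall back to its default.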

kmnip commented 9 years ago

Thanks for reporting this!! The -tileSize option was used mainly for speeding up the blat alignment process, but I agree with you about not hardcoding it.

On Fri, Jan 30, 2015 at 12:29 AM, Christian Krause wrote:

> By the way, just by removing the -tileSize=18 option the sample runs fine without complaints.


kmnip commented 9 years ago

Fixed with ae83072.