Saskia-Oosterbroek / decona

fastq to polished sequences: pipeline suitable for mixed samples and long (Nanopore) reads
MIT License

Pipeline exits due to empty polished.X.fa.fasta file. #22

Closed. thierryjanssens closed this issue 2 years ago

thierryjanssens commented 2 years ago

Dear Saskia,

The pipeline ends prematurely at a step where racon produces an empty polished fasta file. (I have also created an issue on the racon GitHub.)

Is there a way, or a plan, to make the pipeline more robust, so that it continues when it encounters an error (comparable to a Snakemake workflow)?

Kind regards,

Thierry

Saskia-Oosterbroek commented 2 years ago

Hi Thierry,

Is this by any chance the "error: empty overlap set!" message, and are you processing relatively short reads (below 300 or 400 bp)? If so, it is a known issue. It can be solved by adapting the k-mer size in minimap2 (for which the default is 15); this will be an incorporated option in Decona's next release.

If you wish to adjust this in the script yourself (so you can use it right now), you can add a -k flag to the minimap2 commands in the script. Change

minimap2 -ax map-ont ref_"${file}".fasta "${file}" -t "$MULTITHREAD" > align_"${file}".sam

to

minimap2 -ax map-ont -k6 ref_"${file}".fasta "${file}" -t "$MULTITHREAD" > align_"${file}".sam

In this case I added -k6 as an example, which works well with shorter reads of around 300 bases.

Best, Saskia
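
As an editorial illustration of the change described above, here is a minimal sketch that wraps the minimap2 call so the k-mer size can be set without editing the command each time. The `KMER` variable and the `align_cluster` function are hypothetical names and are not part of Decona's script; only the minimap2 command line itself is taken from the comment above.

```shell
# Hypothetical setting: KMER is NOT an existing Decona option.
# 15 is the minimap2 default mentioned above; use e.g. KMER=6 for short reads.
KMER="${KMER:-15}"
MULTITHREAD="${MULTITHREAD:-4}"

# Align one cluster file against its reference, as in the command quoted above,
# with the k-mer size taken from KMER.
# $1 is the read file; the reference is expected at ref_<file>.fasta.
align_cluster() {
    minimap2 -ax map-ont -k"${KMER}" ref_"$1".fasta "$1" -t "${MULTITHREAD}" > align_"$1".sam
}
```

Calling `KMER=6; align_cluster 11-3901.fa` would then run the same command as the edited line above. Keeping the value in a single variable means every minimap2 call in a script could pick up the same setting.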

thierryjanssens commented 2 years ago

Dear Saskia,

Thank you for your swift reply. Your explanation may not apply to this issue, though:

I have set the lower threshold of read length to 500 bp. Moreover, minimap2 created a non-empty sam file.

I ran racon on the input files. It is racon that creates an empty polished file.

$ racon -m 8 -x -6 -g -8 -w 500 -t 8 11-3901.fa align_11-3901.fa.sam ref_11-3901.fa.fasta > polished_11-3901.fa.fasta
[racon::Polisher::initialize] loaded target sequences 0.000091 s
[racon::Polisher::initialize] loaded sequences 0.000269 s
[racon::Polisher::initialize] loaded overlaps 0.000127 s
[racon::Polisher::initialize] aligned overlaps 0.000444 s
[racon::Polisher::initialize] transformed data into windows 0.000014 s
[racon::Polisher::polish] generated consensus 0.000069 s
[racon::Polisher::] total = 0.001162 s

However, the output is empty.

These are the sizes of the respective files:

$ ls -lshrt 3901
20K -rw-rw-r-- 1 minion minion 20K dec 13 08:23 11-3901.fa
4,0K -rw-rw-r-- 1 minion minion 814 dec 13 21:40 ref_11-3901.fa.fasta
20K -rw-rw-r-- 1 minion minion 20K dec 13 21:40 align_11-3901.fa.sam
0 -rw-rw-r-- 1 minion minion 0 dec 14 10:47 polished_11-3901.fa.fasta

Kind regards, Thierry
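
As an editorial illustration of the robustness asked about earlier in the thread, the sketch below shows one way a shell pipeline could detect an empty racon output and carry on with the remaining clusters instead of exiting. It is not Decona's code: the function names, the `skipped_clusters.txt` log file, and the assumption that cluster files match `*.fa` are all hypothetical. The racon options are copied from the command shown above. Skipping a cluster only keeps the run going; it does not explain why racon produced no consensus.

```shell
MULTITHREAD="${MULTITHREAD:-8}"

# Polish one cluster with racon (options copied from the command in this thread).
# Returns non-zero if racon fails OR if the polished file comes out empty.
polish_cluster() {
    cluster="$1"
    if ! racon -m 8 -x -6 -g -8 -w 500 -t "${MULTITHREAD}" \
            "${cluster}" align_"${cluster}".sam ref_"${cluster}".fasta \
            > polished_"${cluster}".fasta; then
        echo "racon failed for ${cluster}, skipping" >&2
        return 1
    fi
    # -s is true only when the file exists and has a size greater than zero
    if [ ! -s polished_"${cluster}".fasta ]; then
        echo "racon produced no consensus for ${cluster}, skipping" >&2
        rm -f polished_"${cluster}".fasta
        return 1
    fi
}

# Process every cluster file; record failures instead of aborting the run.
# Assumption: cluster files match *.fa in the current directory.
polish_all() {
    for file in *.fa; do
        [ -e "$file" ] || continue   # no matching files at all
        if ! polish_cluster "$file"; then
            # skipped_clusters.txt is a hypothetical log name
            echo "$file" >> skipped_clusters.txt
        fi
    done
}
```

Running `polish_all` in a cluster directory would leave polished files only for clusters where racon produced output, and list the others in `skipped_clusters.txt` for later inspection.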