SamStudio8 / reticulatus

A snakemake-based pipeline for assembling and polishing long genomes from long nanopore reads
MIT License

Rebaled assemblies fail jts_fastmer rule #41

Closed SamStudio8 closed 4 years ago

SamStudio8 commented 4 years ago
/home/prom/miniconda3/envs/reticulatus/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/home/prom/miniconda3/envs/reticulatus/lib/python3.6/site-packages/numpy/core/_methods.py:85: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

I suspect this is caused by chunks of Ns, presumably in the case where no read covers the reference?
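
For reference, both warnings above are what NumPy emits when asked for the mean of an empty array. Here is a minimal reproduction, independent of fastmer; the helper name `mean_with_warnings` is made up for illustration, and the exact wording of the second warning varies between NumPy versions.

```python
import warnings

import numpy as np


def mean_with_warnings(values):
    """Return (mean, warning messages) for a sequence of numbers.

    Hypothetical helper: it only demonstrates NumPy's behaviour on
    empty input and is not taken from fastmer or reticulatus.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        result = np.mean(np.asarray(values, dtype=float))
    return result, [str(w.message) for w in caught]


mean, messages = mean_with_warnings([])
print(mean)  # nan
for message in messages:
    print(message)  # includes "Mean of empty slice."
```

So the warning says that whatever was being averaged had no entries at all, rather than that it contained bad values.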

SamStudio8 commented 4 years ago

I can't quite tell where the Ns are coming from. The only thing that can generate them is the reverse_complement function of rebaler.

SamStudio8 commented 4 years ago

I was checking the wrong ref set; the runs of N are expected. Not sure why we're getting errors from fastmer. The mystery continues.

SamStudio8 commented 4 years ago

It turns out that the alignments are too short, which implies the assemblies are /too/ noisy out of rebaler. We can force results to be output from fastmer by using `--min-alignment-length 1000`.
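
A rough sketch of why a length threshold can leave nothing to summarise. This is not fastmer's actual code: the function name, the `(length, identity)` tuple layout, the threshold of 5000 and the alignment values are all made up for illustration. Only the idea of a minimum alignment length comes from the `--min-alignment-length` option.

```python
from statistics import mean


def summarise_identity(alignments, min_alignment_length):
    """Mean identity of alignments that pass the length filter.

    `alignments` is a list of (length, identity) tuples, a hypothetical
    layout chosen for this sketch. Returns None when every alignment is
    filtered out, instead of averaging an empty list.
    """
    kept = [
        identity
        for length, identity in alignments
        if length >= min_alignment_length
    ]
    if not kept:
        return None
    return mean(kept)


# Illustrative values only: two short alignments from a noisy assembly.
alignments = [(1200, 0.96), (3000, 0.97)]

strict = summarise_identity(alignments, 5000)   # everything is filtered out
relaxed = summarise_identity(alignments, 1000)  # both alignments survive
print(strict, relaxed)
```

With a high threshold every alignment is discarded and there is nothing to average; lowering it lets the short alignments through.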

SamStudio8 commented 4 years ago

The fastmer scores obtained from a first round of unpolished flye are approximately Q25, whereas rebaler is generating sequences of Q14.
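
To put those two numbers in context, here is a small conversion, assuming the Q values are on the standard Phred scale (Q = -10 * log10(error rate)). That fastmer reports its scores this way is an assumption here, and the function names are illustrative.

```python
import math


def phred_to_error(q):
    """Per-base error rate implied by a Phred-scaled quality score."""
    return 10 ** (-q / 10)


def identity_to_phred(identity):
    """Phred-scaled score for a fractional identity (0 <= identity < 1)."""
    return -10 * math.log10(1 - identity)


for q in (25, 14):
    err = phred_to_error(q)
    print(f"Q{q}: error rate {err:.4f}, identity {100 * (1 - err):.2f}%")
# Q25: error rate 0.0032, identity 99.68%
# Q14: error rate 0.0398, identity 96.02%
```

On that scale the gap between Q25 and Q14 is roughly a 12-fold difference in per-base error rate.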

SamStudio8 commented 4 years ago

A quick run through minidot seems to show this is working as expected. I've globally dropped the minimum alignment length for fastmer; we'll see what happens.