bcgsc / tigmint

⛓ Correct misassemblies using linked AND long reads
https://bcgsc.github.io/tigmint/
GNU General Public License v3.0

tigmint parameters with multiple linked reads #35

Closed hidvegin closed 4 years ago

hidvegin commented 4 years ago

I have a draft genome assembled from 30x PacBio reads for a plant genome of about 4 Gbp. I would like to correct it with 60x 10x Genomics linked reads, which come from four 10x Genomics libraries. I tried a dry run of Tigmint with these parameters:

tigmint-make arcs -n draft=$HOME/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs reads=$HOME/szeged/fk8jybr/input/Illumina_10x/LC001 $HOME/szeged/fk8jybr/input/Illumina_10x/LC002 $HOME/szeged/fk8jybr/input/Illumina_10x/LC003 $HOME/szeged/fk8jybr/input/Illumina_10x/LC004

I got this output message:

bwa index /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa
bwa mem -t8 -pC /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa /home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.fq.gz | samtools view -u -F4 | samtools sort -@8 -tBX -T$(mktemp -u -t /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortbx.bam.XXXXXX) -o /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortbx.bam
/big/home/fk8jybr/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.bed
samtools faidx /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa
/big/home/fk8jybr/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-cut -p8 -w1000 -n20 -t0 -o /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.bed
bwa index /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
bwa mem -t8 -pC /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa /home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.fq.gz | samtools view -@8 -h -F4 -o /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortn.bam
arcs -s98 -c5 -l0 -z500 -m4-20000 -d0 -e30000 -r0.05 -v \
    -f /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa \
    -b /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs \
    -g /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.dist.gv \
    --tsv=/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.dist.tsv \
    --barcode-counts=/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortn.bam.barcode-counts.tsv \
    /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortn.bam
/big/home/fk8jybr/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-arcs-tsv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs_original.gv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.links.tsv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
cp /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.links.tsv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.tigpair_checkpoint.tsv
LINKS -k20 -l10 -t2 -a0.1 -x1 -s /dev/null -f /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa -b /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links
sed -r 's/^>scaffold([^,]*),(.*)/>\1 scaffold\1,\2/' /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.scaffolds.fa >/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.fa
ln -sf /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.fa /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.tigmint.arcs.fa
make: *** No rule to make target '/home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC002'.  Stop.

How can I set the parameters to use all four linked-read libraries?

lcoombe commented 4 years ago

Hi @hidvegin,

Make soft links to your contigs and reads in your current working directory, and then specify the read files without any explicit path (instead of using full paths as you have now) -- that's what Tigmint expects and usually solves this sort of error.

If your reads are in different files, the easiest thing to do would be to concatenate them into a single, interleaved, gzipped FASTQ file. If you wanted to, you could alter the barcodes to be specific to each library (e.g. library 1 10x barcodes BX:Z:<barcode>-1, library 2 BX:Z:<barcode>-2, etc.; also mentioned in #33).
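A minimal sketch of the per-library barcode tagging suggested above. It assumes that each FASTQ header carries a tag of the form `BX:Z:<barcode>-1` (as in the example) and that every record spans exactly four lines. The function name `tag_library`, the output name `LC_all.fq.gz` and the `/path/to/` placeholders are hypothetical; the `draft=` and `reads=` prefixes follow the `.fa` and `.fq.gz` extensions visible in the dry-run output.

```bash
# Rewrite the BX:Z barcode suffix in FASTQ headers so it names the library.
# Reads an uncompressed 4-line-per-record FASTQ on stdin; $1 = library number.
# Assumption: headers carry a tag of the form BX:Z:<barcode>[-<n>].
tag_library() {
    awk -v lib="$1" '
        # Header lines are lines 1, 5, 9, ... of a 4-line-per-record FASTQ.
        NR % 4 == 1 && match($0, /BX:Z:[ACGTN]+/) {
            head = substr($0, 1, RSTART + RLENGTH - 1)  # up to end of barcode
            tail = substr($0, RSTART + RLENGTH)         # old suffix and the rest
            sub(/^-[0-9]+/, "", tail)                   # drop old "-<n>" if present
            $0 = head "-" lib tail
        }
        { print }
    '
}

# Usage (hypothetical file names), run in the working directory:
#   ln -s /path/to/lculinaris.contigs.fa .
#   for i in 1 2 3 4; do
#       gzip -dc /path/to/LC00$i.fq.gz | tag_library "$i"
#   done | gzip > LC_all.fq.gz
#   tigmint-make arcs draft=lculinaris.contigs reads=LC_all
```

Streaming each library through the function and compressing once at the end avoids writing uncompressed intermediates, which matters for 60x coverage of a 4 Gbp genome.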

Thank you for your interest in Tigmint! Lauren

hidvegin commented 4 years ago

Dear @lcoombe,

Thank you for your answer. What is the optimal CPU usage for Tigmint? Should I use more than 8 CPUs? So far I have tried tigmint-make with -t8.

lcoombe commented 4 years ago

Hi @hidvegin - Generally, using more CPU will be better, especially for the alignment stage, so it really just depends on the limitations of your machine.

hidvegin commented 4 years ago

Can Tigmint continue a stopped job?

lcoombe commented 4 years ago

If you mean resuming a partial run part-way through, then yes. Tigmint is driven by a Makefile, which is a set of rules that will be executed. If Make detects that a file has already been made, it will not re-make that file. If you want to see where it will start again, use the dry-run option -n in the tigmint-make command, which will print out the commands that will be run, without executing them.
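As a rough sketch, the dry run can be wrapped in a small helper. The function name `tigmint_dry_run` and the `TIGMINT_MAKE` override variable are hypothetical conveniences, not part of Tigmint; the `arcs` target, the `-n` option and the `draft=`/`reads=` parameters are the ones used earlier in this thread.

```bash
# Command to run; override TIGMINT_MAKE (e.g. with "echo") to preview the call.
TIGMINT_MAKE=${TIGMINT_MAKE:-tigmint-make}

# Print the remaining commands of the "arcs" pipeline without executing them.
# $1 = draft prefix (expects $1.fa), $2 = reads prefix (expects $2.fq.gz).
tigmint_dry_run() {
    "$TIGMINT_MAKE" arcs -n draft="$1" reads="$2"
}

# Usage, from the directory that holds the inputs (prefixes are placeholders):
#   tigmint_dry_run draft reads
```

Any step whose output file already exists and is up to date should be missing from the printed list, so the first command shown is where the run would resume.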

hidvegin commented 4 years ago

I tried to resume tigmint-make with 80 CPUs, but the -t parameter did not work. I also tried tigmint-make --jobs=80, but that does not seem right either, because bwa mem still uses only 8 CPUs (-t8). How should I pass a parameter to tigmint-make so that it uses all 80 CPUs?

lcoombe commented 4 years ago

Since tigmint-make is a Makefile, you specify parameters like this: t=80. See this part of the README for more examples: https://github.com/bcgsc/tigmint#parameters-of-tigmint
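A small sketch of passing the thread count as a Makefile-style `t=` parameter. The helper name `tigmint_run` and the `TIGMINT_MAKE` override variable are hypothetical; defaulting to the number of online CPUs via `getconf` is an assumption for illustration, not a Tigmint recommendation.

```bash
# Command to run; override TIGMINT_MAKE (e.g. with "echo") to preview the call.
TIGMINT_MAKE=${TIGMINT_MAKE:-tigmint-make}

# Run the "arcs" pipeline with an explicit thread count.
# $1 = draft prefix, $2 = reads prefix, $3 = threads (default: all online CPUs).
tigmint_run() {
    "$TIGMINT_MAKE" arcs draft="$1" reads="$2" \
        t="${3:-$(getconf _NPROCESSORS_ONLN)}"
}

# Usage (prefixes are placeholders):
#   tigmint_run draft reads 80
```

Note the difference from `--jobs=80`: that option only tells Make how many rules to run at once, whereas `t=80` is the value that ends up in the per-tool thread options such as `bwa mem -t`.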

hidvegin commented 4 years ago

Thank you. It helped a lot. Now, it is working correctly.

hidvegin commented 4 years ago

Hi @lcoombe,

tigmint-make finished scaffolding both the contigs and the unitigs generated with Canu. In the draft.tigmint.arcs.fa file I found several scaffolds that are 1, 2 or 3 bp long and that were not in the contigs or unitigs file. The contigs and unitigs are all 1000 bp or longer. How should I set the parameters in tigmint-make to prevent these short sequences?

lcoombe commented 4 years ago

Hi @hidvegin,

Those small sequences are a product of how Tigmint decides on the location of cuts. In short, at a putative misassembly (i.e. Tigmint doesn't find any spanning molecules along the sliding window), it is possible that one or two cuts will be made. If there are two cuts, they are usually quite close to each other and can lead to the small sequence(s). This roughly depends on whether it is a blunt misassembly or one mediated by a repeat sequence. You could take a look at the methods in the Tigmint paper if you want more detail (there is some pseudocode there that describes how the cut points are decided).

If you don't want any of the small sequences in there, I'd suggest just doing a post-Tigmint step to filter them out (e.g. using seqtk seq or a one-liner).

hidvegin commented 4 years ago

Hi @lcoombe,

How can I filter them out with seqtk seq? And how do I decide which sequences to filter out?

lcoombe commented 4 years ago

Hi @hidvegin,

You could decide on a length threshold that you want for your assembly (call it 'x'), and use this command:

seqtk seq -L x my_fasta.tigmint.fa > my_fasta.tigmint.Lx.fa

They are all valid sequences, so it would be up to you to decide on a length filter appropriate for your particular assembly project.
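If seqtk is not available, the same kind of length filter can be sketched with awk. The function name `filter_min_len` is hypothetical, and the threshold of 1000 in the usage line is only an illustrative value. Unlike seqtk, this sketch writes each kept sequence on a single line.

```bash
# Keep only FASTA records whose sequence is at least $1 bp long.
# Reads FASTA on stdin, writes FASTA on stdout with each sequence on one line.
filter_min_len() {
    awk -v min="$1" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (name != "" && length(seq) >= min + 0) print name "\n" seq
        }
        /^>/ { emit(); name = $0; seq = ""; next }  # new header: flush previous
        { seq = seq $0 }                            # join wrapped sequence lines
        END { emit() }                              # flush the last record
    '
}

# Usage (illustrative threshold):
#   filter_min_len 1000 < my_fasta.tigmint.fa > my_fasta.tigmint.L1000.fa
```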

hidvegin commented 4 years ago

Thanks, @lcoombe, for your answer. I have 150x Illumina PE reads from the same plant genome as these scaffold sequences. Which tool is most suitable for correcting the draft genome that I generated with tigmint-make arcs? Maybe Sealer or ntEdit?

lcoombe commented 4 years ago

@hidvegin - No worries!

Yes, if it is more polishing and assembly finishing that you are looking for, Sealer and ntEdit are good options. Sealer will fill gaps in your existing assembly, and ntEdit performs assembly polishing.

hidvegin commented 4 years ago

Thank you, @lcoombe. Should I also use the same linked reads together with the Illumina PE reads in Sealer and ntEdit? I also have Illumina PE reads from mRNA-seq. Should I use them to improve the draft genome with Sealer or ntEdit?

lcoombe commented 4 years ago

Hi @hidvegin - Yes, I'd suggest using the same linked reads with Sealer and ntEdit. I'd be more hesitant to use the RNA-seq reads with these tools, since they would be limited to improving the genic space only (vs the genomic reads, which could improve genic + all other regions of the genome assembly)

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your interest in Tigmint!