alekseyzimin / masurca


Hybrid Assembly: PacBio + Nanopore + Illumina / Flye Assembly / Masurca 3.3.7 #157

Open zeyak opened 4 years ago

zeyak commented 4 years ago

Hi, I'm trying to do a de novo assembly of a unicellular eukaryote genome with a size of roughly 114 Mbp. I've successfully run Masurca 3.3.5 with PacBio + Illumina paired-end reads. Afterward, I wanted to try a hybrid of PacBio (28X) + Nanopore (33X) + Illumina (77X) data with Flye assembly, but since then I keep getting this error: "Error correction of PE reads failed. Check pe.cor.log." Any help would be appreciated!

So my questions are: 1) What could be the reason for this error? It worked for PacBio + Illumina paired-end reads before, but it doesn't work when I add Nanopore reads. #77

[mån 17 feb 2020 18:14:20 +05] Processing pe library reads
[mån 17 feb 2020 18:14:21 +05] Average PE read length 241
[mån 17 feb 2020 18:14:22 +05] Using kmer size of 83 for the graph
[mån 17 feb 2020 18:14:22 +05] MIN_Q_CHAR: 33
[mån 17 feb 2020 18:14:22 +05] Error correct PE
[mån 17 feb 2020 18:14:32 +05] Error correction of PE reads failed. Check pe.cor.log.

2) How can I run a hybrid assembly in Masurca with PacBio + Nanopore + Illumina? Which version should I use to activate Flye?

This is what my Illumina FASTQ file looks like:

@M03094:70:000000000-C7RTF:1:1101:22143:1000
NCACAAATCTAGTGTCGCAACCGTTTTTAATTGGCTTTGATGATAAAAGATGATAACAGATATTCATTTTTGCTAACAAAAGATATTATCAAACTAATTACGAAATTATATTATAATAAAAATAATTATACCAATTCCGCTTCACATCACTTCAAATGAGTTACAGTGTGCAAACAAAGGAGCAGAGTTATGAAATGTCGTTGACCTCAACTCTCGTTTTCAACGTTGTGTTCACGGTAATCGTATTTGTAATTTTCCCGATTGGGCGCAGATACGCACCCGGTATTTTTGCGCCTTAACAACATC
+
8ACCGGGGGGGGGGGDFGGEGGGGGGGFFGC9FFGGGGFGGGGGGGFEGGGGGGGGGGGGGGGGGGFGFGGGGGGGGGGGGGFCEGGGCFFGGGCGFGGGFFFFGGGGGGGGGGGGGFGGGGGGGGGGGGFGGGCFFGDGGEGGGGFCFGGGGGGGGGGGGGGDFFAEAF@EDFGGGGCFCEGG8=FDCEEDGGGGGGGGGGGGGGGGFGGGFFGGGGDFGFGDFGA>5AFCFF9BF>>FFFFFFFFFFFEFC?7AEFDAA=D2=22@=((7?C@??>>6>696=ECFF>@11:6:2949<<<B2

@M03094:70:000000000-C7RTF:1:1101:17741:1000
NCATATTTTTGCGCTCAAACCCGTTTCCACCCCAATGAAACTAAAATTTGATCATTAACTTAGGATCAAGCTGGAGACACCAAAATTTAACTGGCAAAGTAACTAATTGCTAAATAAATAAGGATAGGCTCCGAAACCGGTATCTTACGTTTGAAATAATCACTATTCAATTAAAACAAGAGTTTCAGAGAAAATGTTGGATGATTTTCTATTCTATAATAGCATTCAATGATACTAGTATGCTAAATTAATATGATGTTCAAAAAGAAGTGAAGCCATTCTGCAATAATGCAACATAAATACAT
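When error correction fails right at the start, one thing that may be worth ruling out is a malformed input file. The sketch below checks that each FASTQ record has four lines, that the header and separator lines start with `@` and `+`, and that the sequence and quality strings have the same length. It also returns the smallest quality character code it saw, which for Phred+33 data should be 33 or higher. `check_fastq` and its `max_records` parameter are hypothetical names for illustration and are not part of Masurca.

```python
import gzip
from itertools import islice


def check_fastq(path, max_records=100000):
    """Scan up to max_records FASTQ records.

    Returns (number of records read, lowest quality character code seen).
    Raises ValueError on the first malformed record.
    """
    # Assumption: files ending in .gz are gzip-compressed, others are plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    n_records = 0
    min_q = None
    with opener(path, "rt") as handle:
        while n_records < max_records:
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                break  # clean end of file
            if len(record) < 4:
                raise ValueError(f"record {n_records + 1}: truncated record")
            header, seq, plus, qual = record
            if not header.startswith("@"):
                raise ValueError(f"record {n_records + 1}: header lacks '@'")
            if not plus.startswith("+"):
                raise ValueError(f"record {n_records + 1}: separator lacks '+'")
            if len(seq) != len(qual):
                raise ValueError(
                    f"record {n_records + 1}: sequence and quality lengths differ"
                )
            if qual:
                lowest = min(map(ord, qual))
                min_q = lowest if min_q is None else min(min_q, lowest)
            n_records += 1
    return n_records, min_q
```

For example, `check_fastq("run1_R1.fastq.gz")` would scan the first library from the configuration file. Passing this check does not prove the file is good input for the assembler; it only rules out basic formatting problems.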

This is my configuration file for PacBio + Nanopore + Illumina:

DATA
PE = aa 236 240 run1_R1.fastq.gz run1_R2.fastq.gz
PE = ab 229 232 run2_R1.fastq.gz run2_R2.fastq.gz
PE = ac 235 238 run3_R1.fastq.gz run3_R2.fastq.gz
NANOPORE = pacnano_concatanated.fastq.gz
END

PARAMETERS
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
CA_PARAMETERS =  cgwErrorRate=0.15
KMER_COUNT_THRESHOLD = 1
NUM_THREADS = 30
JF_SIZE = 1140000000
SOAP_ASSEMBLY = 0
FLYE_ASSEMBLY = 1
END

alekseyzimin commented 4 years ago

Hello, you can concatenate all PacBio and Nanopore reads into one file and supply it as the NANOPORE= type. Your configuration file is fine. Please clean up your assembly folder and re-run the assembly from the beginning. --Aleksey
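The concatenation step suggested above can be sketched as a plain byte-for-byte append. This works for gzip files as well as plain text, because a gzip file may consist of several compressed members back to back, as long as all inputs are compressed the same way (all gzip or all plain). `concatenate_reads` and the input file names in the usage note are hypothetical; only the output name is taken from the configuration file in this thread.

```python
import shutil


def concatenate_reads(inputs, output):
    """Append each input file to output byte-for-byte, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream in chunks so large read files are never held in memory.
                shutil.copyfileobj(src, out)
```

A call such as `concatenate_reads(["pacbio.fastq.gz", "nanopore.fastq.gz"], "pacnano_concatanated.fastq.gz")` would produce the single file referenced on the NANOPORE= line.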

zeyak commented 4 years ago

Hi Aleksey,

Thank you! I actually solved the problem by installing Masurca 3.3.5 without using conda. Conda was somehow blocking the installation, and Masurca 3.3.7 wasn't working either. Once I installed the Masurca 3.3.5 package, everything worked well. I got pretty nice results from the hybrid assembly (PacBio, Illumina, Nanopore) with Masurca.