Gabaldonlab / redundans

Redundans is a pipeline that assists the assembly of heterozygous/polymorphic genomes.
GNU General Public License v3.0

Gap closing error #107

Status: Closed (maxmaronna closed this issue 10 months ago)

maxmaronna commented 10 months ago

Hello, I am trying to improve my original assembly with redundans, but I am running into an error at the gap-closing step. I tested this dataset on two different clusters (both running the conda version, installed a few days ago). Thanks in advance, max

DETAILS-------------------------------------------------------------------------------------------------------------------------

(redundans) max@darwin:~/02-DATASETS/p_mizigama$ redundans.py -v -i *.fq -f out_gapClosed_PLAT63_P_mizigama.fa -o p_mizigama_redundans

Options: Namespace(verbose=True, fastq=['P_mizigama_R1_ECR.fq', 'P_mizigama_R2_ECR.fq'], fasta='out_gapClosed_PLAT63_P_mizigama.fa', outdir='p_mizigama_redundans', threads=16, resume=False, log=<_io.TextIOWrapper name='' mode='w' encoding='utf-8'>, nocleaning=True, mem=2, tmp='/tmp', identity=0.51, overlap=0.8, minLength=200, minimap2reduce=False, index='4G', noreduction=True, joins=5, linkratio=0.7, limit=0.2, mapq=10, iters=2, noscaffolding=True, usebwa=False, longreads=[], populateScaffolds=False, minimap2scaffold=False, reference='', norearrangements=False, preset='asm5', nogapclosing=True, runmerqury=False, kmer=21)

################################################## [Sun Dec 10 22:38:19 2023] Reduction...

file name:                 p_mizigama_redundans/contigs.fa
genome size:               428161670
contigs:                   1182530
heterozygous size:         163432196 (38.17%)
heterozygous contigs:      994946 (84.14%)
identity [%]:              88.330
possible joins:            0
homozygous size:           264729474 (61.83%)
homozygous contigs:        187584 (15.86%)

################################################## [Sun Dec 10 22:50:18 2023] Estimating parameters of libraries...
 Aligning 52945894 mates per library...

Insert size statistics and mates orientation stats:

FastQ files                                read length  median  mean    stdev   FF  FR    RF  RR
P_mizigama_R1_ECR.fq P_mizigama_R2_ECR.fq  144          267     247.87  126.79  14  9932  45  9

################################################## [Sun Dec 10 22:51:05 2023] Scaffolding...
 iteration 1.1: p_mizigama_redundans/contigs.reduced.fa 187584 264729474 36.591 72741 217023723 2869 587 890883 72857
   43979706 pairs. 31984713 passed filtering [72.73%]. 3827503 in different contigs [8.70%].
   6799848 pairs. 4539370 in different contigs [66.76%].
 iteration 1.2: p_mizigama_redundans/_sspace.1.1.fa 103647 265637589 36.591 61809 245845845 4976 1221 1843032 79418
   43979706 pairs. 33475691 passed filtering [76.12%]. 2394938 in different contigs [5.45%].
   4996207 pairs. 3210211 in different contigs [64.25%].

################################################## [Mon Dec 11 00:32:54 2023] Gap closing...
Traceback (most recent call last):
  File "/home/max/.conda/envs/redundans/bin/redundans.py", line 727, in <module>
    main()
  File "/home/max/.conda/envs/redundans/bin/redundans.py", line 718, in main
    redundans(o.fastq, o.longreads, o.fasta, o.reference, o.outdir, o.mapq, \
  File "/home/max/.conda/envs/redundans/bin/redundans.py", line 460, in redundans
    resume = run_gapclosing(outdir, libraries, outfn, lastOutFn, threads, limit, iters, resume, verbose, log)
  File "/home/max/.conda/envs/redundans/bin/redundans.py", line 242, in run_gapclosing
    if not prepare_gapcloser(outdir, configFn, libFs, libRs, orientations, libIS, libISStDev, \
  File "/home/max/.conda/envs/redundans/bin/redundans.py", line 221, in prepare_gapcloser
    fn1, fn2, passed = filter_reads(outdir, fq1, fq2, minlen, maxlen, limit)
  File "/home/max/.conda/envs/redundans/bin/redundans.py", line 198, in filter_reads
    i, filtered, orphans = filter_paired(fastq, outfiles, minlen, maxlen, limit, minqual)
  File "/home/max/.conda/envs/redundans/bin/filterReads.py", line 213, in filter_paired
    for i, (rec1, rec2) in enumerate(zip(fqparser1, fqparser2), pi+1):
  File "/home/max/.conda/envs/redundans/bin/filterReads.py", line 147, in rawtrimmer
    name = name.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'. Did you mean: 'encode'?

Dfupa commented 10 months ago

Hi @maxmaronna ,

Based on your command line

(redundans) max@darwin:~/02-DATASETS/p_mizigama$ redundans.py -v -i *.fq -f out_gapClosed_PLAT63_P_mizigama.fa -o p_mizigama_redundans

and your log, the issue seems to be in the filterReads.py module. While converting it from Python 2 to Python 3, I had to force a conversion from byte stream to string by decoding it as UTF-8.
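For readers following along, the failure mode can be sketched in a few lines. In Python 3, a file opened in binary mode (the default for `gzip.open`) yields `bytes`, while a file opened with the built-in `open` in its default text mode yields `str`, which has no `decode` method. The helper below, `to_text`, is a hypothetical name and not part of filterReads.py; how that module actually opens its inputs is not shown in this thread, so this is an illustration under that assumption rather than the actual fix.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is bytes-like.

    Hypothetical helper for illustration; not taken from filterReads.py.
    """
    if isinstance(value, (bytes, bytearray)):
        # Lines read from a binary-mode handle (e.g. a gzipped FASTQ).
        return value.decode(encoding)
    # Lines read from a text-mode handle are already str.
    return value


# A read name as it might arrive from a gzipped (bytes) or plain (str) FASTQ.
name_from_gzip = b"@read1/1"
name_from_plain = "@read1/1"

print(to_text(name_from_gzip))
print(to_text(name_from_plain))
```

A guard of this kind lets the same parsing code accept both compressed and uncompressed inputs without raising the `AttributeError` shown in the traceback.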

Unfortunately, it seems that during my testing I neglected to test non-gzipped FASTQ libraries. I'll have a look at it, but in the meantime I suggest gzipping your FASTQ libraries and rerunning your redundans command. Thanks for letting me know, and I hope this solves the issue!
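If it is more convenient to apply the workaround from Python than from the shell, a minimal sketch follows. It writes a gzipped copy next to each input and leaves the original file in place. The function name `gzip_file` is hypothetical and introduced only for this example; the file names are the ones from the log above.

```python
import gzip
import shutil


def gzip_file(path):
    """Write a gzipped copy of path as path + '.gz' and return the new path."""
    out_path = path + ".gz"
    # Stream in binary mode so large FASTQ files are not loaded into memory.
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    # File names taken from the log in this issue; adjust to your own data.
    for fastq in ["P_mizigama_R1_ECR.fq", "P_mizigama_R2_ECR.fq"]:
        print(gzip_file(fastq))
```

The resulting .fq.gz files can then be passed to `-i` in place of the uncompressed .fq files.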

maxmaronna commented 10 months ago

Hello Diego,

I tested the same dataset (now gzipped) and it worked perfectly. Thanks for the advice! Regards, max