BeatsonLab-MicrobialGenomics / micropipe

A pipeline for high-quality bacterial genome construction using ONT sequencing
GNU General Public License v3.0

Nextpolish db_split failed #8

Closed joergFLI closed 2 years ago

joergFLI commented 2 years ago

Hi, _nextpolish.log says

INFO: Converting SIF file to temporary sandbox...
[INFO] 2022-03-01 02:45:24,637 start...
[INFO] 2022-03-01 02:45:24,637 logfile: pid2111778.log.info
[WARNING] 2022-03-01 02:45:24,637 Re-write workdir
[INFO] 2022-03-01 02:45:24,645 scheduled tasks: [1, 2, 1, 2]
[INFO] 2022-03-01 02:45:24,645 options:
[INFO] 2022-03-01 02:45:24,645 {'polish_options': ' -p 40', 'rewrite': 1, 'job_prefix': 'nextPolish', 'job_type': 'local', 'cluster_options': '', 'snp_valid': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.snp_valid', 'kmer_count': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.kmer_count', 'sgs_max_depth': '100', 'align_threads': '40', 'sgs_block_size': 91759816L, 'lgs_max_read_len': '150k', 'parallel_jobs': '6', 'multithread_jobs': '40', 'snp_phase': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.snp_phase', 'genome': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/consensus.fasta', 'genome_size': 5505589L, 'workdir': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204', 'cleantmp': 0, 'sgs_align_options': 'bwa mem -p -t 40', 'sgs_unpaired': '0', 'sgs_fofn': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn', 'lgs_polish': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.lgs_polish', 'sgs_use_duplicate_reads': 0, 'score_chain': '/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/%02d.score_chain', 'task': [1, 2, 1, 2], 'lgs_max_depth': '60', 'lgs_block_size': '500M', 'lgs_minimap2_options': '-x map-ont', 'rerun': 3, 'lgs_min_read_len': '1k'}
[INFO] 2022-03-01 02:45:24,645 step 0 and task 1 start:
[INFO] 2022-03-01 02:45:24,646 analysis tasks done
[INFO] 2022-03-01 02:45:24,647 total jobs: 3
[INFO] 2022-03-01 02:45:24,648 Throw jobID:[2111788] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh] in the local_cycle.
[INFO] 2022-03-01 02:45:25,149 Throw jobID:[2111845] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split1/nextPolish.sh] in the local_cycle.
[INFO] 2022-03-01 02:45:25,651 Throw jobID:[2112009] jobCmd:[/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split2/nextPolish.sh] in the local_cycle.
[ERROR] 2022-03-01 02:45:27,799 db_split failed: please check the following logs:
[ERROR] 2022-03-01 02:45:27,799 /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e
cat: '01.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part.fasta': No such file or directory
cat: '03.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part.fasta': No such file or directory
/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/.command.sh: line 12: //: Is a directory

And the log /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0/nextPolish.sh.e says

hostname
cd /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0
cd /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/00.score_chain/01.db_split.sh.work/db_split0
time /opt/NextPolish/bin/seq_split -d /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204 -m 91759816 -n 6 -t 40 -i 1 -s 550558900 -p input.sgspart /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn
time /opt/NextPolish/bin/seq_split -d /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204 -m 91759816 -n 6 -t 40 -i 1 -s 550558900 -p input.sgspart /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn
Error! /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/110712RA1944_S13_L001_R1_001.fastq.gz does not exist!
Command exited with non-zero status 1
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 4164maxresident)k
0inputs+0outputs (0major+132minor)pagefaults 0swaps
/home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91

However, the file 110712RA1944_S13_L001_R1_001.fastq.gz does exist. It's basically a link to the raw data.

Thanks

vmurigneu commented 2 years ago

hi @joergFLI

Can you make sure the previous steps generated the expected output, i.e. check the files in the folders 2_assembly and 3_polishing_long_reads?

Can you post the content of /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/.command.sh?

Your error looks very similar to this one: https://github.com/Nextomics/NextPolish/issues/88 but you don't seem to have a typo in the file name. In the /work folder, files are expected to be links to the raw data.
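As a quick way to confirm that the links resolve, a check along these lines could be run from inside the work folder, ideally in the same environment (container) that runs the process, since a link target that is not visible there will show up as missing. This is only a sketch: check_fofn is a hypothetical helper, not part of micropipe or NextPolish, and it assumes that relative entries in sgs.fofn are resolved against the folder that holds the file.

```shell
# check_fofn FILE
# Hypothetical helper: prints "OK <entry>" or "MISSING <entry>" for every
# line of a file-of-filenames and returns 1 if any entry is missing.
# Assumption: relative entries are resolved against the folder of FILE.
check_fofn() {
    local fofn="$1" base entry path missing=0
    base="$(cd "$(dirname "$fofn")" && pwd)"
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -z "$entry" ] && continue          # skip blank lines
        case "$entry" in
            /*) path="$entry" ;;             # absolute path, use as-is
            *)  path="$base/$entry" ;;       # relative path
        esac
        # -e follows symlinks, so a dangling link is reported as missing
        if [ -e "$path" ]; then
            echo "OK $entry"
        else
            echo "MISSING $entry"
            missing=1
        fi
    done < "$fofn"
    return "$missing"
}
```

For example, `check_fofn /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/sgs.fofn` would list both read files with their status.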

joergFLI commented 2 years ago

Dear vmurigneu,

for two samples all previous steps worked fine, including flye polishedLR files with an appropriate genome size. One sample has a file flye_racon_4.fasta but no flye_polishedLR file. I cannot find a log for Medaka. .command.sh looks like this:

#!/bin/bash -ue
set +eu
ls 110712RA1944_S13_L001_R1_001.fastq.gz 110712RA1944_S13_L001_R2_001.fastq.gz > sgs.fofn
echo -e "task = 1212
genome = consensus.fasta
sgs_fofn = sgs.fofn
multithread_jobs = 40" > nextpolish.cfg
nextPolish nextpolish.cfg
if [[ "1212" == "1212" ]] || [[ "1212" == "best" ]] ; then
cat 01.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part.fasta > 12RA1944_flye_polishedLR_SR_1.fasta
cat 03.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part.fasta > 12RA1944_flye_polishedLR_SR_2.fasta
// rm -r 00.score_chain 01.kmer_count 02.score_chain 03.kmer_count
elif [[ "1212" == "12" ]]; then
cat 01.kmer_count/polish.ref.sh.work/polish_genome/genome.nextpolish.part.fasta > 12RA1944_flye_polishedLR_SR_2.fasta
// rm -r 00.score_chain 01.kmer_count
fi
rm input.sgspart.fastq.gz
cp .command.log nextpolish.log
nextPolish --version 2> nextpolish_version.txt

Yes, I've seen this https://github.com/Nextomics/NextPolish/issues/88 but the file name seems correct and the link also has the correct target.

thomcuddihy commented 2 years ago

@joergFLI I am a little confused by the two lines starting with // in the paste above, as this also seems to be the last error in the original code block you pasted.

Could you please verify that lines 632 and 635 in your local 'main.nf' match those in the 'main' branch of this repo?

Just a reminder that in bash, you comment lines out with # and not // if that is the intent.
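For context, bash does not treat // as a comment but tries to run it as a command, and since // resolves to the root directory, that yields the "line 12: //: Is a directory" message in the log above. A small sketch for spotting such lines in a generated .command.sh follows; find_slash_comments is a hypothetical helper name, not something shipped with micropipe or Nextflow.

```shell
# find_slash_comments FILE
# Hypothetical helper: prints "LINE:TEXT" for every line whose first
# non-blank characters are "//". Returns non-zero when no such line
# exists (this is grep's own exit status).
find_slash_comments() {
    grep -nE '^[[:space:]]*//' "$1"
}
```

Running it on the .command.sh of the failing task should point straight at the two rm lines, if they are still present.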

joergFLI commented 2 years ago

@thomcuddihy I also wonder why there was a // in lines 632 and 635. To be sure, I used git pull to get the latest original code. Now .command.sh looks fine, but the db_split error in NextPolish remains. The file nextPolish.sh.e does not exist this time.

The problem might be the quality of the Illumina data. After installing NextPolish v1.3.1 from its Git repository and running it with the 3_polishing_long_reads output as genome input, it says:

Too many[0.190859] reads contains N base, please do QC first.

When I trim the reads using trim-galore, both NextPolish and micropipe work fine. I just wonder why I did not receive the same helpful error message from the pipeline.
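To check the reads for N bases before polishing, something like the following sketch could be used. n_read_fraction is a hypothetical helper; it assumes a gzip-compressed FASTQ file with exactly four lines per record. How NextPolish computes its own figure (0.190859 above) is not stated in this thread, so the number printed here may not match it exactly.

```shell
# n_read_fraction FILE.fastq.gz
# Hypothetical helper: prints the fraction of reads whose sequence line
# contains at least one N (or n), with six decimals.
# Assumption: four-line FASTQ records, so the sequence is line 2 of 4.
n_read_fraction() {
    gzip -cd -- "$1" | LC_ALL=C awk '
        NR % 4 == 2 { total++; if ($0 ~ /[Nn]/) with_n++ }
        END {
            if (total > 0) printf "%.6f\n", with_n / total
            else print "0.000000"
        }'
}
```

Comparing the value before and after trimming gives a rough idea of whether the QC step removed the affected reads.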

thomcuddihy commented 2 years ago

Unfortunately, as Nextflow involves multiple layers (e.g. container handling, data handling, process code), it sometimes does not immediately surface the lowest-level error.

Instead, one may need to inspect the process's .command.log file for more details (e.g. /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204/.command.log).
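A minimal sketch for pulling those details out of a task work folder is shown below. show_task_logs is a hypothetical helper; it relies only on the .exitcode, .command.err and .command.log files that Nextflow writes into each task folder, and skips any that are absent or empty.

```shell
# show_task_logs WORKDIR [LINES]
# Hypothetical helper: prints the last LINES lines (default 20) of the
# Nextflow task files that exist and are non-empty in WORKDIR.
show_task_logs() {
    local workdir="$1" lines="${2:-20}" name
    for name in .exitcode .command.err .command.log; do
        if [ -s "$workdir/$name" ]; then
            echo "== $name =="
            tail -n "$lines" "$workdir/$name"
        fi
    done
}
```

For this issue the call would be `show_task_logs /home/software/micropipe/work/9c/4e9530b090e3cb82616fbca91e3204`.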