Open Menomens opened 4 years ago
Please re-run with the latest version and let me know if you are still having this problem.
Hi, I am getting exactly the same error with both versions I tested, 4.0.9 and 4.1.0, on the CentOS 7 NeSI cluster (I am also asking engineering support there for help).
error reading mega-reads file at /scale_wlg_persistent/filesets/opt_nesi/CS400_centos7_bdw/MaSuRCA/4.0.9-gimkl-2020a/bin/find_contained_reads.pl line 33, <FILE> line 230791.
[Wed Sep 27 19:34:23 UTC 2023] failed to create mega-reads frg file
The genome is haploid, about 42 Mb (MaSuRCA estimates 53 Mb). Both the Illumina and ONT libraries have high coverage (>100x) and worked well with SPAdes and Flye-Pilon.
Are there any other files or output I should provide?
std_out
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Wed Sep 27 18:53:10 UTC 2023] Processing pe library reads
[Wed Sep 27 18:54:51 UTC 2023] Average PE read length 146
[Wed Sep 27 18:54:52 UTC 2023] Using kmer size of 99 for the graph
[Wed Sep 27 18:54:52 UTC 2023] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 469081568, this automatic increase may be not enough!
[Wed Sep 27 18:54:52 UTC 2023] Creating mer database for Quorum
[Wed Sep 27 18:57:23 UTC 2023] Error correct PE
[Wed Sep 27 19:04:24 UTC 2023] Estimating genome size
[Wed Sep 27 19:05:35 UTC 2023] Estimated genome size: 52710869
[Wed Sep 27 19:05:35 UTC 2023] Creating k-unitigs with k=99
[Wed Sep 27 19:09:14 UTC 2023] Computing super reads from PE
[Wed Sep 27 19:16:14 UTC 2023] Using CABOG from /scale_wlg_persistent/filesets/opt_nesi/CS400_centos7_bdw/MaSuRCA/4.0.9-gimkl-2020a/bin
[Wed Sep 27 19:16:14 UTC 2023] Running mega-reads correction/assembly
[Wed Sep 27 19:16:14 UTC 2023] Using mer size 17 for mapping, B=15, d=0.02
[Wed Sep 27 19:16:14 UTC 2023] Estimated Genome Size 52710869
[Wed Sep 27 19:16:14 UTC 2023] Estimated Ploidy 1
[Wed Sep 27 19:16:14 UTC 2023] Using 70 threads
[Wed Sep 27 19:16:14 UTC 2023] Output prefix mr.99.17.15.0.02
[Wed Sep 27 19:16:14 UTC 2023] Creating k-unitigs for k=19
[Wed Sep 27 19:17:32 UTC 2023] Pre-correcting long reads
[Wed Sep 27 19:27:57 UTC 2023] Pre-corrected reads are in longest_reads.25x.fa
[Wed Sep 27 19:27:59 UTC 2023] Computing mega-reads
[Wed Sep 27 19:27:59 UTC 2023] Running locally in 1 batch
[Wed Sep 27 19:30:42 UTC 2023] Refining alignments
[Wed Sep 27 19:31:57 UTC 2023] Computing allowed merges
[Wed Sep 27 19:32:03 UTC 2023] Joining
[Wed Sep 27 19:32:32 UTC 2023] Gap consensus
[Wed Sep 27 19:32:34 UTC 2023] Warning! Some or all gap consensus jobs failed, see files in mr.99.17.15.0.02.join_consensus.tmp, however this is fine and assembly can proceed normally
[Wed Sep 27 19:32:35 UTC 2023] Generating assembly input files
[Wed Sep 27 19:34:23 UTC 2023] mega-reads exited before assembly
config.txt
DATA
PE = pe 500 50 /PATH/R1_001.fastq /PATH/R2_001.fastq
NANOPORE = /PATH/Nanopore/barcode53.fastq
END
PARAMETERS
EXTEND_JUMP_READS=0
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 0
USE_GRID=0
GRID_ENGINE=SGE
GRID_QUEUE=all.q
GRID_BATCH_SIZE=500000000
LHE_COVERAGE=25
LIMIT_JUMP_COVERAGE = 300
CA_PARAMETERS = cgwErrorRate=0.15
CLOSE_GAPS=1
NUM_THREADS = 32
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
FLYE_ASSEMBLY=0
END
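The log above warns that JF_SIZE was set too low and was raised automatically. A minimal sketch for picking a larger value up front, assuming the rule of thumb from the comments in the MaSuRCA sample config (roughly 20 times the estimated genome size) and the 42 Mb figure quoted in this post; the variable names are only illustrative:

```shell
# Assumption: JF_SIZE of about 20x the expected genome size is a safe value.
genome_size=42000000            # expected haploid genome size in bases (42 Mb)
jf_size=$((genome_size * 20))   # Jellyfish hash size to put in config.txt

echo "JF_SIZE = $jf_size"
```

For a 42 Mb genome this gives 840000000, which is above the minimum of 469081568 that the pipeline asked for in the warning.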
Hello,
Thank you for reporting this, can you please post output of "ls -lth" command run in the assembly folder?
Thanks, Aleksey
-- Dr. Alexey V. Zimin Associate Research Scientist Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA (301)-437-6260 website http://ccb.jhu.edu/people/alekseyz/ blog http://masurca.blogspot.com
Please find the contents of the output folder below. Three directories are still there.
-rw-rw----+ 1 username project 0 Sep 27 09:32 containees.txt
drwxrws---+ 2 username project 4.0K Sep 27 09:32 work1_mr1
-rw-rw----+ 1 username project 0 Sep 27 09:32 reduce2.out
-rw-rw----+ 1 username project 14K Sep 27 09:31 super1.err
-rw-rw----+ 1 username project 60M Sep 27 09:31 guillaumeKUnitigsAtLeast32bases_all.31.fasta
-rw-rw----+ 1 username project 1.2G Sep 27 09:30 mr.fa.in
-rw-rw----+ 1 username project 1.2G Sep 27 09:29 mr.99.17.15.0.02.1.fa
drwxrws---+ 2 username project 4.0K Sep 27 09:29 mr.99.17.15.0.02.join_consensus.tmp
-rw-rw----+ 1 username project 128M Sep 27 09:29 mr.99.17.15.0.02.1.to_join.fa
-rw-rw----+ 1 username project 1.1G Sep 27 09:29 mr.99.17.15.0.02.1.unjoined.fa
-rw-rw----+ 1 username project 4.1M Sep 27 09:29 mr.99.17.15.0.02.1.allowed
-rw-rw----+ 1 username project 1.6G Sep 27 09:29 mr.99.17.15.0.02.all.txt
-rw-rw----+ 1 username project 1.6G Sep 27 09:26 mr.99.17.15.0.02.txt
-rw-rw----+ 1 username project 29 Sep 27 09:22 create_mega-reads.err
-rw-rw----+ 1 username project 199M Sep 27 09:22 superReadSequences.named.fasta
-rw-rw----+ 1 username project 1.3G Sep 27 09:22 longest_reads.25x.fa
-rw-rw----+ 1 username project 20 Sep 27 08:55 CA_dir.txt
-rw-rw----+ 1 username project 2 Sep 27 08:55 PLOIDY.txt
drwxrws---+ 2 username project 4.0K Sep 27 08:55 work1
lrwxrwxrwx 1 username project 41 Sep 27 08:45 guillaumeKUnitigsAtLeast32bases_all.jump.fasta -> guillaumeKUnitigsAtLeast32bases_all.fasta
-rw-rw----+ 1 username project 168M Sep 27 08:45 guillaumeKUnitigsAtLeast32bases_all.fasta
-rw-rw----+ 1 username project 1.6K Sep 27 08:39 environment.sh
-rw-rw----+ 1 username project 9 Sep 27 08:39 ESTIMATED_GENOME_SIZE.txt
-rw-rw----+ 1 username project 759M Sep 27 08:39 k_u_hash_0
-rw-rw----+ 1 username project 433 Sep 27 08:37 quorum.err
-rw-rw----+ 1 username project 12G Sep 27 08:37 pe.cor.fa
-rw-rw----+ 1 username project 5.2M Sep 27 08:37 pe.cor.tmp.log
-rw-rw----+ 1 username project 2.3G Sep 27 08:20 quorum_mer_db.jf
-rw-rw----+ 1 username project 2.9M Sep 27 08:17 pe_data.tmp
-rw-rw----+ 1 username project 22G Sep 27 08:17 pe.renamed.fastq
-rw-rw----+ 1 username project 10 Sep 27 08:15 meanAndStdevByPrefix.pe.txt
-rwxr-xr-x+ 1 username project 9.1K Sep 27 08:15 assemble.sh
-rwxr-xr-x+ 1 username project 137 Sep 27 08:15 run.sh
-rw-rw----+ 1 username project 639 Sep 27 08:15 config.txt
CPU/RAM usage for the process above
I also tried with more resources, with the same result.
Update: I tried a different dataset (also PE Illumina + Nanopore) and got the same error.
I have just downloaded the MaSuRCA-4.1.0 release and successfully ran an assembly with Illumina PE and Nanopore data (FLYE_ASSEMBLY=0). For ONT+Illumina data sets I recommend setting FLYE_ASSEMBLY=1 in the config file. Please make sure you are running MaSuRCA in a clean environment, and that there are no conflicts with bioconda packages or perl libraries on the PATH.
On Sun, Oct 1, 2023 at 10:55 PM MichaelFokinNZ @.***> wrote:
Are these related issues? #313 https://github.com/alekseyzimin/masurca/issues/313 #239 https://github.com/alekseyzimin/masurca/issues/239
Most likely you have a failure in the MUMmer perl binding. This is a known conflict with the bioconda mummer install. MaSuRCA installs and compiles easily from the distribution tarball, so you can just install it in a clean local environment and run it from there.
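One way to check for the kind of conflict described here is to see which copies of the relevant tools the shell actually resolves. This is only a sketch: `report_tool` is a hypothetical helper, and the tool list and the two environment variables are assumptions about what is worth inspecting, not something MaSuRCA itself prescribes.

```shell
# Print where a tool resolves on PATH, or say that it is missing.
report_tool() {
  tool_path=$(command -v "$1" 2>/dev/null || true)
  if [ -n "$tool_path" ]; then
    printf '%s -> %s\n' "$1" "$tool_path"
  else
    printf '%s -> not found on PATH\n' "$1"
  fi
}

# Tools named in this thread; a path inside a conda directory hints at a conflict.
for tool in perl mummer nucmer jellyfish; do
  report_tool "$tool"
done

# Variables that commonly point Perl or the shell at a conda install.
printf 'CONDA_PREFIX=%s\n' "${CONDA_PREFIX:-<unset>}"
printf 'PERL5LIB=%s\n' "${PERL5LIB:-<unset>}"
```

If any of these resolve into a conda or module directory other than the MaSuRCA install, running from a fresh login shell without those environments loaded is a reasonable next test.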
Thank you! I have a gut feeling that it might be related to the high-coverage (>100x) ONT data in both datasets I tried. Can you recommend any available, reproducible SRA dataset (Illumina+ONT) to try?
MaSuRCA should run just fine on up to 150x Illumina PE coverage. I do not recommend using more than that: the assembly will run, but the results will be worse. I have just uploaded a data set for a 37Mbp fungal genome to our anonymous ftp. These data are public, but I do not remember the SRA IDs: ftp://ftp.ccb.jhu.edu/pub/alekseyz/L.prolificans/lprol.tgz The assembly runs in about 30 minutes on a 24-core server. MaSuRCA 4.1.0 yields a ~37Mbp assembly in ~40 contigs with a contig N50 of about 2Mbp.
Hi Aleksey, with help from NeSI's (NZ) cluster support we figured out the cause of this, and I ran your dataset successfully.
It is a common Perl issue with SSH sessions opened from Windows clients (or some Mac clients), WSL1 in my case. I got the following error:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "C.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
That was fixed permanently by adding the following lines to my ~/.bashrc on the server side. I assume this could be part of the run script as well.
# To get rid of the Perl locale warning.
export LANGUAGE=en_NZ.UTF-8
export LC_ALL=en_NZ.UTF-8
export LANG=en_NZ.UTF-8
export LC_CTYPE=en_NZ.UTF-8
I see this issue popping up in a few threads, so the cause and solution might be useful (they are likely obvious to Perl experts).
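The exports above hard-code en_NZ.UTF-8, which may not be installed on other systems. A minimal sketch of a more portable variant for a run script, assuming the `locale -a` command is available; `pick_locale` and the candidate list are hypothetical and not part of MaSuRCA:

```shell
# Print the first candidate that `locale -a` reports as installed, else "C".
pick_locale() {
  # Normalise spellings so en_NZ.UTF-8 and en_NZ.utf8 compare equal.
  available=$(locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
  for candidate in "$@"; do
    wanted=$(printf '%s' "$candidate" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    if printf '%s\n' "$available" | grep -qx "$wanted"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf 'C\n'
}

# Candidate order is an assumption; put the site's preferred locale first.
chosen=$(pick_locale en_NZ.UTF-8 en_US.UTF-8 C.UTF-8)
export LANG="$chosen" LC_ALL="$chosen"
echo "Using locale: $chosen"
```

After this, running `perl -e 'exit 0'` in the same shell should no longer print the "Setting locale failed" warning if the chosen locale is really installed.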
Hi, I'm trying to do a de novo assembly of a fungal genome with an average size of 53 Mbp. I keep getting this error:
This is the sr_config:
What can I do? Is the problem related to the old version, or am I doing something wrong?