agnibhat closed this issue 3 years ago
Please post the output of the following command:
ls -l /Users/akv4001/Desktop/Seqdata/Shasta/nf54-816-h6merge.fastq
```
-rw-r--r--@ 1 akv4001 staff 2791787283 Dec 18 17:50 /Users/akv4001/Desktop/Seqdata/Shasta/nf54-816-h6merge.fastq
```
It is possible that the macOS version of Shasta has a 2 GB limit on the size of a file it can read. I suggest converting the fastq file to fasta, which will reduce its size by roughly a factor of two, or splitting it into two files, each smaller than 2 GB.
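Both workarounds can be done with standard tools. The sketch below assumes standard 4-line FASTQ records (header, sequence, `+`, qualities); the file names are placeholders, and the tiny input is generated only for illustration:

```shell
# Tiny example FASTQ (4 lines per record: header, sequence, '+', qualities).
printf '@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nJJJJ\n' > reads.fastq

# Convert FASTQ to FASTA: turn the '@' header into a '>' header and keep
# the sequence line; drop the '+' separator and the quality line.
awk 'NR % 4 == 1 { sub(/^@/, ">"); print } NR % 4 == 2 { print }' \
    reads.fastq > reads.fasta

# Or split the FASTQ into smaller pieces; the -l value must be a multiple
# of 4 so that no record is cut in half (4000000 lines = 1M reads, which
# for typical read lengths stays well under 2 GB per piece).
split -l 4000000 reads.fastq reads_part_
```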
The Linux version has no such limitation and is able to process files that are hundreds of GB in size.
How much memory does your Mac have? It is likely that you will need at least 12 to 16 GB to run this assembly (this is a separate issue from the problem you are seeing now).
Thanks! I will try it. My Mac has 8 GB of memory, so that might be a problem. I will try to run it on a Linux machine and see if it works. How much time might it take to assemble a ~24 Mb genome with 16 cores and 64 GB of RAM? I might have access to such a system in the near future.
That assembly should take just a few minutes on a machine like the one you described.
For best results, and assuming you have recent Nanopore data, make sure to use config file `shasta/conf/Nanopore-Sep2020.conf`. Use command line option `--config` to specify the configuration file. You can download the file from the GitHub repository, or get it from the tar file for the current release.
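For reference, a typical invocation might look like the following. This is only a sketch: the binary and input file names are placeholders, and you should check the Shasta documentation for the exact options in your release.

```shell
# Run Shasta with the recommended Nanopore configuration file.
# 'reads.fasta' and the path to the .conf file are placeholders.
./shasta --input reads.fasta --config Nanopore-Sep2020.conf
```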
I am closing this due to lack of discussion, but feel free to reopen it or create a new issue if more information emerges.
I have the same issue on Linux with 500 GB of RAM available. The reads were merged into one fastq file of 68 GB...
Please provide the following information:

- The type of filesystem your `fastq` file resides on. If you are on Linux, you can get that via `stat -f fileName.fastq`, making sure to fill in the path to your `fastq` file.
- If you are using option `--Reads.noCache` (you might be getting that through a configuration file), try removing it and see if the problem still occurs.
If you provide the above information, I may be able to give suggestions.
Thanks for the quick reply
This is the version: "CentOS Linux 7 (Core)"
Got this from `stat -f`:

```
ID: ef0009600000002 Namelen: 255 Type: gpfs
Block size: 16777216 Fundamental block size: 16777216
Blocks: Total: 76021760 Free: 23653323 Available: 23653323
Inodes: Total: 402653184 Free: 217161914
```
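If you only need the filesystem type, GNU coreutils `stat` can print it on its own: under `-f` (filesystem status), the `%T` format prints the type name. The `.` path here is a placeholder for the directory or file you want to check:

```shell
# Print only the type of the filesystem holding the given path.
# With GNU coreutils, 'stat -f' reports filesystem (not file) status,
# and %T prints the type name (e.g. "gpfs", "ext4", "tmpfs").
stat -f -c %T .
```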
I am running it now after splitting the file into <2 GB files; it seems to work. I will try removing `--Reads.noCache` in the next run.
There is a known issue #202 when using `--Reads.noCache` on the `gpfs` filesystem. Splitting the file will not help. It should work if you remove `--Reads.noCache`, although this might cause some reduction in assembly performance, depending on your machine configuration.
Hopefully this will be fixed in the next release. On read failure, we should automatically turn off `--Reads.noCache` and retry.
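Until such a fallback exists in Shasta itself, the retry idea can be sketched as a small shell wrapper. This is a hypothetical helper, not part of Shasta; the example command line in the comment is a placeholder:

```shell
# Sketch of the suggested fallback: try the command with --Reads.noCache
# first, and if it fails, rerun the same command without the flag.
run_with_fallback() {
    if ! "$@" --Reads.noCache; then
        echo "failed with --Reads.noCache; retrying without it" >&2
        "$@"
    fi
}

# Example usage (placeholder command line):
# run_with_fallback ./shasta --input reads.fasta --config Nanopore-Sep2020.conf
```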
OK, I will try that later, but it is still running after splitting into small files. Currently at "computing marker graph vertices".
Oh cool. We have seen strange things happen with `gpfs`, so this adds to the list. For the future, if your data are on `gpfs`, I suggest just taking out `--Reads.noCache`, without having to worry about splitting the file.
OK, will do for the next run. Thanks for the help and the great tool!
Hi,
I am trying to assemble my nanopore sequencing reads using Shasta. I am encountering an error which essentially says what I have mentioned in the subject above. I am attaching a log file for your reference. I am rather lost trying to figure out where the error is coming from. Any help is much appreciated.
I am using a MacBook for now, with very basic resources. The genome to assemble is a fairly small one, at ~24 Mb.
P.S. I am a newbie at bioinformatics analysis. shasta.log