chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads

Run time error ONT.fastq file #259

Closed khush876 closed 3 years ago

khush876 commented 3 years ago

Hi,

I am trying to assemble a large plant genome using ONT data. I am using a Linux-based server.

./shasta-Linux-0.7.0 --input /PATH/TO/FILE/nanopore-all-pass.fastq

However, I am getting this error. Could you please let me know where I am making a mistake?

This run uses options "--memoryBacking 4K --memoryMode anonymous". This could result in performance degradation. For full performance, use "--memoryBacking 2M --memoryMode filesystem" (root privilege via sudo required).
Therefore the results of this run should not be used for benchmarking purposes.
This assembly will use 64 threads.
Setting up consensus caller Bayesian:guppy-2.3.5-a
Using predefined Bayesian consensus caller guppy-2.3.5-a
Bayesian consensus caller configuration name is Human guppy 2.3.5 chr1,chr2,chr3 GM24385 with hg38 priors and 1 pseudocounts 7-23-2019
2021-Jun-20 16:00:10.416604 Begin loading reads from 1 files.
2021-Jun-20 16:00:10.416703 Loading reads from /PATH/TO/FILE/nanopore-all-pass.fastq
File size: 23913637394 bytes.
Allocate buffer time: 33.8026 s.
Read time: 19.5457 s.
Read rate: 1.22348e+09 bytes/s.
Found 7343152 lines in this file.
2021-Jun-20 16:01:06.801048 A runtime error occurred in thread 5: Extraneous characters on third line for read 88ce31c6-14ee-4617-bf87-49d96ab6e4b3 at offset 1950730179.

Thank you

paoloczi commented 3 years ago

As of the latest release (0.7.0), Shasta requires a strict flavor of the fastq format in which the third line of each read consists of a plus sign and nothing else. However, some pipelines create fastq files in which the third line of each read also contains header information. For that reason I recently removed that restriction: the latest Shasta code on GitHub now permits any number of additional characters, which are ignored, on the third line of each read. This change has not been released yet, but I attached below a current test build of the Linux version of Shasta that contains it. This should fix your problem, but since this is a test build, I recommend upgrading to the next release when one is available.
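For anyone who has to stay on release 0.7.0, one possible workaround is to rewrite the third line of every record to a bare plus sign before assembling. The sketch below is a hypothetical helper set, not part of Shasta. It assumes an uncompressed fastq with exactly four lines per record (no wrapped sequence or quality lines), and it assumes the offset in the error message (1950730179 above) is a 0-based byte offset near the offending record.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of Shasta.
# Assumes an uncompressed fastq with exactly 4 lines per record.

# Print how many records have a third line that is not a bare "+".
count_extended_plus_lines() {
  awk 'NR % 4 == 3 && $0 != "+" { n++ } END { print n + 0 }' "$1"
}

# Print 4 lines starting at a byte offset (assumed 0-based), e.g. the
# offset reported in a Shasta error message, to inspect that record.
show_record_at_offset() {
  tail -c +"$(( $2 + 1 ))" "$1" | head -n 4
}

# Copy $1 to $2, replacing each third line with a bare "+".
# Exits non-zero if a third line does not start with "+", which means
# the file is not in the assumed 4-lines-per-record layout.
normalize_plus_lines() {
  awk 'NR % 4 == 3 {
         if (substr($0, 1, 1) != "+") {
           print "line " NR ": expected a line starting with +" > "/dev/stderr"
           exit 1
         }
         print "+"
         next
       }
       { print }' "$1" > "$2"
}
```

For example, `normalize_plus_lines nanopore-all-pass.fastq nanopore-all-pass.fixed.fastq` writes a corrected copy. With an input of roughly 24 GB, this temporarily needs that much additional disk space.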

shasta-2021-06-21.gz

Please try this version and let us know if this fixes the problem.

From your output it looks like you are using mostly default assembly parameters. As explained in the documentation, default parameters are not recommended for any particular application, and you should instead start with one of the Shasta configuration files provided in the shasta/conf directory. If you are using current ONT reads, I suggest starting with shasta/conf/Nanopore-Sep2020.conf or shasta/conf/Nanopore-Plants-Apr2021.conf (the latter is not included in release 0.7.0 but is available for download from GitHub). You can use option --config to specify a configuration file. Please let me know if you have any questions regarding Shasta configuration files or assembly options.

khush876 commented 3 years ago

Hi,

Thank you for your reply and sharing the version of shasta.

shasta-2021-06-21.gz: I am having trouble installing it on my server. I ran gunzip shasta-2021-06-21.gz, but the resulting file does not seem to be executable like shasta-Linux-0.7.0.

Indeed, I used default assembly parameters just to check whether Shasta works. I will now adjust the parameters using shasta/conf/Nanopore-Sep2020.conf.

How do I specify the shasta/conf/Nanopore-Sep2020.conf file along with my input .fastq file? Is the following correct? I am new to Shasta.

./shasta-2021-06-21 --config /PATH/TO/FILE/Nanopore-Sep2020.conf --input /PATH/TO/FILE/nanopore-all-pass.fastq

Much appreciated!

paoloczi commented 3 years ago

> shasta-2021-06-21.gz - I am having trouble installing it on my server. I did gunzip shasta-2021-06-21.gz but it does not seem to be executable file like shasta-Linux-0.7.0?

The file was executable when I compressed it, but something must have changed along the way, perhaps as a security measure in GitHub. If the file as it reached you is not executable, you can make it executable using the Linux chmod command, like this:

chmod ugo+x shasta-2021-06-21

If, after doing this, you still can't run it, please attach the output of the following two commands:

ls -l shasta-2021-06-21
file shasta-2021-06-21
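A likely reason for the missing execute bit: the gzip format does not record Unix permissions, and gunzip gives the decompressed file the mode of the .gz file itself, which after a browser download is normally not executable. A minimal sketch of decompressing and making the binary executable in one step follows; `install_gz_binary` is a hypothetical helper name, not a Shasta command.

```shell
#!/usr/bin/env bash
# Sketch: decompress a downloaded gzip'd binary and make it executable.
# install_gz_binary is a hypothetical helper name used for illustration.
install_gz_binary() {
  local gz="$1"
  local out="${gz%.gz}"          # shasta-2021-06-21.gz -> shasta-2021-06-21
  gunzip -c "$gz" > "$out" || return 1   # -c keeps the original .gz
  chmod ugo+x "$out"             # same chmod as suggested above
  [ -x "$out" ]                  # non-zero status if still not executable
}
```

Usage: `install_gz_binary shasta-2021-06-21.gz && ls -l shasta-2021-06-21 && file shasta-2021-06-21`.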

> How can I add shasta/conf/Nanopore-Sep2020.conf file along with my input .fastq file? Please can you suggest. I am new to Shasta.
>
> ./shasta-2021-06-21 --config /PATH/TO/FILE/Nanopore-Sep2020.conf --input /PATH/TO/FILE/nanopore-all-pass.fastq

That is the right way to do it. If you don't have the configuration file, you can download it from GitHub (in shasta/conf), but for convenience I have also attached the two configuration files that are probably your best bets:

Nanopore-Sep2020.conf.gz Nanopore-Plants-Apr2021.conf.gz
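Putting these steps together, here is a sketch of starting a run with one of the attached configuration files. Only the `--config` and `--input` options come from this thread; `SHASTA_BIN` and `run_with_conf` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: decompress an attached .conf.gz and start an assembly with it.
# SHASTA_BIN and run_with_conf are hypothetical names, not Shasta features.
SHASTA_BIN="${SHASTA_BIN:-./shasta-2021-06-21}"

run_with_conf() {
  local conf_gz="$1" fastq="$2"
  local conf="${conf_gz%.gz}"
  # Write the plain-text config next to the .gz unless it already exists.
  # A plain .conf path (no .gz suffix) is used as-is.
  [ -f "$conf" ] || gunzip -c "$conf_gz" > "$conf" || return 1
  "$SHASTA_BIN" --config "$conf" --input "$fastq"
}
```

Usage: `run_with_conf Nanopore-Plants-Apr2021.conf.gz nanopore-all-pass.fastq`.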

Let me know if you still have problems starting your assembly. And if you don't get a satisfactory assembly, please post AssemblySummary.html from the assembly directory here, and I may be able to give suggestions.

khush876 commented 3 years ago

Hi,

It failed

./shasta-2021-06-21 --config Nanopore-Plants-Apr2021.conf --input nanopore-all-pass.fastq

error

ls -l shasta-2021-06-21
-rwxr-xr-x 1 xx yy 9013224 Jun 23 22:29 shasta-2021-06-21

file shasta-2021-06-21
shasta-2021-06-21: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=354427d9d3975c6b40b957ad64176740c5fdbaef, for GNU/Linux 3.2.0, stripped

The ShastaRun folder was created, and it contains a shasta.conf file.

This run uses options "--memoryBacking 4K --memoryMode anonymous". This could result in performance degradation. For full performance, use "--memoryBacking 2M --memoryMode filesystem" (root privilege via sudo required).
Therefore the results of this run should not be used for benchmarking purposes.
This assembly will use 64 threads.
Setting up consensus caller Bayesian:guppy-3.6.0-a
Using predefined Bayesian consensus caller guppy-3.6.0-a
Bayesian consensus caller configuration name is guppy_360_hg002_chr1 with pseudocounts 1
2021-Jun-23 22:34:48.298498 Begin loading reads from 1 files.
2021-Jun-23 22:34:48.298611 Loading reads from nanopore-all-pass.fastq
2021-Jun-23 22:35:09.943584 Terminated after catching a runtime error exception:

paoloczi commented 3 years ago

There should be one or two additional lines of output after that last message you posted. Those lines would contain the actual description of the error that caused termination. Can you attach those too?
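One way to make sure those last lines are not lost is to send both standard output and standard error through `tee` into a log file. The sketch below assumes bash (it relies on `PIPESTATUS`); `run_logged` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: run a command, show its output live, and keep a full copy of
# stdout and stderr in a log file so the final error lines are preserved.
# run_logged is a hypothetical helper; PIPESTATUS requires bash.
run_logged() {
  local log="$1"; shift
  "$@" 2>&1 | tee "$log"
  return "${PIPESTATUS[0]}"   # exit status of the command, not of tee
}
```

Usage: `run_logged shasta-run.log ./shasta-2021-06-21 --config Nanopore-Plants-Apr2021.conf --input nanopore-all-pass.fastq`, then `tail -n 5 shasta-run.log` shows the final messages.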

Also please attach the following information:

  1. The size of the fastq file (e.g. use ls -l nanopore-all-pass.fastq).
  2. The amount of memory your machine has (for example, using the command tail -1 /proc/meminfo).
  3. The type of filesystem the fastq file resides on (e.g. use the command df nanopore-all-pass.fastq).
paoloczi commented 3 years ago

Actually, for filesystem information (item 3 above) I will need the output of df -T nanopore-all-pass.fastq. The -T option is necessary to get the filesystem type.
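The requested information can be gathered in one go with a small script. This is a sketch for Linux; `report_run_environment` is a hypothetical helper name, and for memory it reads the MemTotal line of /proc/meminfo, which reports total usable RAM.

```shell
#!/usr/bin/env bash
# Sketch: collect file size, memory, and filesystem type for a fastq file.
# report_run_environment is a hypothetical helper name (Linux only).
report_run_environment() {
  local fastq="$1"
  echo "== File size =="
  ls -l "$fastq"
  echo "== Memory =="
  grep '^MemTotal:' /proc/meminfo   # total usable RAM
  echo "== Filesystem type =="
  df -T "$fastq"                    # -T adds the filesystem type column
}
```

Usage: `report_run_environment nanopore-all-pass.fastq`, then paste the output into the issue.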

paoloczi commented 3 years ago

I am closing this due to lack of discussion. Feel free to reopen it or create a new one if additional questions or discussion topics emerge.