biocorecrg / MOP2

Master of Pores 2
https://biocorecrg.github.io/MOP2/docs/
MIT License

Would like to install in M1chip Mac #30

Closed AndreaYCT closed 1 year ago

AndreaYCT commented 2 years ago

I was trying to install MOP2 on an M1 Mac and found that INSTALL.sh (which installs Guppy) failed. I am very new to using the command line for bioinformatics tools. Can anyone suggest how to make the Guppy installation work on a Mac?

I tried downloading the latest Guppy from Nanopore, moving it to mop_preprocess/bin, and running the equivalent steps:

cd MOP2/mop_preprocess/bin
ln -s ont-guppy-cpu/bin/guppy_* .
nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac > log

However, it did not work. It showed:

zsh: command not found: nextflow

Would like to learn very much!

Andrea
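
The "command not found: nextflow" message above means the shell could not find the executable on PATH. A minimal sketch of a pre-flight check follows; the function name missing_tools and the list of tools are illustrative assumptions, not part of MOP2.

```shell
#!/bin/sh
# Report which of the given commands cannot be found on PATH.
# Prints one "missing: NAME" line per absent tool and returns 1 if any is absent.
# (missing_tools is a hypothetical helper name, not something MOP2 provides.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return $rc
}

# Tools mentioned in this thread; adjust the list to your own setup.
missing_tools java nextflow docker || echo "Install the tools listed above, or add them to PATH."
```

If nextflow is reported as missing right after running its installer, the executable is probably still in the directory where the installer was run and needs to be moved to a directory on PATH.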

lucacozzuto commented 2 years ago

In which way it failed? If you do sh INSTALL.sh what do you get?

L

AndreaYCT commented 2 years ago

Hi, I've learned a bit more about working with Nextflow and Docker over the past few days, and I tried the following.

In the Mac terminal: installed Nextflow and moved it to /bin, then installed Docker Desktop. (This worked well.)

Created a Dockerfile:

FROM ubuntu
RUN apt-get update && apt-get install --yes --no-install-recommends \
    wget \
    curl \
    git \
    python3 \
    default-jre

# Install Nextflow
RUN curl -s https://get.nextflow.io | bash

# Install Master of Pores 2
RUN git clone --depth 1 --recurse-submodules https://github.com/biocorecrg/MOP2.git

ENV PATH=$PATH:~

In the Mac terminal:

docker build -t my-image .

This worked just fine.

Then I entered the "my-image" container:

docker run -it my-image

and ran:

bash INSTALL.sh 6.3.8

It then showed the error "unlink: cannot unlink 'mop_preprocess/bin/ont-guppy/lib/libz.so': No such file or directory".

I am not sure whether this is how it is supposed to work with Master of Pores 2. As I imagine it, I create a container from an Ubuntu image, install MOP2, Python, R, etc. in it, and then run the pipeline from the Mac terminal with Nextflow and -with-docker. Did I get that right?

Thanks for reply!

lucacozzuto commented 2 years ago

Hi, no, you don't need to make another Docker container. The containers will be pulled from Docker Hub and quay.io when needed. So you just need to install Nextflow, download MOP2, run sh INSTALL.sh, etc. (the error is normal: some Guppy versions include that file and some do not). Now you can just run the test.

Luca
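
The steps described in the reply above can be sketched as a small script. Every command is taken from this thread; the step and setup_mop2 helpers and the DRY_RUN switch are illustrative assumptions. By default the script only prints the plan.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each step; DRY_RUN=0 also executes it.
DRY_RUN="${DRY_RUN:-1}"

# Print a command and, unless in dry-run mode, run it in the current shell
# so that "cd" steps affect the steps that follow.
# (step and setup_mop2 are hypothetical helper names.)
step() {
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        eval "$1"
    fi
}

setup_mop2() {
    # The installer leaves a "nextflow" executable in the current directory;
    # move it to a directory on PATH before the last step can find it.
    step 'curl -s https://get.nextflow.io | bash' &&
    step 'git clone --depth 1 --recurse-submodules https://github.com/biocorecrg/MOP2.git' &&
    step 'cd MOP2' &&
    # The "unlink ... libz.so" message is described above as harmless.
    # ASSUMPTION: INSTALL.sh still exits successfully when it appears.
    step 'bash INSTALL.sh 6.3.8' &&
    step 'cd mop_preprocess' &&
    step 'nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac > log'
}

setup_mop2
```

Keeping the default in dry-run mode makes it possible to review the plan before anything is downloaded or executed.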

AndreaYCT commented 2 years ago

Hi, Luca,

Thanks for the reply.

I started over:

(1) Installed Nextflow (lesson learned: the Java version needs care) and successfully ran "nextflow run hello".
(2) Installed MOP2.
(3) Executed bash INSTALL.sh 6.3.8. It showed "unlink: mop_preprocess/bin/ont-guppy/lib/libz.so: No such file or directory", so I ignored it.
(4) Ran nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac > log

There is an error message in the log:

N E X T F L O W ~ version 22.04.5
Launching mop_preprocess.nf [tiny_woese] DSL2 - revision: ec40fe0af4

╔╦╗╔═╗╔═╗ ╔═╗┬─┐┌─┐┌─┐┬─┐┌─┐┌─┐┌─┐┌─┐┌─┐
║║║║ ║╠═╝ ╠═╝├┬┘├┤ ├─┘├┬┘│ ││ ├┤ └─┐└─┐
╩ ╩╚═╝╩ ╩ ┴└─└─┘┴ ┴└─└─┘└─┘└─┘└─┘└─┘

====================================================
BIOCORE@CRG Master of Pores 2. Preprocessing - N F ~ version 2.0

conffile. : final_summary_01.txt
fast5 : /Users/andreayuan-chiteng/MOP2/mop_preprocess/../data/*/.fast5
fastq :
reference : /Users/andreayuan-chiteng/MOP2/mop_preprocess/../anno/yeast_rRNA_ref.fa.gz
annotation :
granularity. : 1
ref_type : transcriptome
pars_tools : drna_tool_splice_opt.tsv
output : /Users/andreayuan-chiteng/MOP2/mop_preprocess/output
GPU : OFF
basecalling : guppy
demultiplexing : NO
demulti_fast5 : NO
filtering : nanoq
mapping : graphmap
counting : nanocount
discovery : NO
cram_conv : YES
subsampling_cram : 50
saveSpace : NO
email : lucacozzuto@crg.es

Sending the email to lucacozzuto@crg.es

----------------------CHECK TOOLS -----------------------------
basecalling : guppy
demultiplexing will be skipped
mapping : graphmap
filtering : nanoq
counting : nanocount
discovery will be skipped

[8e/bfdc26] Submitted process > flow1:GUPPY_BASECALL:baseCall (mod---1)
[ef/1ccc81] Submitted process > flow1:GUPPY_BASECALL:baseCall (wt---2)
[59/b8cde8] Submitted process > preprocess_flow:checkRef (Checking yeast_rRNA_ref.fa.gz)
Error executing process > 'flow1:GUPPY_BASECALL:baseCall (mod---1)'

Caused by:
Process flow1:GUPPY_BASECALL:baseCall (mod---1) terminated with an error exit status (127)

Command executed:

guppy_basecaller --fast5_out --flowcell FLO-MIN106 --kit SQK-RNA002 -i ./ --save_path ./mod---1_out --gpu_runners_per_device 1 --cpu_threads_per_caller 1 --num_callers 1
cat mod---1_out/*.fastq >> mod---1.fastq
rm mod---1_out/*.fastq
gzip mod---1.fastq

Command exit status:
127

Command output:
(empty)

Command error:
.command.run: line 95: docker: command not found

Work dir:
/Users/andreayuan-chiteng/MOP2/mop_preprocess/work/8e/bfdc26717cdf39b6164668f35f9257

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Pipeline BIOCORE@CRG Master of Pore - preprocess completed!
Started at 2022-10-07T15:42:37.151667+08:00
Finished at 2022-10-07T15:42:42.770361+08:00
Time elapsed: 5.6s
Execution status: failed
Failed to invoke workflow.onComplete event handler

-- Check script 'mop_preprocess.nf' at line: 632 or see '.nextflow.log' file for more details

I guess there is something still not correct with Guppy, right?

Thank you in advance!

Andrea

lucacozzuto commented 2 years ago

Hi, so it looks like Docker is not running. You need to download it on your Mac and run it. If I remember correctly, you will likely need to register on Docker Hub from the Mac as well.

.command.run: line 95: docker: command not found

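
One way to tell "Docker is not installed" apart from "Docker is installed but not running" is sketched below. It relies on command -v and on docker info, which is expected to exit with a non-zero status when the daemon cannot be reached; the function name docker_state and the three labels it prints are illustrative assumptions.

```shell
#!/bin/sh
# Classify the state of the Docker client and daemon.
# $1 is the client binary to probe (defaults to "docker").
# (docker_state and its labels are hypothetical, chosen for this sketch.)
docker_state() {
    bin="${1:-docker}"
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "not-installed"    # corresponds to "docker: command not found"
    elif ! "$bin" info >/dev/null 2>&1; then
        echo "not-running"      # client found, daemon not reachable
    else
        echo "ok"
    fi
}

case "$(docker_state)" in
    not-installed) echo "Install Docker Desktop and make sure 'docker' is on PATH." ;;
    not-running)   echo "Start Docker Desktop and wait until it is running." ;;
    ok)            echo "Docker is reachable." ;;
esac
```

Running this before launching the pipeline avoids waiting for Nextflow to submit a process only to fail with exit status 127.
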
AndreaYCT commented 2 years ago

Hi,

Thanks for the tip. My bad. I did not notice docker was not running.

However, I ran it again and got this:

Error executing process > 'flow1:GUPPY_BASECALL:baseCall (mod---1)'

Caused by:
Process flow1:GUPPY_BASECALL:baseCall (mod---1) terminated with an error exit status (125)

Command executed:

guppy_basecaller --fast5_out --flowcell FLO-MIN106 --kit SQK-RNA002 -i ./ --save_path ./mod---1_out --gpu_runners_per_device 1 --cpu_threads_per_caller 1 --num_callers 8
cat mod---1_out/*.fastq >> mod---1.fastq
rm mod---1_out/*.fastq
gzip mod---1.fastq

Command exit status:
125

Command output:
(empty)

Command error:
docker: Error response from daemon: Range of CPUs is from 0.01 to 4.00, as there are only 4 CPUs available.
See 'docker run --help'.

Work dir:
/Users/andreayuan-chiteng/MOP2/mop_preprocess/work/a7/d7f7df2dae048266d6fb5960dfc535

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Many thanks!

Andrea

lucacozzuto commented 2 years ago

Hi, you are asking for 8 CPUs... while you have a maximum of 4 CPUs... Where are you putting this info? I don't have it anywhere. Did you change something?

L
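
The mismatch described above (8 CPUs requested, 4 available to Docker) can be checked before launching a run. The sketch below assumes that docker info --format '{{.NCPU}}' reports the number of CPUs the daemon can use; the function names cpus_fit and docker_cpus are illustrative.

```shell
#!/bin/sh
# Succeed when the requested CPU count does not exceed the available count.
# (cpus_fit and docker_cpus are hypothetical helper names.)
cpus_fit() {
    requested="$1"
    available="$2"
    [ "$requested" -le "$available" ]
}

# Number of CPUs visible to the Docker daemon; prints 0 if it cannot be queried.
docker_cpus() {
    n=$(docker info --format '{{.NCPU}}' 2>/dev/null) || n=0
    echo "${n:-0}"
}

requested=8                 # the value rejected in the error message above
available=$(docker_cpus)
if cpus_fit "$requested" "$available"; then
    echo "OK: $requested CPUs requested, $available available to Docker."
else
    echo "Too many: $requested CPUs requested, but Docker reports $available."
fi
```

If the check fails, either the pipeline has to request fewer CPUs (for example through a profile meant for the machine) or Docker Desktop has to be given more CPUs in its resource settings.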

AndreaYCT commented 2 years ago

Hi,

After you reminded me that Docker was not running, I installed Docker Desktop and let it run in the background (with a submarine icon in the menu bar). My input was:

@Andreas-iMac mop_preprocess % nextflow run mop_preprocess.nf -with-docker

It gave the result below:

N E X T F L O W ~ version 22.04.5
Launching mop_preprocess.nf [amazing_jepsen] DSL2 - revision: ec40fe0af4

╔╦╗╔═╗╔═╗ ╔═╗┬─┐┌─┐┌─┐┬─┐┌─┐┌─┐┌─┐┌─┐┌─┐
║║║║ ║╠═╝ ╠═╝├┬┘├┤ ├─┘├┬┘│ ││ ├┤ └─┐└─┐
╩ ╩╚═╝╩ ╩ ┴└─└─┘┴ ┴└─└─┘└─┘└─┘└─┘└─┘

====================================================
BIOCORE@CRG Master of Pores 2. Preprocessing - N F ~ version 2.0

conffile. : final_summary_01.txt
fast5 : /Users/andreayuan-chiteng/MOP2/mop_preprocess/../data/*/.fast5
fastq :
reference : /Users/andreayuan-chiteng/MOP2/mop_preprocess/../anno/yeast_rRNA_ref.fa.gz
annotation :
granularity. : 1
ref_type : transcriptome
pars_tools : drna_tool_splice_opt.tsv
output : /Users/andreayuan-chiteng/MOP2/mop_preprocess/output
GPU : OFF
basecalling : guppy
demultiplexing : NO
demulti_fast5 : NO
filtering : nanoq
mapping : graphmap
counting : nanocount
discovery : NO
cram_conv : YES
subsampling_cram : 50
saveSpace : NO
email : lucacozzuto@crg.es

Sending the email to lucacozzuto@crg.es

----------------------CHECK TOOLS -----------------------------
basecalling : guppy
demultiplexing will be skipped
mapping : graphmap
filtering : nanoq
counting : nanocount
discovery will be skipped

executor > local (3)
[7d/a0a5f5] process > flow1:GUPPY_BASECALL:baseCall (wt---2) [ 0%] 0 of 2
[- ] process > flow1:NANOQ_FILTER:filter -
[- ] process > preprocess_flow:MinIONQC -
[- ] process > preprocess_flow:RNA2DNA -
[- ] process > preprocess_flow:GRAPHMAP:map -
[- ] process > preprocess_flow:SAMTOOLS_CAT:catAln -
[- ] process > preprocess_flow:SAMTOOLS_SORT:sortAln -
[- ] process > preprocess_flow:SAMTOOLS_INDEX:indexBam -
[39/0f29de] process > preprocess_flow:checkRef (Checking yeast_rRNA_ref.fa.gz) [100%] 1 of 1 ✔

executor > local (3)
[7d/a0a5f5] process > flow1:GUPPY_BASECALL:baseCall (wt---2) [100%] 1 of 1, failed: 1
[- ] process > flow1:NANOQ_FILTER:filter -
[- ] process > preprocess_flow:MinIONQC -
[- ] process > preprocess_flow:RNA2DNA -
[- ] process > preprocess_flow:GRAPHMAP:map -
[- ] process > preprocess_flow:SAMTOOLS_CAT:catAln -
[- ] process > preprocess_flow:SAMTOOLS_SORT:sortAln -
[- ] process > preprocess_flow:SAMTOOLS_INDEX:indexBam -
[39/0f29de] process > preprocess_flow:checkRef (Checking yeast_rRNA_ref.fa.gz) [100%] 1 of 1 ✔
[- ] process > preprocess_flow:bam2Cram -
[- ] process > preprocess_flow:bam2stats -
[- ] process > preprocess_flow:joinAlnStats -
[- ] process > preprocess_flow:NANOPLOT_QC:MOP_nanoPlot -
[- ] process > preprocess_flow:concatenateFastQFiles -
[- ] process > preprocess_flow:FASTQC:fastQC -
[- ] process > preprocess_flow:NANOCOUNT:nanoCount -
[- ] process > preprocess_flow:AssignReads -
[- ] process > preprocess_flow:countStats -
[- ] process > preprocess_flow:joinCountStats -
[- ] process > preprocess_flow:MULTIQC:makeReport [ 0%] 0 of 1
Error executing process > 'flow1:GUPPY_BASECALL:baseCall (mod---1)'

Caused by:
Process flow1:GUPPY_BASECALL:baseCall (mod---1) terminated with an error exit status (125)

Command executed:

guppy_basecaller --fast5_out --flowcell FLO-MIN106 --kit SQK-RNA002 -i ./ --save_path ./mod---1_out --gpu_runners_per_device 1 --cpu_threads_per_caller 1 --num_callers 8
cat mod---1_out/*.fastq >> mod---1.fastq
rm mod---1_out/*.fastq
gzip mod---1.fastq

Command exit status:
125

Command output:
(empty)

Command error:
docker: Error response from daemon: Range of CPUs is from 0.01 to 4.00, as there are only 4 CPUs available.
See 'docker run --help'.

Work dir:
/Users/andreayuan-chiteng/MOP2/mop_preprocess/work/a7/d7f7df2dae048266d6fb5960dfc535

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

I also don't know how to change the number of CPUs that is used. Where can I see that I was asking for 8 CPUs?

Many thanks!

Andrea

lucacozzuto commented 2 years ago

Hi, no... you should use a custom profile


nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac > log

Luca

AndreaYCT commented 2 years ago

Hi,

I just tried it and it generated a "log" file (in the "mop_preprocess" folder). Is that correct?

Inside the log, it says:

N E X T F L O W ~ version 22.04.5
Launching mop_preprocess.nf [marvelous_euler] DSL2 - revision: ec40fe0af4

╔╦╗╔═╗╔═╗ ╔═╗┬─┐┌─┐┌─┐┬─┐┌─┐┌─┐┌─┐┌─┐┌─┐
║║║║ ║╠═╝ ╠═╝├┬┘├┤ ├─┘├┬┘│ ││ ├┤ └─┐└─┐
╩ ╩╚═╝╩ ╩ ┴└─└─┘┴ ┴└─└─┘└─┘└─┘└─┘└─┘

====================================================
BIOCORE@CRG Master of Pores 2. Preprocessing - N F ~ version 2.0

conffile. : final_summary_01.txt
fast5 : /Users/andreayuan-chiteng/MOP2/mop_preprocess/../data/*/.fast5
fastq :
reference : /Users/andreayuan-chiteng/MOP2/mop_preprocess/../anno/yeast_rRNA_ref.fa.gz
annotation :
granularity. : 1
ref_type : transcriptome
pars_tools : drna_tool_splice_opt.tsv
output : /Users/andreayuan-chiteng/MOP2/mop_preprocess/output
GPU : OFF
basecalling : guppy
demultiplexing : NO
demulti_fast5 : NO
filtering : nanoq
mapping : graphmap
counting : nanocount
discovery : NO
cram_conv : YES
subsampling_cram : 50
saveSpace : NO
email : lucacozzuto@crg.es

Sending the email to lucacozzuto@crg.es

----------------------CHECK TOOLS -----------------------------
basecalling : guppy
demultiplexing will be skipped
mapping : graphmap
filtering : nanoq
counting : nanocount
discovery will be skipped

[11/de5f40] Submitted process > preprocess_flow:checkRef (Checking yeast_rRNA_ref.fa.gz)
[85/ec35ff] Submitted process > flow1:GUPPY_BASECALL:baseCall (mod---1)
[8f/b03935] Submitted process > flow1:GUPPY_BASECALL:baseCall (wt---2)

Should I also expect a folder named "output" to appear once the run succeeds?

lucacozzuto commented 2 years ago

well it is still ongoing you need to wait :)
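
Because the run was started with -bg and its output redirected to "log", progress can be followed with tail -f log. The sketch below reports whether the log already contains the final "Execution status" line that the earlier failed run printed; the function name run_state and the "unfinished" label are illustrative assumptions.

```shell
#!/bin/sh
# Print "unfinished" while the log has no final status line yet,
# otherwise print the status that follows "Execution status".
# (run_state is a hypothetical helper name.)
run_state() {
    log="${1:-log}"
    status_line=$(grep 'Execution status' "$log" 2>/dev/null | tail -n 1)
    if [ -z "$status_line" ]; then
        echo "unfinished"
    else
        # Keep only the text after the last colon, trimmed of blanks.
        echo "$status_line" | sed 's/.*:[[:space:]]*//; s/[[:space:]]*$//'
    fi
}

run_state log
```

"unfinished" only means that no final status has been written yet; it cannot distinguish a run that is still working from one that was killed.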

AndreaYCT commented 2 years ago

Thank you so much!!! I will wait and see! How long does it usually take?

Andrea

AndreaYCT commented 2 years ago

Hi, Luca,

It finished. I opened the log.txt. It showed:

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Then I ran:

cd /Users/andreayuan-chiteng/MOP2/mop_preprocess/work/0c/a941b388fcba78bf2cb9890e019d75
cat .command.out

This message was shown:

CRASHPAD MESSAGE:
ONT Guppy basecalling software version 6.3.8+d9e0f64, minimap2 version 2.22-r1101
config file: /Users/andreayuan-chiteng/MOP2/mop_preprocess/bin/ont-guppy/data/rna_r9.4.1_70bps_hac.cfg
model file: /Users/andreayuan-chiteng/MOP2/mop_preprocess/bin/ont-guppy/data/template_rna_r9.4.1_70bps_hac.jsn
input path: ./
save path: ./wt---2_out
chunk size: 2000
chunks per runner: 512
minimum qscore: 7
records per file: 4000
num basecallers: 1
cpu mode: ON
threads per caller: 1

Use of this software is permitted solely under the terms of the end user license agreement (EULA). By running, copying or accessing this software, you are demonstrating your acceptance of the EULA. The EULA may be found in /Users/andreayuan-chiteng/MOP2/mop_preprocess/bin/ont-guppy/bin
Warning: fast5_out is deprecated - emitting fast5 files from guppy will be removed in a future version
Found 1 input read file to process.
Init time: 1504 ms

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|

Caller time: 108347366 ms, Samples called: 28004665, samples/s: 258.471
Finishing up any open output files.
Basecalling completed successfully.

My questions are:

(1) I am supposed to have one WT and one KO file, but the log only shows one. Is that expected?
(2) Can error messages such as "Failed to invoke workflow.onComplete event handler" be ignored?
(3) Will I get a new folder named "output" under mop_preprocess?

Many thanks for helping me!

Andrea

log.txt
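
Changing into each work directory by hand, as in the comment above, can be scripted. The sketch below pulls every path that follows a "Work dir:" label out of a Nextflow log, whether the path is on the same line or on the next one, and prints the .command.out and .command.err files found there. The function name work_dirs is an illustrative assumption.

```shell
#!/bin/sh
# Print every work directory that follows a "Work dir:" label in a log file.
# The path may be on the same line as the label or on the next non-empty line.
# (work_dirs is a hypothetical helper name.)
work_dirs() {
    awk '
        /Work dir:/ {
            rest = $0
            sub(/.*Work dir:[ \t]*/, "", rest)
            if (rest != "") { print rest; next }
            want = 1
            next
        }
        want && NF { gsub(/^[ \t]+|[ \t]+$/, ""); print; want = 0 }
    ' "${1:-log}"
}

# Show the captured output of each failed task, if a log file is present.
if [ -f log ]; then
    work_dirs log | while read -r dir; do
        echo "== $dir"
        cat "$dir/.command.out" "$dir/.command.err" 2>/dev/null
    done
fi
```

Reading .command.err alongside .command.out helps because tools often write their actual error message to standard error.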

lucacozzuto commented 2 years ago

You are using Guppy 6, so you have to specify drna_tool_unsplice_guppy6_opt.tsv as --pars_tools, as described in the documentation: https://biocorecrg.github.io/MOP2/docs/mop_preprocess.html
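
Put together with the command used earlier in this thread, the advice above might look like the sketch below. It only prints the command. The helper name preprocess_cmd is an illustrative assumption, and so is the idea that the file name is resolved relative to the mop_preprocess directory, as the default shown in the logs suggests.

```shell
#!/bin/sh
# Build the preprocessing command for a given --pars_tools file.
# (preprocess_cmd is a hypothetical helper; it prints the command, it does not run it.)
preprocess_cmd() {
    pars_tools="$1"
    printf 'nextflow run mop_preprocess.nf -with-docker -bg -profile m1mac --pars_tools %s > log\n' "$pars_tools"
}

# File name taken from the reply above.
preprocess_cmd drna_tool_unsplice_guppy6_opt.tsv
```

Pipeline parameters such as --pars_tools use two dashes, while Nextflow's own options such as -profile and -bg use one.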

AndreaYCT commented 2 years ago

I see and I will try again.

Following this thread, you mentioned "Newer versions of guppy automatically separate the reads depending on the quality. You need to disable this via custom options for being used in MoP3. This is also to avoid losing interesting signals since the modified bases have often low qualities. GUPPY 6 seems to require singularity 3.7.0 or higher."

I would like to know whether you would recommend using an older version (such as Guppy 4) or staying with Guppy 6 if I am going to run mop_mod later.

Thank you!

lucacozzuto commented 2 years ago

It is fine to use newer versions!

lucacozzuto commented 2 years ago

You have a strange error with docker...

  docker: unauthorized: authentication required.

This is the first time I have seen it. After googling it a bit, it could be the host clock. You can check it here:

https://github.com/docker/hub-feedback/issues/645#issuecomment-536198753

About the "k" characters: I think it is just a rendering problem, so it is OK.

Luca

On 17/10/2022 02:14, AndreaYCT wrote:

Here is the log of my second run. This time I have an "output" folder and fastq files (~480 KB). There are still some error messages, and I am not sure whether they can be ignored. In addition, there are many "k" characters in the log.txt. What do they mean?

Thank you for the help! 202210148AM.log.txt https://github.com/biocorecrg/MOP2/files/9796004/202210148AM.log.txt

AndreaYCT commented 1 year ago

@lucacozzuto

Hi,

First, thank you for helping me run MOP2 on my Mac. However, I realized that what I should do is run MOP2 on an HPC cluster, so I spent some time figuring out how to do this.

I ran the example on the HPC service and did get the output files. There is only one error message, from NanoPlot. Would you please take a look at the log.txt?

Can I continue and run mop_mod?

Thank you!!

MOP2_20221103output.txt

lucacozzuto commented 1 year ago

Hi, the output is empty. However, I have found that NanoPlot sometimes fails. Don't worry about it.

L

AndreaYCT commented 1 year ago

Oops, this is the right log.txt. log20221103.txt

lucacozzuto commented 1 year ago

Yes, I think at some point I will need a replacement for this tool.