CDCgov / phoenix

🔥🐦🔥PHoeNIx: A short-read pipeline for healthcare-associated and antimicrobial resistant pathogens
Apache License 2.0

Terminated with an error exit status 127 #78

Closed Siddra2021 closed 1 year ago

Siddra2021 commented 1 year ago

Describe the bug: Testing the PHoeNIx install using the following command:

nextflow run ~/phoenix/main.nf -profile test, docker -entry PHOENIX --kraken2db ~/phoenix/assets/databases

Impact

It ends with the following error (see attached screenshot):

Caused by: Process PHOENIX:PHOENIX_EXTERNAL:AMRFINDERPLUS_UPDATE (update) terminated with an error exit status (127)

To Reproduce: local Linux environment

Phoenix version: 1.0.0

nextflow run ~/phoenix/main.nf -profile test, docker -entry PHOENIX --kraken2db ~/phoenix/assets/databases

Using the files provided with the installation directions.

Expected behavior: The pipeline should run successfully.

Screenshots: Attached

jvhagey commented 1 year ago

@Siddra2021 remember to crop your screenshots so they don't show the user@computer line, and block out the full paths. Two issues I see with the run command: first, you need to remove the space between "test" and "docker", so it should be -profile test,docker, NOT -profile test, docker. Second, make sure you have the trailing / at the end of the --kraken2db argument, so it should be --kraken2db ~/phoenix/assets/databases/
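
Putting both fixes together, the corrected test command becomes:

nextflow run ~/phoenix/main.nf -profile test,docker -entry PHOENIX --kraken2db ~/phoenix/assets/databases/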

Siddra2021 commented 1 year ago

Thank you for your prompt response. I fixed the issues in the command line and now get the following error. Please see the attached screenshot. [Screenshot (14)]

jvhagey commented 1 year ago

@Siddra2021 unfortunately this is a memory issue with Docker and not the PHoeNIx pipeline itself (google "docker exit code 137"). My first guess here is that the container is using more memory than is available, so it gets killed. By default, Docker containers can use all the available memory on the host machine. You can prevent a single container from abusing the host's resources by setting memory limits per container. I am not sure how your Docker is set up (Desktop on a PC, or CLI?), but you are going to have to work with your IT to get Docker configured to keep the containers from using up all the memory. For kraken you need 10 GB of memory, and you can point your IT to the linked file to see the specific memory requirements for each step.
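
For illustration: since these containers are launched by Nextflow, one way to bound them is a per-process memory directive in a config file, which Nextflow translates into a Docker --memory limit on the container. A minimal sketch, where the KRAKEN2_* process names are the ones used later in this thread and the 10 GB figure is the guidance above:

process {
    // Nextflow passes each process's memory directive to Docker as a
    // --memory limit, so no single step can exhaust the host's RAM
    withName: 'KRAKEN2_TRIMD|KRAKEN2_ASMBLD|KRAKEN2_WTASMBLD' {
        memory = 10.GB   // the ~10 GB recommended for the kraken steps
    }
}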

Siddra2021 commented 1 year ago

Thanks for the feedback. I am using Docker Desktop and have allocated 15 GB to Docker. Any advice on keeping the containers from using all the memory?

Siddra2021 commented 1 year ago

I have 24 GB of total memory; I allocated 15 GB to Docker with 6 GB of swap and set the container limit to 1 GB. Now it is terminating with the error shown in the screenshot. I do not think it is a permission issue, as I am able to see some of the folders created after the application terminates. Can you please advise? Thanks. [Screenshot (15)]

Siddra2021 commented 1 year ago

This seems to be a memory issue, but I have more than 6 GB allocated to Docker. Any suggestions or feedback on this?

jvhagey commented 1 year ago

@Siddra2021, the kraken db itself is ~8 GB, which is why I suggested that step be given at least 10 GB. My other concern here is that PROKKA will take ~40 GB to run, so I am a bit worried that you might fix kraken and then run into more memory problems with PROKKA. Is this being run on a laptop?

slsevilla commented 1 year ago

@Siddra2021 - I was having memory issues with Kraken2 as well, and added --memory-mapping to modules/local/kraken2.nf. That cleared the issue for me; thought it might help you too.
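
For anyone following along, a minimal sketch of what that kind of module edit might look like, assuming the script block in modules/local/kraken2.nf invokes kraken2 directly; the variable names (db, prefix, reads) are placeholders, and the real module will have more inputs and options:

    script:
    """
    # --memory-mapping makes kraken2 read the database through the OS page
    # cache instead of loading it entirely into RAM
    kraken2 \\
        --db ${db} \\
        --threads ${task.cpus} \\
        --memory-mapping \\
        --report ${prefix}.kraken2.report.txt \\
        ${reads}
    """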

Siddra2021 commented 1 year ago

Thank you for the response. @slsevilla, I will look into this. @jvhagey, yes, I am using a laptop with 23 GB of total RAM. Do you think I have enough memory to run this pipeline on my laptop? I thought we needed a minimum of 6 GB.

jvhagey commented 1 year ago

Oh yes, good idea @slsevilla. @Siddra2021, you can pass --memory-mapping to the kraken steps using a config file with the following.

process {
    // apply --memory-mapping (plus --use-names) to the Kraken2 runs on
    // trimmed reads and assemblies
    withName: 'KRAKEN2_TRIMD|KRAKEN2_ASMBLD' {
        ext.args = '--use-names --memory-mapping'
        publishDir = [
            path: { "${params.outdir}/${meta.id}/kraken2_trimd" },
            mode: 'copy',
            pattern: "*{classified*,unclassified*,classifiedreads*,report.txt}"
        ]
    }
    // apply --memory-mapping to the weighted-assembly Kraken2 run
    withName: 'KRAKEN2_WTASMBLD' {
        ext.args = '--memory-mapping'
        publishDir = [
            path: { "${params.outdir}/${meta.id}/kraken2_trimd" },
            mode: 'copy',
            pattern: "*{classified*,unclassified*,classifiedreads*,report.txt}"
        ]
    }
}

So just to be clear, add -c kraken.config (assuming you named the file "kraken.config") to your run command. It worked fine on my end; let me know if this works for you. Just for awareness, @Siddra2021, when you use memory mapping the database won't be loaded into RAM but will instead be read through the OS cache. It is my understanding that there are at least two consequences of doing this (per other GitHub threads):

  1. Disk I/O (reading and writing data) is slower than loading everything into RAM, so doing this will increase the time the pipeline takes to run. However, if you are limited by RAM it might just be something you have to do.
  2. Populating the cache takes some time, so expect the first run of the day to be slower than the rest. That is, if you reboot, the database will have to be cached again, but once it's in the cache the next samples should be quicker.

In terms of enough memory in general, we typically use a bigger laptop with 12 CPUs (2 threads each, 2.8 GHz) and 64 GB of RAM. I have also run it on my smaller 16 GB, 4-CPU laptop, but it's pretty slow (~50 min for the small test sample). So my hope is that once you get past the kraken issue it should run, just slowly. Let me know how it goes for you.

Siddra2021 commented 1 year ago

Jill, thank you again. I will pass the memory-mapping flag as you suggested. However, where do I find the config file to add the script you have given? Do I need to generate one? I am sorry, I am new to this and this is a learning experience for me. Siddra


jvhagey commented 1 year ago

@Siddra2021, you need to create a file, copy the code from my last comment, save it as "kraken.config", and then pass it to PHoeNIx as -c kraken.config. So the full command would be:

nextflow run ~/phoenix/main.nf -profile test,docker -entry PHOENIX --kraken2db ~/phoenix/assets/databases/ -c kraken.config

Siddra2021 commented 1 year ago

Jill, thanks again for your detailed message. I created a text file and saved it in the phoenix folder. The pipeline ran longer than before but terminated again with the same error:

Caused by: Unable to create folder=/home/sidra/phoenix/work/3e/2c32e91fa62072f5d79ec6fb3c6e50 -- check file system permission

My Docker has 11 GB out of 23 GB. Would appreciate any help.

Siddra


Siddra2021 commented 1 year ago

Got it working. Thanks @slsevilla and @jvhagey!

jvhagey commented 1 year ago

As a final comment, after looking into this further: the error exit status 137 was coming from the test.config not requesting enough memory. This can be fixed by editing the test.config file to increase the memory to at least 12 GB. If you run nextflow run cdcgov/phoenix ..., Nextflow pulls a copy of the pipeline into ~/.nextflow/assets/cdcgov/phoenix, so you should edit the file ~/.nextflow/assets/cdcgov/phoenix/conf/test.config as described above, save, and rerun the pipeline.
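
For reference, assuming PHoeNIx's test.config follows the standard nf-core layout, where the test profile caps resources under params (the exact parameter name in this pipeline's file may differ), the edit would look roughly like:

// in ~/.nextflow/assets/cdcgov/phoenix/conf/test.config
params {
    // assumption: the test profile caps resources with max_memory, as in
    // standard nf-core test configs; raise it to at least 12 GB
    max_memory = '12.GB'
}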

Siddra2021 commented 1 year ago

@jvhagey, thanks for your feedback. With the kraken.config file the memory issue was resolved and I had a successful installation. However, now that I am running my samples (I only have two on my sample sheet), the pipeline runs successfully but I cannot locate the output files. I used the following command:

nextflow run cdcgov/phoenix -r v1.0.0 -profile docker -entry PHOENIX --input ~/phoenix/samplesheet.csv --kraken2db ~/phoenix/assets/databases/

Would appreciate your response. [Screenshot (19)]