PalMuc / TransPi

TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly

Issue finding busco when running Transpi with slurm / sbatch #12

Closed infinity01 closed 3 years ago

infinity01 commented 3 years ago

Hello!

I was able to successfully run TransPi when using slurm's srun (interactively), but I'm having trouble getting busco to run when submitting the job with sbatch.

The error: .command.sh: line 4: busco: command not found

The command it is trying to run is:

#!/bin/bash -ue
echo -e "\n-- Starting BUSCO --\n"

busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi//DBs/busco_db/metazoa_odb10 -m tran -c 8 --offline

echo -e "\n-- DONE with BUSCO --\n"

cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv

Do you have any suggestions on variables to set in the slurm script so that it can find busco properly?

Thank you so much!

rivera10 commented 3 years ago

Hello @infinity01,

If you want to deploy TransPi using SLURM you need to configure this with the specifications of your system. In the nextflow.config file that is generated for you in the precheck, you will see a section called profiles (L254-L286). There you can add a profile for your system. Currently, I have one there as an example for my local system (see here).

Nextflow will handle this for you. So if you want to use only SLURM as the job scheduler and do not have any other requirements then you can have that section like this:

mySlurm {
    process {
        executor='slurm'
    }
}

This will use the CPU and RAM info from the process labels in L195-L229. If you need to pass other requirements to SLURM (e.g. queue, partition, etc.) you can do so using clusterOptions.

mySlurm {
    process {
        executor='slurm'
        clusterOptions='--partition=big_node --qos=low'
    }
}

This will submit the entire TransPi run (i.e. all processes) through SLURM. Nextflow handles job submission for you, so there is no need to use sbatch. Just add -profile mySlurm when calling TransPi.
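
For reference, a minimal sketch of a complete profile, written from the shell so the brace nesting is explicit. The file name mySlurm.config is hypothetical, and the partition and QOS values are the placeholders from the example above. In a real setup the mySlurm block would normally sit inside the profiles section of the nextflow.config generated by the precheck.

```shell
# Write a standalone Nextflow config holding only the SLURM profile.
# "mySlurm.config" is a hypothetical file name chosen for this sketch.
config_file="mySlurm.config"

cat > "$config_file" <<'EOF'
profiles {
    mySlurm {
        process {
            executor = 'slurm'
            // Optional: extra options passed to SLURM at submission
            clusterOptions = '--partition=big_node --qos=low'
        }
    }
}
EOF

# The profile could then be selected with a single-dash -profile, e.g.:
#   nextflow run TransPi.nf --all ... -c mySlurm.config -profile mySlurm
```

Every opening brace needs a matching close; a missing one makes Nextflow fail while parsing the config.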

Last, are you running TransPi using containers (docker or singularity) or using the TransPi conda environment created by the precheck?

Let me know if you have any other questions.

Best, Ramon

infinity01 commented 3 years ago

Thank you so much for the clarification Ramon! I submitted it to our slurm partition successfully.

One concern is that if I accidentally sign out of my SSH session, it will stop the job from running (that's why I was trying to use sbatch). Would that happen in this case?

We are using the conda environment (myconda) by the way.

rivera10 commented 3 years ago

Great! To avoid issues when the SSH session is disconnected, you can use screen or nohup. That way your TransPi run will not be affected. Hope that helps. Ramon
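
As a rough sketch of the nohup route: the helper below (launch_detached is a hypothetical name, not part of TransPi or Nextflow) starts a command in the background, immune to hangups, and sends its output to a log file.

```shell
# Hypothetical helper: run a command so it keeps going after the SSH session ends.
# Usage: launch_detached LOG_FILE COMMAND [ARGS...]
launch_detached() {
    log_file="$1"
    shift
    # nohup ignores the hangup signal sent at logout; & puts the job in the background.
    nohup "$@" > "$log_file" 2>&1 &
    LAUNCHED_PID=$!   # PID of the background job, kept for later inspection
}

# Example (not executed here):
#   launch_detached transpi.log nextflow run TransPi.nf --all ... -profile mySlurm
#   tail -f transpi.log
```

With screen the idea is the same, except that the run stays attached to a terminal session you can reconnect to later.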

infinity01 commented 3 years ago

Hi Ramon,

For some reason it's still erroring out with busco command not found. Any thoughts? It worked before when I submitted the job with slurm's srun, so I'm not sure if there's any difference.

Is the busco command a binary? I'm not able to find it in the TransPi conda environment's bin folder: .../anaconda/2020.02/envs/TransPi/bin/

Thanks again!

Error executing process > 'busco4_tri (Cprol_R)'

Caused by:
  Process `busco4_tri (Cprol_R)` terminated with an error exit status (127)

Command executed:

  echo -e "\n-- Starting BUSCO --\n"

  busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi//DBs/busco_db/metazoa_odb10 -m tran -c 8 --offline

  echo -e "\n-- DONE with BUSCO --\n"

  cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
  cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv

Command exit status:
  127

Command output:

  -- Starting BUSCO --

Command error:
  .command.sh: line 4: busco: command not found

Work dir:
  /TransPi_files/Cprol/work/c6/4ec6e5b8c066bd0e71a251cb093249

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

rivera10 commented 3 years ago

Hello, you should see a conda env for busco4 when you type conda info -e. If it is not there, it was not created successfully. To fix this quickly, run the following:

conda create -n busco4 -c conda-forge -c bioconda busco=4.1.4=py_0 -y

Let me know if this solves the issue. Best, Ramon
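
The check described above can be scripted. A minimal sketch, assuming the usual `conda info -e` layout with the env name in the first column; env_listed is a hypothetical helper name.

```shell
# Reads a `conda info -e` listing on stdin; succeeds if an env named $1 is listed.
env_listed() {
    # Header lines start with "#", so their first field never equals an env name.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example (not executed here):
#   conda info -e | env_listed busco4 || echo "busco4 env is missing"
```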

infinity01 commented 3 years ago

Unfortunately I'm still getting the same error (not able to find the busco command) after installing the busco4 conda env. Is it supposed to change conda environments halfway through?

rivera10 commented 3 years ago

Very odd. Can I see the output of conda info -e? You are running with the TransPi conda env (--myConda), right? What is the entire command you use when calling the pipeline?

infinity01 commented 3 years ago

The entire command is: nextflow run /cm/shared/apps/TransPi/TransPi.nf --all --maxReadLen 100 --k 25,41,57,67 --reads '/TransPi_files/Cprol/Cprol_R[1,2].fastq.gz' --profile conda --myConda -profile RDAC -resume

The conda envs are:

$ conda info -e
# conda environments:
#
base                     /cm/shared/compilers/anaconda/2020.02
TransPi               *  /cm/shared/compilers/anaconda/2020.02/envs/TransPi
busco4                   /cm/shared/compilers/anaconda/2020.02/envs/busco4

rivera10 commented 3 years ago

I think I know what the issue is. You need to have the busco4 env info in the nextflow.config. Either rerun the precheck so it generates a new nextflow.config, or add the PATH of the busco4 conda env to line 67 of the nextflow.config. It should look like this:

        cenv="/cm/shared/compilers/anaconda/2020.02/envs/busco4"

Let me know if this solves the issue.
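
To confirm what the config actually contains, a small sketch that pulls the cenv value out of a nextflow.config-style file. config_cenv is a hypothetical helper, and it assumes the value is written as cenv="..." on a single line, as in the snippet above.

```shell
# Print the value assigned to cenv in the config file given as $1.
config_cenv() {
    sed -n 's/^[[:space:]]*cenv[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p' "$1"
}

# Example (not executed here): check that the env really ships a busco executable.
#   [ -x "$(config_cenv nextflow.config)/bin/busco" ] && echo "busco found in cenv"
```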

infinity01 commented 3 years ago

Sorry, I forgot to mention I updated that last night, but it still gave me that error.

    //busco4 conda env
        cenv="/cm/shared/compilers/anaconda/2020.02/envs/busco4"

Maybe run the precheck again just in case?

rivera10 commented 3 years ago

Yes, rerun the precheck and try again.

infinity01 commented 3 years ago

Hmm, that still didn't work. I'm thinking of adding the bin folders of both conda envs to the PATH environment variable?

rivera10 commented 3 years ago

I never had this issue before. So the new nextflow.config has the PATH of cenv? Try logging out and deactivating all conda envs before calling TransPi.

In the meantime, you can add the bin to the PATH env so you can continue working. But this is very odd, since nextflow takes the PATH of the busco4 conda env and activates it automatically. I'll do more tests and see if I can find the issue.
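
A sketch of that temporary workaround, assuming the env lives in a directory with a bin subfolder. prepend_env_bin is a hypothetical helper, and the path in the example comment is the one reported earlier in this thread.

```shell
# Hypothetical helper: prepend a conda env's bin directory to PATH, at most once.
prepend_env_bin() {
    env_bin="$1/bin"
    case ":$PATH:" in
        *":$env_bin:"*) ;;                        # already on PATH, nothing to do
        *) PATH="$env_bin:$PATH"; export PATH ;;
    esac
}

# Example (not executed here):
#   prepend_env_bin /cm/shared/compilers/anaconda/2020.02/envs/busco4
#   command -v busco    # prints the resolved path if the shell can now find it
```

This only changes the current shell, so it needs to be undone (or the session restarted) once the real cause is fixed.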

infinity01 commented 3 years ago

Now it's saying: BUSCO must be installed before it is run. Please enter 'python setup.py install (--user)'. See the user guide for more information.

Thinking to just reinstall everything at this point...

infinity01 commented 3 years ago

Command executed:

  echo -e "\n-- Starting BUSCO --\n"

  busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi/DBs/busco_db/eukaryota_odb10 -m tran -c 8 --offline

  echo -e "\n-- DONE with BUSCO --\n"

  cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
  cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv

Command exit status:
  1

Command output:

  -- Starting BUSCO --

  BUSCO must be installed before it is run. Please enter 'python setup.py install (--user)'. See the user guide for more information.

  -- DONE with BUSCO --

Command error:
  cp: cannot stat ‘Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt’: No such file or directory

rivera10 commented 3 years ago

I am testing locally and it is working fine for me. Let's try that: reinstall the tools.

1- Erase the conda envs

conda remove -n TransPi -y --all
conda remove -n busco4 -y --all
conda clean -y --all

2- Rerun the precheck.

Remember to remove the bin directory from the PATH env and source your .bashrc. Do you have the config file in the same directory as the main script? It seems like the config is not working properly or nextflow cannot use it properly. Let me know how it goes.

infinity01 commented 3 years ago

Hi Ramon,

I installed and re-ran TransPi from scratch and I'm still getting the busco command not found error during the "busco4_tri" step. However, if I change conda environments to busco4, I am able to find it with the "which busco" command.

Are you supposed to load any conda environments prior to executing TransPi with nextflow? Right now I am loading the TransPi conda environment prior to executing it.

I confirmed the conda environments for TransPi and busco4 are installed with the "conda info -e" command. Also both busco4 and TransPi environments are set correctly in the nextflow.config:

    // PATH to conda installation from precheck. Leave blank is precheck was not used or you will use comntainers
        myCondaInstall="/cm/shared/compilers/anaconda/2020.02/envs/TransPi"

    //busco4 conda env
        cenv="/cm/shared/compilers/anaconda/2020.02/envs/busco4"

I am using slurm on a clustered setup so it is executing each step of TransPi on another node.

Any other thoughts?

Thanks again!

Something went wrong. Check error message below and/or log files.
Error executing process > 'busco4_tri (Cprol_R)'

Caused by:
  Process `busco4_tri (Cprol_R)` terminated with an error exit status (127)

Command executed:

  echo -e "\n-- Starting BUSCO --\n"

  busco -i Cprol_R.Trinity.fa -o Cprol_R.Trinity.bus4 -l /cm/shared/apps/TransPi/DBs/busco_db/eukaryota_odb10 -m tran -c 20 --offline

  echo -e "\n-- DONE with BUSCO --\n"

  cp Cprol_R.Trinity.bus4/short_summary.*.Cprol_R.Trinity.bus4.txt .
  cp Cprol_R.Trinity.bus4/run_*/full_table.tsv full_table_Cprol_R.Trinity.bus4.tsv

Command exit status:
  127

Command output:

  -- Starting BUSCO --

Command error:
  .command.sh: line 4: busco: command not found

Work dir:
  /TransPi/TransPi_files/Cprol/work/e5/3630925441d6d889d73349bb00b04a

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

rivera10 commented 3 years ago

Hello @infinity01,

I did some tests on a university cluster and in a virtual machine and it is working fine for me. I do not have any issues with busco.

Nextflow handles the activation of the conda environments for you, no need to activate them before calling the pipeline. Can you try deactivating the TransPi environment before running the pipeline?
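
A quick way to verify that nothing is active before launching, sketched under the assumption that conda exposes the active env through the CONDA_DEFAULT_ENV variable (it normally sets it on activation). no_conda_env_active is a hypothetical helper name.

```shell
# Succeeds when no conda environment appears to be active in the current shell.
no_conda_env_active() {
    if [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "conda env '$CONDA_DEFAULT_ENV' is active; run 'conda deactivate' first" >&2
        return 1
    fi
    return 0
}

# Example (not executed here):
#   no_conda_env_active && nextflow run TransPi.nf --all ...
```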

rivera10 commented 3 years ago

@infinity01 I just noticed that you are using --profile instead of -profile. Only one dash is required, since -profile is a built-in nextflow option; with two dashes nextflow treats it as a pipeline parameter, so the conda profile is never applied. Since you have the TransPi conda env activated, the other programs are available in the PATH and run fine. But busco4 is a separate conda environment (due to version conflicts), and that is why nextflow cannot find it. Apologies for not seeing this before. Can you try using -profile? Let me know.

rivera10 commented 3 years ago

nextflow run /cm/shared/apps/TransPi/TransPi.nf --all --maxReadLen 100 --k 25,41,57,67 --reads '/TransPi_files/Cprol/Cprol_R[1,2].fastq.gz' -profile conda,RDAC --myConda -resume

rivera10 commented 3 years ago

Also, you can provide several profiles by separating them with commas. See the example above.
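
Since the single-dash versus double-dash mix-up is easy to miss, here is a small pre-flight sketch that scans a command line for a stray --profile. check_profile_flag is a hypothetical helper, not something TransPi or Nextflow provides.

```shell
# Hypothetical pre-flight check: flag a double-dash --profile in a Nextflow command line.
# Nextflow's own options take one dash (-profile, -resume); two dashes mark pipeline parameters.
check_profile_flag() {
    for arg in "$@"; do
        case "$arg" in
            --profile|--profile=*)
                echo "found '$arg': use -profile (single dash) to select Nextflow profiles" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Example (not executed here):
#   check_profile_flag --all --myConda -profile conda,RDAC -resume && echo "flags look fine"
```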

infinity01 commented 3 years ago

Ah, good find! It always gets confusing which options use one or two hyphens. I resubmitted it with -resume, but for some reason it's rerunning the entire thing. I'll just let it do its thing and let you know tomorrow.

Thank you so much!

infinity01 commented 3 years ago

Hi Ramon, It is working as expected. Thank you so much again for your time.

rivera10 commented 3 years ago

Great! I'll close the issue now. If you need anything else, let me know.