NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

Config file mandatory? #19

Closed ViriatoII closed 4 years ago

ViriatoII commented 4 years ago

Hi, I'm very interested in your pipeline, would make my life easier!

I'm trying to run the commands exactly as you describe but there are small discrepancies:

nextflow run -profile nbis,conda AbinitioTraining.nf \
--genome 'genome_assembly.fasta' \
--maker_evidence_gff 'path/to/annotation.gff3'

(The .nf above is not recognized. It only works after taking it out.)

But then, I get this:

N E X T F L O W  ~  version 20.01.0
Pulling nextflow-io/AbinitioTraining ...
WARN: Cannot read project manifest -- Cause: Remote resource not found: https://api.github.com/repos/nextflow-io/AbinitioTraining/contents/nextflow.config
Remote resource not found: https://api.github.com/repos/nextflow-io/AbinitioTraining/contents/main.nf

I don't have a config file? Must I have one? What's missing exactly?

I installed with conda. By the way, your conda installation instructions are not very clear. I think the correct way should be:

conda create -n nextflow-env
conda activate nextflow-env
conda install -c bioconda nextflow  # (and nf-core doesn't exist in any repository)

Kind regards, Ricardo

Juke34 commented 4 years ago

I guess it is related to the missing dependency nf-core. This command should work perfectly well and install both nextflow and the nf-core tool:

conda create -n nextflow-env nf-core nextflow
conda activate nextflow-env

They are both in bioconda, so I'm wondering why you can't find nf-core.

nextflow                  20.01.0              hecc5488_0    bioconda
nf-core                   1.9                        py_0    bioconda

mahesh-panchal commented 4 years ago

Hi Ricardo,

Yes, the .nf is not recognised because the command actually references the folder, not the script. It relies on finding the script name from the manifest information in the nextflow.config in that directory.
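
To illustrate the manifest lookup, here is a minimal sketch that writes a hypothetical nextflow.config into a demo folder. The helper name write_demo_manifest and the folder name demo_project are assumptions for illustration, and the repository's real config contains more than this. The manifest scope and its mainScript setting are standard Nextflow configuration.

```shell
#!/usr/bin/env bash
# Sketch only: writes a minimal, hypothetical nextflow.config into a folder.
# This is not the repository's actual config file.
write_demo_manifest() {
    local project_dir="$1"
    mkdir -p "$project_dir"
    cat > "$project_dir/nextflow.config" <<'EOF'
manifest {
    // Script launched by `nextflow run <folder>`; Nextflow falls back to main.nf when unset
    mainScript = 'AbinitioTraining.nf'
}
EOF
}

# Usage (demo_project is a hypothetical folder name):
#   write_demo_manifest demo_project
#   nextflow run demo_project
```

With a config like this in the folder, referencing the folder should be enough for Nextflow to locate the script, which is why the .nf suffix is not needed in that form of the command.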

If you do want to directly reference the script, the command should be:

nextflow run -profile nbis,conda AbinitioTraining/AbinitioTraining.nf \
--genome 'genome_assembly.fasta' \
--maker_evidence_gff 'path/to/annotation.gff3'
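
The log above shows Nextflow trying to pull nextflow-io/AbinitioTraining, which is what it appears to do when the target is not found as a local path. A small check before launching can make that visible. This is a rough sketch, and resolve_target is a hypothetical helper, not part of Nextflow or this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how a `nextflow run` target looks on the local filesystem.
resolve_target() {
    local target="$1"
    if [ -f "$target" ]; then
        echo "local script"
    elif [ -d "$target" ]; then
        echo "local project directory"
    else
        # Nothing local matches, so Nextflow would try to fetch a remote project instead
        echo "not found locally"
    fi
}

# Usage:
#   resolve_target AbinitioTraining/AbinitioTraining.nf
```

If this prints "not found locally", the path is probably relative to a different directory than the one you are running from.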

It's a little curious why the nextflow config is not found. I'll have to look into that one. It has worked before for me, but there's a possibility it's working from a cache or something. I'll check it out.

Also note that the nbis profile is specifically for our staff within NBIS (although it may still work for you if you're using slurm). It's worth making a custom profile for yourself in a config.

profiles {
   <your_profile_name> {
       process.executor = 'slurm'
       ...
   }
}

And then reference it on the command line: nextflow run -c profile.config -profile <your_profile_name>,conda AbinitioTraining/AbinitioTraining.nf

As for conda, I see we're missing the channel declaration so I'll include that. We have bioconda included in our channels to search.

Also nf-core does exist: https://anaconda.org/bioconda/nf-core , so I'm not sure why you cannot find it.

mahesh-panchal commented 4 years ago

@Juke34 nf-core isn't a dependency. It provides additional tools that make creating workflows easier. I'll actually remove it to avoid confusing outside users.

mahesh-panchal commented 4 years ago

It seems I made a mistake about the running instructions. It's expected that you're in the directory where the scripts are. I'll correct the instructions for running directly from the repository.

mahesh-panchal commented 4 years ago

OK. I understand now about running directly from a GitHub repo. We're not there yet. For a nextflow script to work through an https call, the nextflow.config needs to be in the root of the repository. We're still developing them as standalone modules at the moment. A single workflow is in progress.

ViriatoII commented 4 years ago

Hey @Juke34, when I tried to install the same way you describe, I get this:

conda create -n nextflow-env nf-core nextflow -c bioconda

Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • nf-core -> cookiecutter
  • nf-core -> binaryornot
  • nf-core -> jinja2-time
  • nf-core -> poyo
  • nf-core -> whichcraft

Current channels:

@mahesh-panchal Thank you for the answers. Did you change something now? Should I try reinstalling?

ViriatoII commented 4 years ago

Oh, I see you changed your install instructions. Now they work.

conda create -c conda-forge -c bioconda -n nextflow-env nextflow

mahesh-panchal commented 4 years ago

@ViriatoII nf-core requires the -c conda-forge channel too, but it's not a requirement for the environment. It's only if you're building workflows for nf-core (https://nf-co.re/), or want a tool to easily create workflow templates.
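
A quick way to see whether a channel is missing is to compare the configured channels against the ones you need. The sketch below assumes the YAML-style list printed by conda config --show channels. The function name missing_channels is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a channel listing on stdin (as printed by
# `conda config --show channels`) and prints each required channel that is absent.
missing_channels() {
    local listing channel
    listing="$(cat)"
    for channel in "$@"; do
        # Match list entries of the form "  - channel-name"
        if ! printf '%s\n' "$listing" | grep -Eq "^[[:space:]]*-[[:space:]]*${channel}[[:space:]]*$"; then
            echo "$channel"
        fi
    done
}

# Usage:
#   conda config --show channels | missing_channels conda-forge bioconda
```

Any channel it prints can be supplied on the command line with -c, as in the install command above.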

No code has been changed. There's nothing to reinstall. If you want to use the workflows, clone the repo locally and use the paths to the scripts: git clone git@github.com:NBISweden/pipelines-nextflow.git

ViriatoII commented 4 years ago

Dear Mahesh,

I still have the same problem. Even though I installed it and changed the code as you say. Is this an API problem?

nextflow run -profile conda AbinitioTraining/AbinitioTraining.nf \
--genome '../*.fasta' --outdir 'abInitio_nextflow' \
--maker_evidence_gff 'MPI_run.maker.output/MPI_run.all.gff'

N E X T F L O W  ~  version 20.01.0
Pulling nextflow-io/AbinitioTraining ...
Remote resource not found: https://api.github.com/repos/nextflow-io/AbinitioTraining/contents/AbinitioTraining.nf

Also, can I create a profile for a PBS Pro cluster system? How does it work?

mahesh-panchal commented 4 years ago

What do you mean by installed? Hmm. I'll write a walkthrough.

Let's say you have a folder called annotation_project in your home directory ($HOME/annotation_project). In that, let's structure our folders, so there are data, scripts, docs, and analyses folders. Your data are in the data folder.
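
The folder layout described above could be created with something like the following sketch. The helper name make_layout is hypothetical. The folder names are the ones from the description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates the project folder with its four subfolders.
make_layout() {
    local project_dir="$1"
    local sub
    for sub in data scripts docs analyses; do
        mkdir -p "$project_dir/$sub"
    done
}

# Usage:
#   make_layout "$HOME/annotation_project"
```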

First we make a copy of this repository in the scripts folder:

$ cd $HOME/annotation_project/scripts
$ git clone git@github.com:NBISweden/pipelines-nextflow.git

Next we make a folder for the current analysis in analyses called 2020-04-22_test_nextflow. We can then write a script to run nextflow and record the parameters used. run_nextflow.sh:

#! /usr/bin/env bash

NXF_SCRIPT=$HOME/annotation_project/scripts/pipelines-nextflow/AbinitioTraining/AbinitioTraining.nf
FASTA=$HOME/annotation_project/data/reference/my_reference.fasta
GFF=$HOME/annotation_project/data/MPI_run.maker.output/MPI_run.all.gff

nextflow run -c pbs.config -profile conda,pbs "$NXF_SCRIPT" \
    --genome "$FASTA" \
    --maker_evidence_gff "$GFF" \
    --outdir "$HOME/annotation_project/results/2020-04-22_abInitio_nextflow_test"

In that same folder, let's create a config for using PBS. pbs.config:

profiles {
    pbs {
        process.executor = 'pbs'
    }
}
workDir = "$HOME/annotation_project/intermediate/nxf_work"

If you need other settings for PBS take a look at the Nextflow documentation and add them to this config.

Then let's test:

$ cd $HOME/annotation_project/analyses/2020-04-22_test_nextflow
$ ls 
pbs.config
run_nextflow.sh
$ chmod 755 run_nextflow.sh
$ screen -S nextflow_abinitio
# Opens a screen terminal session (hopefully in the same directory!)
$ conda activate nextflow-env    # In the screen session terminal
$ ./run_nextflow.sh              # In the screen session terminal
# Then press Ctrl+a, d to "detach" the screen session. Nextflow will keep running inside it.
$ screen -ls                     # Lists screen sessions. This is in your normal terminal.
$ screen -r nextflow_abinitio    # Resumes a screen session.

Once the pipeline is complete, you will also have intermediate and results folders in your annotation_project folder (if you use the settings above).

Hopefully this helps. Regards, Mahesh

ViriatoII commented 4 years ago

Thank you so much for the detailed walkthrough, @mahesh-panchal ! I am closer to getting it to work.

I have managed to advance with this pbs profile:

profiles {
    pbs {
        process.executor = 'pbs'
        process.clusterOptions = '-A "myprojectname" '
        process.cpus = '8'
        process.memory = "2G"
        process.time = '20h'

    }
}

workDir = "../scratch/nxf_work"

I then hit a bus error with an AGAT function and created an issue on your other GitHub repository: https://github.com/NBISweden/AGAT/issues/36