Plant-Food-Research-Open / genepal

A Nextflow pipeline for genome and pan-genome annotation
MIT License




plant-food-research-open/genepal is a bioinformatics pipeline for annotating single genomes, phased genomes and pan-genomes. An overview is shown in the Pipeline Flowchart, and the references for the tools are listed in a separate file. Protein-coding gene structures are predicted with BRAKER, which uses GeneMark-ES/ET/EP+/ETP. These tools require a license for commercial use.

Pipeline Flowchart


Refer to usage, parameters and output documents for details.

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare an assemblysheet with your input genomes that looks as follows:


tag         ,fasta              ,is_masked
a_thaliana  ,/path/to/genome.fa ,yes

Each row represents an input genome. The fields are tag, fasta and is_masked, as named in the header row.
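
A quick pre-flight check can catch malformed rows before a run starts. The following is a minimal shell sketch, not part of genepal: the function name `check_assemblysheet` is hypothetical, and the rule that `is_masked` must be `yes` or `no` is an assumption based on the example above (only `yes` is shown there). The pipeline's own input validation remains authoritative.

```shell
# Hypothetical helper (not shipped with genepal): rough sanity check of an assemblysheet.
check_assemblysheet() {
  awk -F',' '
    /^[ \t]*$/ { next }                               # ignore blank lines
    { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }  # trim padding around fields
    NR == 1 { next }                                  # skip the header row
    NF < 3 { print "row " NR ": expected at least 3 fields"; bad = 1; next }
    seen[$1]++ { print "row " NR ": duplicate tag " $1; bad = 1 }
    $3 != "yes" && $3 != "no" { print "row " NR ": is_masked should be yes or no (assumed)"; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Write the example sheet from above and check it.
cat > assemblysheet.csv <<'EOF'
tag         ,fasta              ,is_masked
a_thaliana  ,/path/to/genome.fa ,yes
EOF

check_assemblysheet assemblysheet.csv && echo "assemblysheet looks OK"
```

The function prints one message per problem row and returns a non-zero status if any problem was found, so it can gate a launch script.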

At minimum, a file with proteins as evidence is also required. Now, you can run the pipeline using:

nextflow run plant-food-research-open/genepal \
  -revision <version> \
  -profile <docker/singularity/.../institute> \
  --input assemblysheet.csv \
  --protein_evidence proteins.faa \
  --outdir <OUTDIR>

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
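
One way to follow this advice is to keep the parameters from the command above in a JSON file and pass it with `-params-file`. In the sketch below, the keys `input`, `protein_evidence` and `outdir` simply mirror the `--input`, `--protein_evidence` and `--outdir` flags; the file name `params.json` is an arbitrary choice, and the values are the same placeholders used above.

```shell
# Write the pipeline parameters to a JSON file (values are placeholders).
cat > params.json <<'EOF'
{
  "input": "assemblysheet.csv",
  "protein_evidence": "proteins.faa",
  "outdir": "<OUTDIR>"
}
EOF

# The launch command would then become (placeholders unchanged from above):
#   nextflow run plant-food-research-open/genepal \
#     -revision <version> \
#     -profile <docker/singularity/.../institute> \
#     -params-file params.json
```

Executor settings such as resource limits would still go in a custom config passed with `-c`, keeping parameters and configuration separate.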

Plant&Food Users

Download the pipeline to your /workspace/$USER folder. Change the parameters defined in the pfr/params.json file. Submit the pipeline to SLURM for execution.

sbatch ./pfr_genepal


plant-food-research-open/genepal workflows were originally scripted by Jason Shiller (@jasonshiller). Usman Rashid (@gallvp) wrote the Nextflow pipeline.

We thank the following people for their extensive assistance in the development of this pipeline:

The pipeline uses nf-core modules contributed by the following authors:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.


If you use plant-food-research-open/genepal for your analysis, please cite it as:

genepal: A Nextflow pipeline for genome and pan-genome annotation.

Usman Rashid, Jason Shiller, Ross Crowhurst, Chen Wu, Ting-Hsuan Chen, Leonardo Salgado, Charles David, Sarah Bailey, Ignacio Carvajal, Anand Rampadarath, Ken Smith, Liam Le Lievre, Cecilia Deng, Susan Thomson

Zenodo. 2024. doi: 10.5281/zenodo.14195006.

An extensive list of references for the tools used by the pipeline is kept in a separate file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.