NMD workflow

This repository exists for reproducibility purposes. The data generated by this workflow power NMDtxDB. Raw data are available from the SRA under BioProject PRJNA1054031. RNA-seq reads must be pre-processed and aligned before they are used as input.

Workflow description

The workflow has two parts. The first is a Snakemake workflow (workflow). The second handles coding sequence (CDS) detection and the integration of CDS predictions from multiple sources.


Part 1

This part generates the de novo transcriptome and computes differential gene expression (DGE) and differential transcript expression (DTE). Run it with:

snakemake --jobs 10 --cores 10 --profile slurm --printshellcmds --reason --use-singularity --use-conda --use-envmodule

To produce the DAG:

snakemake --rulegraph | dot -Tsvg >

Part 2

This part performs CDS detection. Here is an example using Ensembl sequences trimmed at the annotated start codon:

awk '{ print $1 "\t" $7-1 "\t" $8 "\t" $4 "\t" 1 "\t" $6; }' GRCh38.102.gtf > ref_cds.bed
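The `-1` in the awk one-liner converts a 1-based, inclusive start coordinate into BED's 0-based, half-open start, while the end coordinate carries over unchanged. The sketch below wraps the same one-liner in a shell function (`to_bed6`, a name introduced here for illustration only) and runs it on a hypothetical toy record laid out the way the program indexes its fields: chromosome in column 1, name in column 4, strand in column 6, start in column 7 and end in column 8.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk one-liner from Part 2 (program text unchanged).
# Fields read: $1 chrom, $4 name, $6 strand, $7 1-based start, $8 end.
# Output is BED6: chrom, 0-based start, end, name, score (fixed at 1), strand.
to_bed6() {
  awk '{ print $1 "\t" $7-1 "\t" $8 "\t" $4 "\t" 1 "\t" $6; }'
}

# Hypothetical toy record; columns marked "x" are not read by the awk program.
printf 'chr1\tx\tx\tTX1\tx\t+\t101\t103\n' | to_bed6
# prints: chr1	100	103	TX1	1	+
```

The start moves from 101 to 100 while the end stays at 103, so the BED interval still covers the same three bases.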

Rscript cds/StartATG_to_cDNA.R ref_cds.bed

perl --input GRCh38.102.fa --startcodon ref_cds_cDNA.bed > ensembl_longorf2.fa 
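The perl step extracts open reading frames (ORFs) using the start-codon information in `ref_cds_cDNA.bed`. As a hedged illustration of the underlying idea only, and not of that script's actual logic, the sketch below assumes a cDNA string already trimmed to begin at its start codon and reads codons in frame until the first stop codon (TAA, TAG or TGA). The function name `orf_from_start` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ORF that begins at position 1 of a cDNA string.
# Assumes the sequence was trimmed so that it starts at the start codon.
# Exits with status 1 (printing nothing) if the sequence does not start with ATG.
orf_from_start() {
  awk -v seq="$1" 'BEGIN {
    seq = toupper(seq)
    if (substr(seq, 1, 3) != "ATG") exit 1
    orf = ""
    # Walk codon by codon in frame 1 and stop after the first stop codon.
    for (i = 1; i + 2 <= length(seq); i += 3) {
      codon = substr(seq, i, 3)
      orf = orf codon
      if (codon == "TAA" || codon == "TAG" || codon == "TGA") {
        print orf
        exit 0
      }
    }
    # No in-frame stop: report everything up to the last complete codon.
    print orf
  }'
}

orf_from_start ATGAAATAGCCC   # prints: ATGAAATAG
```

Because the frame is fixed by the annotated start, no search over alternative start positions is needed in this sketch.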

See the longorf_integration_bed12 script for details on how ORFs from the multiple sources are integrated.
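Purely as an illustration of one simple integration step, and not necessarily what longorf_integration_bed12 does, the sketch below reads BED12 files from several sources and keeps one record per ORF structure. The key is chromosome, chromStart, chromEnd, strand, thickStart, thickEnd, blockSizes and blockStarts (BED columns 1, 2, 3, 6, 7, 8, 11 and 12); files listed first take priority. The function name `dedupe_bed12` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge tab-separated BED12 files, keeping the first
# record seen for each ORF structure. The name column (4) is not part of the
# key, so the same ORF reported under different source-specific names collapses.
dedupe_bed12() {
  awk -F '\t' '!seen[$1 FS $2 FS $3 FS $6 FS $7 FS $8 FS $11 FS $12]++' "$@"
}

# Usage (hypothetical file names):
#   dedupe_bed12 sourceA_orfs.bed12 sourceB_orfs.bed12 > merged_orfs.bed12
```

Keying on the block layout rather than on names means two sources only need to agree on coordinates, not on identifiers, for their calls to be merged.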

To retrieve the other sources:



This project is licensed under the MIT License.


This work was supported by the DFG Research Infrastructure West German Genome Center, project 407493903, as part of the Next-Generation Sequencing Competence Network, project 423957469.