CCBR / Pipeliner

An open-source and scalable solution to NGS analysis powered by the NIH's Biowulf cluster.

Trimming adapters #456

Closed ky66 closed 3 years ago

ky66 commented 3 years ago

How do I specify the sequence of adapters I want to cut out for the RNA-Seq pipeline?

skchronicles commented 3 years ago

Hello @ky66,

I hope you're having a great day, and thank you for reaching out to our team.

Here is the fasta file the RNA-seq pipeline uses internally for removing adapter sequences:

>Nextera_PrefixNX/1
AGATGTGTATAAGAGACAG
>Nextera_Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Nextera_Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Nextera_Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Nextera_Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
>TruSeq3_PE1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq3_PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>TruSeq3_PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>TruSeq3_PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>TruSeq_Small_RNA
TGGAATTCTCGGGTGCCAAGG
>NEB_miRNA_3primeACTGTAGGCACCATCAAT/AACTGTAGGCACCATCAAT
AGATCGGAAGAGCACACGTCT
>Illumina_Single_End_Adapter_1
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
>Illumina_Single_End_Adapter_2
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
>Illumina_Paired_End_Adapter_2
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
>Illumina_Paired_End_PCR_Primer_2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>Illumina_Paired_End_Sequencing_Primer_2
CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>Illumina_DpnII_expression_Adapter_1
ACAGGTTCAGAGTTCTACAGTCCGAC
>Illumina_DpnII_expression_PCR_Primer_1
CAAGCAGAAGACGGCATACGA
>Illumina_DpnII_expression_PCR_Primer_2
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
>Illumina_NlaIII_expression_Adapter_1
ACAGGTTCAGAGTTCTACAGTCCGACATG
>Illumina_NlaIII_expression_Sequencing_Primer
CCGACAGGTTCAGAGTTCTACAGTCCGACATG
>Illumina_Multiplexing_Adapter_1
GATCGGAAGAGCACACGTCT
>Illumina_Multiplexing_Index_Sequencing_Primer
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>Illumina_PCR_Primer_Index
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTC
>Illumina_DpnII_Gex_Adapter_1
GATCGTCGGACTGTAGAACTCTGAAC
>Illumina_DpnII_Gex_Adapter_2.01
TCGTATGCCGTCTTCTGCTTG
>Illumina_DpnII_Gex_Sequencing_Primer
CGACAGGTTCAGAGTTCTACAGTCCGACGATC
>Illumina_NlaIII_Gex_Adapter_1.01
TCGGACTGTAGAACTCTGAAC
>Illumina_Small_RNA_3p_Adapter_1
ATCTCGTATGCCGTCTTCTGCTTG
>TruSeq_Universal_Adapter
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>TruSeq_Adapter,_Indices
GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNTCTCGTATGCCGTCTTCTGCTTG
>Illumina_RNA_RT_Primer
GCCTTGGCACCCGAGAATTCCA
>Illumina_RNA_PCR_Primer
AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA
>RNA_PCR_Primer_Indices
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
>Illumina_Universal_Adapter
AGATCGGAAGAG
>Illumina_Small_RNA_3'_Adapter
TGGAATTCTCGG
>Illumina_Small_RNA_5'_Adapter
GATCGTCGGACT
>Nextera_Transposase_Sequence
CTGTCTCTTATA
>NEB_miRNA_5prime
GTTCAGAGTTCTACAGTCCGACGATC
>Qiagen_miRNA_5prime
GTTCAGAGTTCTACAGTCCGACGATC
>Qiagen_miRNA_3prime
AACTGTAGGCACCATCAAT
>PolyA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>PolyC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
>PolyG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
>PolyT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Please note: the file Pipeliner uses is located at /data/CCBR_Pipeliner/db/PipeDB/dev/TruSeq_and_nextera_adapters.consolidated.fa
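Before adding custom adapters, it may be worth checking whether your sequence (or its reverse complement, since the file lists several `_rc` entries) is already covered. Here is a minimal sketch of that check. The helper names `revcomp` and `find_adapter` are hypothetical and not part of Pipeliner, and the sketch assumes each record has its sequence on a single line, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Pipeliner.

# Print the reverse complement of an upper-case DNA sequence
# (A<->T, C<->G; any other character, such as N, is left unchanged).
revcomp() {
    printf '%s\n' "$1" \
        | awk '{ out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out }' \
        | tr 'ACGT' 'TGCA'
}

# Print the name of every FASTA record whose sequence line is exactly the
# query or its reverse complement. Assumes one sequence line per record.
find_adapter() {
    fasta="$1"
    query="$2"
    rc=$(revcomp "$query")
    awk -v q="$query" -v rc="$rc" '
        /^>/ { name = substr($0, 2); next }   # remember the current record name
        $0 == q || $0 == rc { print name }    # exact match in either orientation
    ' "$fasta"
}

# Example usage on Biowulf (path taken from the note above):
# find_adapter /data/CCBR_Pipeliner/db/PipeDB/dev/TruSeq_and_nextera_adapters.consolidated.fa \
#     AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
```

If the command prints nothing, the sequence is not in the file as an exact match and a custom FASTA is likely needed. Note that this only reports exact, full-length matches; it does not look for partial overlaps.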

Adding Custom Adapters

If this file does not meet your needs, you can provide your own fasta to cutadapt by taking the following steps:

Step 0. Clone Pipeliner

Please clone Pipeliner from our GitHub repo and switch from the master branch to activeDev.

# Clone Pipeliner in your data directory
cd /data/$USER
git clone https://github.com/CCBR/Pipeliner.git
# Switch to activeDev branch 
cd Pipeliner/
git checkout activeDev

Step 1. Edit standard-bin.json

In this example, it is assumed your new adapters fasta file is called custom_adapters.fa, and it is located in your /data directory:
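As a rough sketch of what this step involves: check that the custom FASTA is well formed, then point the adapters entry in standard-bin.json at it. Everything here is an assumption made for illustration. The helper names `check_adapter_fasta` and `swap_adapter_path` are hypothetical, the custom file is assumed to live at /data/$USER/custom_adapters.fa, and standard-bin.json is assumed to store the default adapters path (the one given above) as a plain string. Please confirm the actual entry in your clone before editing.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Pipeliner.

# Succeed only if the file is a sequence of two-line records: a ">name" header
# followed by one sequence line containing only A, C, G, T, or N.
check_adapter_fasta() {
    awk '
        NR % 2 == 1 && $0 !~ /^>.+/       { bad = 1 }   # odd lines must be headers
        NR % 2 == 0 && $0 !~ /^[ACGTN]+$/ { bad = 1 }   # even lines must be sequences
        END { if (bad || NR == 0 || NR % 2) exit 1 }    # reject empty or truncated files
    ' "$1"
}

# Replace every occurrence of the old adapters path with the new one in a JSON
# file, keeping the original as <file>.bak. Fails if the old path is absent.
# Note: sed treats the dots in the old path as wildcards, which is harmless here.
swap_adapter_path() {
    json="$1"; old="$2"; new="$3"
    grep -q -F "$old" "$json" || { echo "path not found in $json" >&2; return 1; }
    cp "$json" "$json.bak"
    sed "s|$old|$new|g" "$json.bak" > "$json"
}

# Example usage from inside the cloned repo (paths are assumptions):
# cd /data/$USER/Pipeliner
# check_adapter_fasta /data/$USER/custom_adapters.fa || echo "fix the FASTA first"
# swap_adapter_path "$(find . -name standard-bin.json)" \
#     /data/CCBR_Pipeliner/db/PipeDB/dev/TruSeq_and_nextera_adapters.consolidated.fa \
#     /data/$USER/custom_adapters.fa
```

If `swap_adapter_path` reports that the path was not found, the adapters file is probably referenced differently in your copy of standard-bin.json, and the entry should be located and edited by hand instead.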

Step 2. That's it! Start up your cloned version of Pipeliner

This assumes you are in the following directory: /data/$USER/Pipeliner/

# Start up the GUI
./ccbrpipe.sh

From here you can follow the instructions on our Quick Start page: https://ccbr.github.io/pipeliner-docs/RNA-seq/TLDR-RNA-seq/#setup-pipeliner

Please let me know if you have any additional questions. We are always happy to help!