YangLab / CLEAR

direct comparison of circular and linear RNA expression
CLEAR with STAR alignment #11

Closed by bounlu 3 years ago

bounlu commented 4 years ago

Below is the full CLEAR pipeline with STAR alignment, in case anyone needs it:

# define parameters
file_extension="_1.fq.gz"   # suffix of the mate-1 FASTQ files
read_length=100             # used for the index directory and --sjdbOverhang
ref_genome="hg19"

# download reference files
fetch_ucsc.py "$ref_genome" fa "$ref_genome.fa"
fetch_ucsc.py "$ref_genome" ref "$ref_genome.ref.txt"
cut -f2-11 "$ref_genome.ref.txt" | genePredToGtf file stdin "$ref_genome.ref.gtf"

# generate genome index file (the index directory is created first in case STAR does not create it)
mkdir -p "STAR_$ref_genome/$read_length"
STAR --runMode genomeGenerate --genomeDir "STAR_$ref_genome/$read_length" --limitIObufferSize 1000000000 --runThreadN 16 --genomeFastaFiles "$ref_genome.fa" --outFileNamePrefix ./ --sjdbGTFfile "$ref_genome.ref.gtf" --sjdbOverhang "$((read_length - 1))"

# run pipeline (each sample's chain is launched in the background, so all samples run in parallel)
# note: -type l matches symlinked FASTQ files only; use -type f for regular files
for read1 in $(find . -type l -name "*$file_extension"); do
        name="${read1#./}" && \
        name="${name%"$file_extension"}" && \
        read2="${name}_2.fq.gz" && \
        mkdir -p "$name" && \
        STAR --chimSegmentMin 20 --runThreadN 16 --genomeLoad LoadAndRemove --limitBAMsortRAM 50000000000 --limitIObufferSize 1000000000 --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --outFileNamePrefix "$name/" --genomeDir "STAR_$ref_genome/$read_length" --readFilesIn "$read1" "$read2" > "$name/$name.circRNA_alignment.log" 2>&1 && \
        samtools index "$name/Aligned.sortedByCoord.out.bam" && \
        fast_circ.py parse -r "$ref_genome.ref.txt" -g "$ref_genome.fa" -t STAR -o "$name/circRNA_out" "$name/Chimeric.out.junction" > "$name/$name.circRNA_parse.log" 2>&1 && \
        circ_quant -c "$name/circRNA_out/circularRNA_known.txt" -b "$name/Aligned.sortedByCoord.out.bam" -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt" > "$name/$name.circRNA_quant.log" 2>&1 &
done
# wait for all background jobs to finish before the script exits
wait
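
The loop above derives each sample prefix and mate-2 path from the mate-1 file name. A minimal sketch of that step in isolation is shown here, which may help when adapting the pipeline to a different naming scheme. The helper names `sample_prefix` and `mate2_path` are hypothetical and not part of CLEAR or STAR, and the `_2.fq.gz` mate-2 suffix is an assumption taken from the script.

```shell
# Hypothetical helpers (not part of CLEAR): derive names from a mate-1 FASTQ path.

# sample_prefix READ1 SUFFIX1
# Prints READ1 with SUFFIX1 stripped; fails if READ1 does not end in SUFFIX1.
sample_prefix() {
    case "$1" in
        *"$2") printf '%s\n' "${1%"$2"}" ;;
        *) return 1 ;;
    esac
}

# mate2_path READ1 SUFFIX1 SUFFIX2
# Prints the mate-2 path by swapping SUFFIX1 for SUFFIX2.
mate2_path() {
    prefix=$(sample_prefix "$1" "$2") || return 1
    printf '%s\n' "${prefix}$3"
}

sample_prefix "./sampleA_1.fq.gz" "_1.fq.gz"            # prints ./sampleA
mate2_path "./sampleA_1.fq.gz" "_1.fq.gz" "_2.fq.gz"    # prints ./sampleA_2.fq.gz
```

Failing on a non-matching suffix, rather than passing the name through unchanged, avoids silently feeding the same file to STAR as both mates.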

Have you tried CSI NGS Portal yet?

xingma commented 3 years ago

Thanks for your pipeline!