YangLab / CLEAR

direct comparison of circular and linear RNA expression

Trouble with using 'circ_quant' function (CLEAR with STAR Alignment) #23

Open jennynuyirs opened 1 week ago

jennynuyirs commented 1 week ago

Hello! I am having some trouble getting the circ_quant function to work. My code is as follows:

circ_quant -c "$name/circRNA_out/circularRNA_known.txt" -b "$name/Aligned.sortedByCoord.out.bam" -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt"

It fails with AttributeError: 'list' object has no attribute 'split' (line 83 of circ_quant.py). My guess is that the code at that line expects a string but receives a list, possibly because of how the BAM input is read. I'm skeptical that this is the real cause, though, because fixing it would mean changing the source code (probably not a good idea).

I am fairly new to bioinformatics and only somewhat experienced with coding, so I'm unsure how to proceed from here. Any potential solutions or suggestions for debugging would be immensely helpful.

I've included the full pipeline below, which is a slightly modified version of @bounlu's CLEAR with STAR Alignment pipeline. I've tested each step separately, and all of them work as expected except the final circ_quant step.

# define parameters
file_extension="_R1_001.fastq.gz"
read_length=100
ref_genome="hg38"

# make output directories
mkdir -p "STAR_$ref_genome/$read_length"  # -p creates the parent directory and tolerates reruns

# download reference files
fetch_ucsc.py "$ref_genome" fa "$ref_genome.fa"
fetch_ucsc.py "$ref_genome" ref "$ref_genome.ref.txt"
cut -f2-11 "$ref_genome.ref.txt" | genePredToGtf file stdin "$ref_genome.ref.gtf"

# generate genome index file
STAR --runMode genomeGenerate --genomeDir "STAR_$ref_genome/$read_length" --limitIObufferSize 1000000000 --runThreadN 16 --genomeFastaFiles "$ref_genome.fa" --outFileNamePrefix ./ --sjdbGTFfile "$ref_genome.ref.gtf" --sjdbOverhang "$(($read_length-1))"

# run pipeline
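for read1 in *"$file_extension"  # glob directly instead of parsing ls output
do
        [ -e "$read1" ] || continue  # skip the unexpanded pattern when no file matches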
        name="${read1%$file_extension}"
        read2="${name}_R2_001.fastq.gz"
        mkdir -p "$name"
        STAR --chimSegmentMin 20 --runThreadN 16 --genomeLoad LoadAndRemove --limitBAMsortRAM 50000000000 --limitIObufferSize 1000000000 --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --outFileNamePrefix "$name/" --genomeDir "STAR_$ref_genome/$read_length" --readFilesIn "$read1" "$read2" > "$name/$name.circRNA_alignment.log" 2>&1
        samtools index "$name/Aligned.sortedByCoord.out.bam"
        fast_circ.py parse -r "$ref_genome.ref.txt" -g "$ref_genome.fa" -t STAR -o "$name/circRNA_out" "$name/Chimeric.out.junction" > "$name/$name.circRNA_parse.log" 2>&1
        circ_quant -c "$name/circRNA_out/circularRNA_known.txt" -b "$name/Aligned.sortedByCoord.out.bam" -r "$ref_genome.ref.txt" -o "$name.circRNA_quant.txt" > "$name/$name.circRNA_quant.log" 2>&1
done
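
To rule out a problem with the inputs before looking at circ_quant.py itself, one thing I could add is a check that every file passed to the last step exists and is non-empty. This is only a rough sketch: check_inputs is a helper name I made up (it is not part of CLEAR), and it only tests for the files on disk, not whether their contents are valid.

```shell
# Hypothetical helper (not part of CLEAR): report input files that are
# missing or empty. Returns 0 if all files are present and non-empty,
# 1 otherwise.
check_inputs() {
    ci_status=0
    for ci_file in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$ci_file" ]; then
            echo "missing or empty: $ci_file" >&2
            ci_status=1
        fi
    done
    return "$ci_status"
}

# Example use inside the loop, just before the circ_quant call:
# check_inputs "$name/circRNA_out/circularRNA_known.txt" \
#              "$name/Aligned.sortedByCoord.out.bam" \
#              "$name/Aligned.sortedByCoord.out.bam.bai" \
#              "$ref_genome.ref.txt" || continue
```

If this check passes for a sample and circ_quant still fails with the same AttributeError, the inputs are at least present, which would point back at how circ_quant.py handles them.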