YangLab / CLEAR

direct comparison of circular and linear RNA expression

exonFrames field is being added, -genePredExt but no valid frames #13

Open sharmi85 opened 3 years ago

sharmi85 commented 3 years ago

Hello. I am trying to use CLEAR on my data set and am running the following command:

clear_quant -1 /userdata/sharmishtha/Hela/trimmedFastqFiles/trim_HeLa-AMT-1_R1.fastq.gz -2 /userdata/sharmishtha/Hela/trimmedFastqFiles/trim_HeLa-AMT-1_R2.fastq.gz -g /userdata/sharmishtha/ref_and_anno/hg38/hg38.fa -i /userdata/sharmishtha/IndexFiles/hg38/hisat2index/hg38_hisat2_index -j /userdata/sharmishtha/IndexFiles/hg38/bowtie1_index/bowtie1_index -G /userdata/sharmishtha/IndexFiles/hg38/hg38_kg.gtf -o HelaAMT1_output_dir

The steps up to TopHat-Fusion worked, but I got an error right after TopHat-Fusion:

Start circRNA annotation

Error: exonFrames field is being added, but I found a gene (ENST00000602051.5) with CDS but no valid frames. This can happen if program is invoked with -genePredExt but no valid frames are given in the file. If the 8th field of GFF/GTF file is always a placeholder, then don't use -genePredExt.
Traceback (most recent call last):
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/bin/clear_quant", line 11, in load_entry_point('CLEAR==1.0.0', 'console_scripts', 'clear_quant')()
  File "build/bdist.linux-x86_64/egg/src/run.py", line 262, in main
  File "build/bdist.linux-x86_64/egg/src/run.py", line 173, in circ_annot
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['gtfToGenePred', '-genePredExt', '/userdata/sharmishtha/IndexFiles/hg38/hg38_kg.gtf', 'HelaAMT1_output_dir/circ/genePred.tmp']' returned non-zero exit status 255

I used the CIRCexplorer2 command to generate the GTF file:

cut -f2-11 hg38_ref.txt|genePredToGtf file stdin hg38_ref.gtf

So I don't know what is going on or why the GTF file triggers this error. Kindly help.
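The error message points at the 8th GTF column (the frame field). A rough first-pass check, sketched here in Python, is to list transcripts that have CDS records but no CDS record with a frame of 0, 1 or 2. This is only an approximation and is not claimed to reproduce the internal logic of gtfToGenePred; the function name and the sample records (including the IDs `tx_ok` and `tx_bad`) are hypothetical.

```python
import re

# Frame values that the GTF format allows for CDS records.
VALID_FRAMES = {"0", "1", "2"}
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_without_cds_frame(gtf_lines):
    """Return sorted transcript IDs that have CDS records but no valid frame.

    Hypothetical helper: it only inspects the frame column (8th field) of
    CDS records and ignores every other feature type.
    """
    has_cds = set()
    has_valid_frame = set()
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # malformed line or not a CDS record
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # no transcript_id attribute
        transcript = match.group(1)
        has_cds.add(transcript)
        if fields[7] in VALID_FRAMES:
            has_valid_frame.add(transcript)
    return sorted(has_cds - has_valid_frame)


# Made-up records for illustration only.
sample = [
    'chr1\tdemo\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "tx_ok";',
    'chr1\tdemo\tCDS\t300\t400\t.\t+\t.\tgene_id "g2"; transcript_id "tx_bad";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "g2"; transcript_id "tx_bad";',
]
print(transcripts_without_cds_frame(sample))
```

For a real annotation, the same function can be given an open file handle, for example `transcripts_without_cds_frame(open("hg38_kg.gtf"))`. An empty result would suggest the frame column itself is not the cause.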

sharmi85 commented 3 years ago

I tried a different file, but it doesn't work either. Did you use the known genes file for the annotation? What command did you use to download the known genes txt file, and what command did you use to convert it to GTF format? I used the commands listed in the CIRCexplorer2 pipeline.

Download the human known genes annotation file (downloaded 28 Dec 2020): fetch_ucsc.py hg38 kg hg38_kg.txt

Convert the gene annotation file to GTF format (requires genePredToGtf; converted on 28 Dec 2020):

cut -f2-11 hg38_kg.txt|genePredToGtf file stdin hg38_kg.gtf

Please help, I am stuck at the circRNA annotation step.

Thanks, Sharmi

xingma commented 3 years ago

I think the command below will solve this problem:

perl -alne '$,="\t";print (@F[1..@F-1], 0, $F[0])' hg38_kg.txt | genePredToGtf file stdin hg38_kg.gtf

The hg38_kg.gtf file is the needed file for clear_quant.
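To make the difference between the two conversions easier to see, here is a minimal Python sketch of what each one does to a single row. The `cut -f2-11` version drops the first column and keeps columns 2 to 11; the perl one-liner keeps every column after the first, then appends a `0` and the original first column. The function names and the sample row are hypothetical, and the sketch says nothing about how genePredToGtf interprets the extra columns.

```python
def cut_fields_2_to_11(line):
    """Mimic `cut -f2-11`: split on tabs and keep columns 2 through 11."""
    return "\t".join(line.rstrip("\n").split("\t")[1:11])


def move_first_field_to_end(line):
    """Mimic the perl one-liner for a non-empty row.

    `perl -a` splits on whitespace; the output is every field after the
    first, then a literal 0, then the original first field, tab-separated.
    """
    fields = line.split()
    return "\t".join(fields[1:] + ["0", fields[0]])


# Made-up row: a leading gene name followed by ten genePred-style columns.
row = "GENE1\ttx1\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,"

print(cut_fields_2_to_11(row))
print(move_first_field_to_end(row))
```

With this sample row, the first function returns 10 columns and the second returns 12, with `0` and `GENE1` at the end.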

sharmi85 commented 3 years ago

Thank you so much. This should help; I will try it and get back to you. Thanks, Sharmi

-- Regards

Sharmishtha Shyamal, PhD Research Associate RNA Biology Lab Institute of Life Science-DBT Bhubaneshwar, Odisha India

sharmi85 commented 3 years ago

No, that didn't work; it gave me the same error:

Start circRNA annotation

Error: exonFrames field is being added, but I found a gene (ENSMUST00000221646.1) with CDS but no valid frames. This can happen if program is invoked with -genePredExt but no valid frames are given in the file. If the 8th field of GFF/GTF file is always a placeholder, then don't use -genePredExt.
Traceback (most recent call last):
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/bin/clear_quant", line 11, in load_entry_point('CLEAR==1.0.0', 'console_scripts', 'clear_quant')()
  File "build/bdist.linux-x86_64/egg/src/run.py", line 262, in main
  File "build/bdist.linux-x86_64/egg/src/run.py", line 173, in circ_annot
  File "/userdata/sharmishtha/tools/anaconda3/envs/myenv/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['gtfToGenePred', '-genePredExt', '/userdata/sharmishtha/ref_and_anno/mm10_20/mm10_kg.gtf', '66Old_output_dir/circ/genePred.tmp']' returned non-zero exit status 255

xingma commented 3 years ago

Hi, this problem is caused by a few transcript annotations with unusual start codon and stop codon positions. I have updated CLEAR to 1.0.1 to solve this problem. Thanks.

sharmi85 commented 3 years ago

Thank you for your response. I will update to the new version and try again.
