Error in prepDE.py - Githubissues

chickenHyop commented 1 year ago

Hello, I encountered an error using CIRI-DE.

I want to perform DE analysis on samples with replicates, but I keep getting the same error when running prepDE.py.

$ prepDE.py -i sample_gene.lst

Error: could not locate transcript ENST00000618889 entry for sample CASE2
Traceback (most recent call last):
  File "/data/leehyobin/Step0_RawFile/stringtie/stringtie/prepDE.py", line 284, in <module>
    geneDict.setdefault(geneIDs[i],{}) #gene_id
KeyError: 'ENST00000618889'

Even after swapping the sample GTF file for another sample's, I get the same error saying that some transcripts could not be located. Since this tool detects circRNAs, isn't it possible for different samples not to share the same transcripts?

When analyzing DE, do I have to merge the output files by experimental groups and then proceed with the analysis?

As an additional question, isn't CIRIquant a tool that identifies and also annotates circRNAs? Can known and novel circRNAs be written to separate output files? Or do I have to look up the circRNAs from the CIRIquant output myself in a database like circBase?

Thank you for developing a tool that helps with circRNA analysis.

Kevinzjy commented 1 year ago

Hi @chickenHyop, this is a StringTie error that could be caused by an incomplete StringTie run. I suggest you check the log file to see whether StringTie finished successfully. You should also make sure that you're using the latest versions of StringTie and prepDE.py. I just ran some samples with StringTie 2.1.5 and everything worked fine.
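
Alongside the log check, one quick way to see which samples lack the transcript named in the error is to scan each GTF directly. This is a minimal sketch, assuming the list file passed to -i has one whitespace-separated sample ID and GTF path per line (paths without spaces). transcript_ids and find_missing are hypothetical helpers written for this illustration, not part of StringTie or prepDE.py.

```python
import re

def transcript_ids(gtf_path):
    """Collect every transcript_id value found in a GTF file."""
    ids = set()
    pattern = re.compile(r'transcript_id "([^"]+)"')
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip GTF header/comment lines
            match = pattern.search(line)
            if match:
                ids.add(match.group(1))
    return ids

def find_missing(list_path, transcript):
    """Return the sample IDs whose GTF does not contain the transcript."""
    missing = []
    with open(list_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            sample, gtf_path = fields[0], fields[1]
            if transcript not in transcript_ids(gtf_path):
                missing.append(sample)
    return missing

# Example call (file name and transcript ID taken from the report above):
# print(find_missing("sample_gene.lst", "ENST00000618889"))
```

A sample showing up in the result only means the ID is absent from that GTF. It does not say why, so the StringTie log is still worth checking.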

For your second question, CIRIquant just detects and quantifies circRNAs in your data. If you want to see if these circRNAs are already annotated in public circRNA resources (e.g. our circAtlas database), you have to search these databases manually.
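
If you download an annotation table from one of these databases, the manual search can be scripted. This is only a rough sketch: it assumes you have already loaded the known circRNAs into a set of (chrom, start, end) tuples, and that the chromosome naming, genome build and coordinate convention match your CIRIquant output, which you would need to verify yourself. parse_circ_id and split_known_novel are hypothetical helper names.

```python
def parse_circ_id(circ_id):
    """Split an ID such as '1:23387273|23419139' into (chrom, start, end)."""
    chrom, span = circ_id.split(":")
    start, end = span.split("|")
    return chrom, int(start), int(end)

def split_known_novel(circ_ids, known):
    """Partition circ_ids by whether their coordinates appear in `known`.

    `known` is a set of (chrom, start, end) tuples that the caller has
    built from a database export; building it is outside this sketch.
    """
    known_hits, novel = [], []
    for circ_id in circ_ids:
        if parse_circ_id(circ_id) in known:
            known_hits.append(circ_id)
        else:
            novel.append(circ_id)
    return known_hits, novel
```

Exact coordinate matching is a deliberate simplification. An off-by-one difference between conventions would make a known circRNA look novel.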

chickenHyop commented 1 year ago

@Kevinzjy Thank you for your comment.

So sorry, it was my mistake! I ran my samples with StringTie v2.2.1. I then tried a different command:

$ prepDE.py3 -i sample_gene.lst

This worked, and I got the output files gene_count_matrix.csv and transcript_count_matrix.csv.

As for your second comment, I understand what you're saying. I will run a circBase analysis on the predicted circRNA data generated by CIRIquant.

However, I have just one more question. Even when three samples are replicates from the same experimental group, each sample's GTF file will contain a different set of detected circRNAs. Does this cause any problem when the replicates are merged during DE analysis? For example, Case 1 has circ_id "1:23387273|23419139" but Case 2 doesn't. If a circRNA is undetected in Case 2, will it be treated as a value of 0?

Thank you.

Kevinzjy commented 1 year ago

@chickenHyop Yes, the circRNA will have 0 reads in Case 2, but that wouldn't affect the differential expression analysis, as the edgeR model can handle this situation.
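
To illustrate the zero-filling idea, here is a minimal sketch of merging per-sample counts into one matrix. It is not CIRIquant's actual code: build_count_matrix is a hypothetical helper, and the read counts and the second circ_id are made up for the example.

```python
def build_count_matrix(sample_counts):
    """Merge per-sample {circ_id: reads} dicts, filling undetected circRNAs with 0."""
    samples = sorted(sample_counts)
    # Union of every circRNA seen in any sample.
    circ_ids = sorted({cid for counts in sample_counts.values() for cid in counts})
    matrix = {
        cid: [sample_counts[s].get(cid, 0) for s in samples]
        for cid in circ_ids
    }
    return samples, matrix

# Made-up read counts, for illustration only.
counts = {
    "Case1": {"1:23387273|23419139": 12, "2:100|900": 5},
    "Case2": {"2:100|900": 7},
}
samples, matrix = build_count_matrix(counts)
print(samples)
for circ_id, row in matrix.items():
    print(circ_id, row)
```

The circRNA missing from Case2 ends up with a 0 in that column, which is the kind of count table a model such as edgeR takes as input.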

chickenHyop commented 1 year ago

@Kevinzjy Thank you very much for your quick and accurate response.

bioinfo-biols / CIRIquant

Error in prepDE.py #55