Christina-hshi / psirc

Full-length linear and circular transcript isoform reconstruction and quantification
MIT License

How can I get the internal exons of detected circRNAs? #3

Closed Alipe2021 closed 2 years ago

Alipe2021 commented 2 years ago

Dear Dr. Christina Huan Shi,

Thank you for your great work. This software is very helpful and easy to use.

We obtained a series of output files in our study, as follows: [image: screenshot of the output files]

To obtain more detailed circRNA information, such as the circRNA strand, the internal exons, the splice signal, and so on, we extracted the exons located within the region spanned by the first and last exons as the circRNA exons. The coordinates of the first and last exons were taken from the BSJ ID of each circRNA (candidate_circ_junctions.bed), and the strand of each circRNA was taken from the Zea mays reference BED file.

However, the sequence I obtained this way differs from the sequence in the file 'candidate_circ_junctions.fa'.

Could you please tell me how I can get more detailed information about the circRNAs? I would appreciate your help very much.

Yours sincerely,

Peng Liu.

hoyu310 commented 2 years ago

Hi Peng, The outputs in your screenshot seem to have been generated too quickly, and the file sizes look small. This might be due to the same issue another user reported. If you haven't already, could you please rerun with "kallisto" being the official 0.43.1 version, i.e. https://github.com/pachterlab/kallisto/releases/download/v0.43.1/kallisto_linux-v0.43.1.tar.gz, and then check whether the results are the same?

To extract the backsplice donor and acceptor exons, you can consider the _EwBx suffix information in candidate_circ_junctions.bed, column 4. The number after E is the donor exon number and the number after B is the acceptor exon number. Exon numbers are counted from left to right for + strand transcripts (i.e. exon 1 is left-most in terms of genome coordinates) and counted from right to left for - strand transcripts (i.e. exon 1 is right-most in coordinates).
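As a rough illustration of the numbering rule above, here is a minimal Python sketch. It is not part of psirc: the function names, the example name "T001;geneA_E4B2", and the exon coordinates are made up, and it assumes you already have the exon intervals of the transcript (for example from your reference BED file) as (start, end) pairs.

```python
import re

# Matches the _EwBx suffix at the end of column 4, e.g. "..._E4B2".
_SUFFIX = re.compile(r"_E(\d+)B(\d+)$")


def parse_bsj_name(name):
    """Return (donor_exon_number, acceptor_exon_number) from a BED name field."""
    match = _SUFFIX.search(name)
    if match is None:
        raise ValueError(f"no _EwBx suffix found in {name!r}")
    return int(match.group(1)), int(match.group(2))


def exon_by_number(exons, strand, number):
    """Pick an exon by its 1-based number.

    Exons are numbered left to right on the + strand and
    right to left on the - strand, as described above.
    """
    if number < 1 or number > len(exons):
        raise ValueError(f"exon number {number} is out of range")
    ordered = sorted(exons, reverse=(strand == "-"))
    return ordered[number - 1]


# Hypothetical transcript with four exons (genome coordinates).
exons = [(100, 200), (300, 400), (500, 600), (700, 800)]
donor, acceptor = parse_bsj_name("T001;geneA_E4B2")
print(donor, acceptor)
print(exon_by_number(exons, "+", donor), exon_by_number(exons, "+", acceptor))
print(exon_by_number(exons, "-", donor), exon_by_number(exons, "-", acceptor))
```

The same exon number therefore points to different genome coordinates depending on the strand, which is one possible reason for a mismatch if the strand is not taken into account.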

The sequence of these two exons will also be found in the beginning and end parts of the corresponding fasta entry (transcriptID;geneName_EwBx) in full_length_isoforms.fa.
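To check this on your side, something like the following Python sketch could be used. Again, this is only an illustration under assumptions: the helper names and the toy records are invented, the FASTA header is assumed to be exactly the name from column 4 of the BED file, and the exon sequences are assumed to have been extracted by you from the genome beforehand.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def where_is_exon(isoform_seq, exon_seq):
    """Report whether exon_seq sits at the beginning or the end of isoform_seq."""
    isoform_seq, exon_seq = isoform_seq.upper(), exon_seq.upper()
    if isoform_seq.startswith(exon_seq):
        return "beginning"
    if isoform_seq.endswith(exon_seq):
        return "end"
    return None


# Toy records standing in for full_length_isoforms.fa (made-up sequences).
lines = [">T001;geneA_E4B2", "ACGTAC", "GGTT", ">T002;geneB_E3B1", "TTTT"]
records = dict(read_fasta(lines))
isoform = records["T001;geneA_E4B2"]
print(where_is_exon(isoform, "ACGT"))
print(where_is_exon(isoform, "GGTT"))
```

With a real file, you would pass an open file handle, e.g. read_fasta(open("full_length_isoforms.fa")), instead of the toy list.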

Let me know whether this makes sense or if you have further questions. Regards, Ken

Christina-hshi commented 2 years ago

Hi Peng,

Since you haven't posted any further questions, I will close this issue. I hope everything goes well. Let me know if you have another question, and I will reopen it.

Best, Christina