Open mmaitenat opened 2 years ago
Hi, you can find the information you need in the 7th column of the *.cand_circ.fa file generated by the CIRI-long call command, which contains the start and end positions of the CCS segments in the raw ONT reads.
>9b2ec396-b290-4b90-b115-ff3fbc33076d chr1:87938154-87940306 + 87938154-87938265|114,87938346-87938438|94,87939196-87939315|121,87940177-87940306|130 AG-GT|2-1 288|0-463 10-452;452-905;905-1358;1358-1467
GCATTCAGGGAGATAGCACAGTCCCACAGAGCCATGGAACAGGAGCTCGCACATGCTGTCAATGCCAGCTCCAAAGCCATGGAGCAGTATACAGCAAGCCCAGAACTGCAGAGGGTTGAACTGCCAGCTTTGTTCTGGAGATGGTGAATAACATCAGAGCACTGCGCAGTGAGACAGAGCTGCTGCTGGCTGGGAAGATGGCCCTGCAATTGGATCCCCCTCAGAAGGAACGGCAGAAACCGGGGCTGCCCTAATTGAGATGGACCAGCAGCTCAGGAAGCTGACAGACACTCCCTGGCTTTACGCCAGCCCTTGGAAGCCTGGTGAGGAAGAGTCTCTCCAACAGAATGTGATGCTGGATCTTACTAAACGCAGCCGTAGTGGTAAATTCCGCCTTGTGACCAAGTTTAAAAAGGAGAAAAACAATAAGAACAAAGAAGTTCACAGTAACCTAGGAGGCCCT
>82ca865f-cd01-4c4b-a12b-5343d9f8464b chr4:95850782-95851509 + 95850782-95851509|731 AG-GT*|-3--6 611|3-725 41-758;758-845
GGTAGTCCTCTAGAGCTGATGAGGTTTGTAGAGTCAGACCCCAGCTACAGCTGTAGAACCAGGCATCCTTGGTTGCTGGAAACCAATCCTGGAAGCAGAGTACTAGCGCATGCCCAAACTCATGAAACAGCCAGTATAGAGCTGGAAGAAAGTCAGACCCCCAGCTACCAGCTGAGAACCAGGCACTTCAACCACTTGCCCGCATGCCCCAGTGTTAGAAGTGACAAACCAGGTGTTCTAATAATTTTTAATAATTGGGAATTCAATTTGCTGTGACTGCCTGAGTGTGGCAGACCCTGTGCTAAGTTCTTTAGTATAGCTCTCCTAATGCATATAATACCCTTTCATGGCCTGTAAGAGGGCCAGAAACTTACAAACACAGACCATTAGAAACCTCCAGTGGCAGAAGCCCATTTCCAGTTTAAGAATGGAGCTGGGCATGTGGCTTGGTGCTTAAAGCACTTCTGTCTTCCAGAGGACCTGCATCAATTTCCAGTACATTGTTGGTTCATCTGTGGAGTTATCATCTGTAACTCCGGTACCAGGAGTCTACTGCCCTCTCCTTCTGGAATTACCCTGGTGGTGGTGCCTATGCATAAACCTATCATTCAATCTATACAAAACAAACTAATCAATTACTCAATACGAAATAATATGTGCAACTAATTGTCATTGGATGGGCTGACTGTAGTGATGAATTGTCTCATAAAAGGTCAGTCTGGGCA
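For anyone who wants to pull the segment coordinates out programmatically, here is a minimal Python sketch based only on the two header lines shown above. The function name `parse_cand_circ_header` is hypothetical and not part of CIRI-long. It assumes the header fields are whitespace-separated and that the 7th field is a `;`-joined list of `start-end` pairs.

```python
def parse_cand_circ_header(header):
    """Extract read id, circRNA id and CCS segments from one *.cand_circ.fa header.

    Assumption: fields are whitespace-separated and the 7th field holds the
    CCS segment coordinates as "start-end" pairs joined by ";".
    """
    fields = header.lstrip(">").split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    segments = []
    for pair in fields[6].split(";"):
        start, end = pair.split("-")
        segments.append((int(start), int(end)))
    return {"read_id": fields[0], "circ_id": fields[1], "segments": segments}


header = (
    ">9b2ec396-b290-4b90-b115-ff3fbc33076d chr1:87938154-87940306 + "
    "87938154-87938265|114,87938346-87938438|94,"
    "87939196-87939315|121,87940177-87940306|130 "
    "AG-GT|2-1 288|0-463 10-452;452-905;905-1358;1358-1467"
)
record = parse_cand_circ_header(header)
print(record["segments"])
# [(10, 452), (452, 905), (905, 1358), (1358, 1467)]
```

`len(record["segments"])` then gives the number of segments reported for that read.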
The *.reads output of the CIRI-long collapse command also includes the correspondence between the read id (1st column) and the collapsed circRNA id (2nd column).
read_id circ_id tmp_id strand cirexons signal alignment segments sample type
d6637a72-5a5b-41e6-8341-25ed39330ed2 chr1:3421702-3526342 chr1:3421702-3526342 - 3421702-3421901|201,3516918-3517016|100,3517613-3517717|106,3523427-3523692|267,3526200-3526342|137 AG-GT|1-7 263|9-831 41-858;858-1254 Long_SMARTer_H-_repfull
0ab11e74-33e9-4272-aae0-2a22035e7bc1 chr1:3421702-3526342 chr1:3421702-3526342 - 3421702-3421901|199,3516918-3517016|100,3517613-3517717|106,3523427-3523692|267,3526200-3526342|146 AG-GT|-1--2 151|0-831 10-827;827-1232 Long_SMARTer_H-_repfull
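As a rough illustration of using that correspondence, the sketch below groups read ids by collapsed circRNA id. The helper name `reads_by_circ` is hypothetical. It assumes the table is whitespace-delimited with no spaces inside a field, and it only relies on the first two columns, so it does not depend on how the remaining columns are laid out.

```python
from collections import defaultdict


def reads_by_circ(lines):
    """Map each collapsed circRNA id (2nd column) to its read ids (1st column).

    Assumption: a whitespace-delimited table whose header row starts
    with "read_id"; that row and blank lines are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == "read_id":
            continue
        groups[fields[1]].append(fields[0])
    return dict(groups)


example = [
    "read_id circ_id tmp_id strand cirexons signal alignment segments sample type",
    "d6637a72-5a5b-41e6-8341-25ed39330ed2 chr1:3421702-3526342 chr1:3421702-3526342 -",
    "0ab11e74-33e9-4272-aae0-2a22035e7bc1 chr1:3421702-3526342 chr1:3421702-3526342 -",
]
groups = reads_by_circ(example)
print(len(groups["chr1:3421702-3526342"]))
# 2
```

For a real file, the same function can be given an open file handle, for example `reads_by_circ(open("sample.reads"))`, where `sample.reads` is a placeholder name.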
Hi!
That's clear, thanks!
I am so sorry for asking so many questions, but I'm afraid I have a few more...
When I was going through this, I found circRNAs with negative length values in the .info files. Let me show you some examples:
grep 'circ_len "-' barcode03.info | head -5
2 CIRI-long circRNA 76696524 76696522 3 - . circ_id "2:76696524-76696522"; splice_site "AG-GT|0--2"; equivalent_seq ""; circ_type "Unknown"; circ_len "-2"; isoform "76696524-76696522";
2 CIRI-long circRNA 117281827 117281825 5 + . circ_id "2:117281827-117281825"; splice_site "AG-GT|7-5"; equivalent_seq "G"; circ_type "Unknown"; circ_len "-2"; isoform "117281827-117281825";
2 CIRI-long circRNA 121347282 121347280 2 + . circ_id "2:121347282-121347280"; splice_site "AG-GT|10-8"; equivalent_seq ""; circ_type "Unknown"; circ_len "-2"; isoform "121347282-121347280";
2 CIRI-long circRNA 128669829 128669827 5 - . circ_id "2:128669829-128669827"; splice_site "AG-GT|-7--9"; equivalent_seq ""; circ_type "Unknown"; circ_len "-2"; isoform "128669829-128669827";
3 CIRI-long circRNA 89958600 89958598 2 - . circ_id "3:89958600-89958598"; splice_site "AG-GT|5-3"; equivalent_seq ""; circ_type "Unknown"; circ_len "-2"; isoform "89958600-89958598";
I also found in the same file circRNAs with unknown strand and splice_site info, as follows:
grep 'splice_site "None' barcode03.info | head -5
1 CIRI-long circRNA 3215147 3215449 5 None . circ_id "1:3215147-3215449"; splice_site "None"; equivalent_seq ""; circ_type "Unknown"; circ_len "302"; isoform "3215147-3215449";
1 CIRI-long circRNA 9940139 9940778 5 None . circ_id "1:9940139-9940778"; splice_site "None"; equivalent_seq ""; circ_type "Unknown"; circ_len "639"; isoform "9940139-9940778";
1 CIRI-long circRNA 15396359 15396994 7 None . circ_id "1:15396359-15396994"; splice_site "None"; equivalent_seq ""; circ_type "Unknown"; circ_len "635"; isoform "15396359-15396994";
1 CIRI-long circRNA 22552121 22552535 2 None . circ_id "1:22552121-22552535"; splice_site "None"; equivalent_seq "ggg"; circ_type "Unknown"; circ_len "414"; isoform "22552121-22552535";
1 CIRI-long circRNA 32390644 32391226 2 None . circ_id "1:32390644-32391226"; splice_site "None"; equivalent_seq ""; circ_type "Unknown"; circ_len "582"; isoform "32390644-32391226";
Could you please explain which situations these circRNAs correspond to and how I should treat them?
Thank you very much!
I am so sorry, I just saw an issue about circRNAs with negative or zero length, and your recommendation to remove them because they come from erroneous reads. Still, I was wondering whether I should keep those with splice_site="None", or whether these may be errors too.
Thanks!
Hi, the current version of CIRI-long on GitHub will remove these negative length circRNAs, and I will update the version on PyPI with the next formal release.
splice_site='None' means that no pre-defined splice site could be found in the BSJ region of the CCS reads, so it's hard to tell whether these circRNAs are reverse transcription artifacts or real circRNAs. If you're working with a model species with well-defined splice sites, it's better to filter them out.
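Until the PyPI release is updated, the two filters discussed above can be applied to an existing .info file by hand. The following is only a sketch and not CIRI-long's own filtering code: the names `keep_record` and `filter_info` are hypothetical, and it assumes every record carries `key "value";` attributes like the lines quoted earlier in this thread. Dropping `splice_site "None"` records is left as an option, since that choice depends on the species.

```python
import re

# Matches attribute pairs such as: circ_len "-2"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def keep_record(line, drop_unknown_splice=True):
    """Return True if a .info record passes the length and splice-site filters."""
    attrs = dict(ATTR_RE.findall(line))
    try:
        circ_len = int(attrs["circ_len"])
    except (KeyError, ValueError):
        return False  # no usable circ_len attribute
    if circ_len <= 0:
        return False  # negative or zero length: treated as erroneous
    if drop_unknown_splice and attrs.get("splice_site") == "None":
        return False  # no pre-defined splice site found at the BSJ
    return True


def filter_info(lines, drop_unknown_splice=True):
    """Yield comment lines unchanged and records that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, drop_unknown_splice):
            yield line
```

One way to use it is to open the input and output files and call `dst.writelines(filter_info(src))`. Passing `drop_unknown_splice=False` keeps the `splice_site "None"` records and removes only the non-positive lengths.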
Hi again,
I would like to know whether the number of times a circRNA is repeated in each read (which I think you call the CCS copy number) is reported somewhere in the output of CIRI-long. My idea is to make plots similar to those in Supplementary Figure 7 of your article "Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long" with my own data.
Thanks!
Maitena.