bioinfo-biols / CIRI-long

Circular RNA Identification for Nanopore Sequencing
https://ciri-cookbook.readthedocs.io
MIT License

What is the meaning of `bsj, signal, partial` in *.json #18

Closed (algaebrown closed 1 year ago)

algaebrown commented 1 year ago

Hi Kevin, thanks for making this wonderful package.

I wonder what these numbers mean:

{"consensus": 85947, "raw_unmapped": 62939, "ccs_mapped": 51403, "bsj": 26347, "signal": 19441, "partial": 11334}

Also, when I count the rows in the .reads output of the collapse step, the number doesn't quite match the json: the .reads file has 13409 reads. Does this correspond to all the reads mapping to circular RNAs?

I think the repo would benefit from detailed documentation of:

  1. the column meanings in *.reads
  2. the numbers in *.json

Thanks!!!

Kevinzjy commented 1 year ago

These are the numbers of candidate reads remaining after each filtering step (as shown in Supplementary Fig. 3 of the CIRI-long manuscript).

  1. consensus: cyclic consensus sequence (CCS) reads that have repetitive patterns
  2. raw_unmapped: raw reads that could not be aligned to the reference genome with >80% identity
  3. ccs_mapped: CCS sequences that could be aligned to the reference sequence
  4. bsj: CCS with back-spliced junction (BSJ) events detected
  5. signal: CCS with a canonical splice signal detected in the flanking region of the BSJ site
  6. partial: reads that do not have a repetitive CCS structure, but do have a back-spliced junction site
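
To make the list above concrete, here is a minimal Python sketch that prints each count from the JSON in the question next to its description. The descriptions are paraphrased from the list, the `summarize` helper is hypothetical, and showing each count as a fraction of `consensus` is only an assumption for orientation, not something CIRI-long reports.

```python
import json

# Counts copied from the *.json shown in the question.
stats_json = (
    '{"consensus": 85947, "raw_unmapped": 62939, "ccs_mapped": 51403, '
    '"bsj": 26347, "signal": 19441, "partial": 11334}'
)

# Descriptions paraphrased from the list above.
DESCRIPTIONS = {
    "consensus": "CCS reads with repetitive patterns",
    "raw_unmapped": "raw reads not aligned to the genome with >80% identity",
    "ccs_mapped": "CCS aligned to the reference",
    "bsj": "CCS with a back-spliced junction detected",
    "signal": "CCS with a canonical splice signal flanking the BSJ",
    "partial": "reads without repetitive CCS structure but with a BSJ",
}

def summarize(stats):
    """Return (key, count, fraction of consensus, description) rows."""
    total = stats["consensus"]  # assumption: use the first count as denominator
    return [
        (key, count, count / total, DESCRIPTIONS.get(key, "unknown"))
        for key, count in stats.items()
    ]

stats = json.loads(stats_json)
rows = summarize(stats)
for key, count, frac, desc in rows:
    print(f"{key:<13}{count:>7}  {frac:6.1%}  {desc}")
```

For a real run, the inline string could be replaced by loading the `*.json` file with `json.load`.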

The collapse step drops singleton reads (BSJs with only 1 supporting read across all samples), so the number should be smaller than the one in the json file.
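
As a rough illustration of why the collapsed read count ends up smaller, the sketch below drops reads whose BSJ has only one supporting read across all samples. The read IDs, BSJ coordinates, and the `drop_singletons` helper are made up for the example; this is not CIRI-long's actual code or file format.

```python
from collections import Counter

def drop_singletons(read_to_bsj):
    """Keep only reads whose BSJ is supported by at least 2 reads overall."""
    support = Counter(read_to_bsj.values())
    return {read: bsj for read, bsj in read_to_bsj.items() if support[bsj] >= 2}

# Hypothetical reads pooled from all samples, each mapped to its BSJ.
reads = {
    "read1": "chr1:100|500",
    "read2": "chr1:100|500",
    "read3": "chr2:300|900",  # the only read supporting this BSJ
}

kept = drop_singletons(reads)
print(len(reads), "->", len(kept))
```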