Eteleeb closed this issue 2 years ago.
Dear @Eteleeb,
The -ss parameter is specific to RNA-seq data produced with second-strand libraries. See https://www.biostars.org/p/64250/ for background on that topic.
The duplicate is actually not a duplicate. DCC found one circRNA for the annotated gene on the annotated strand, but also a possible circRNA candidate on the antisense strand. This happens regularly but should not occur too often: if most of your circRNAs are labeled not_annotated, something is wrong with the DCC settings, and that would be a case for the -ss option.
Cheers, Tobias
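To check how often this happens in a given run, rows can be grouped by coordinates and flagged when the same chr/start/end is reported on both strands. This is a minimal Python sketch under stated assumptions: the column layout (chr, start, end, gene, junction type, strand, start-end region, overall region) is taken from the example row in this thread, fields are assumed to be whitespace-separated, a header row starting with "Chr" is assumed, and the function names are hypothetical, not part of DCC.

```python
from collections import defaultdict


def parse_coordinates(lines):
    """Parse rows laid out as: chr, start, end, gene, junction type, strand, ...

    Layout assumed from the example row in this thread; blank lines and a
    header row starting with "Chr" (also an assumption) are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chr":
            continue
        chrom, start, end, gene, _junction, strand = fields[:6]
        records.append((chrom, int(start), int(end), gene, strand))
    return records


def both_strand_calls(records):
    """Return {(chr, start, end): {strand: gene}} for coordinates seen on + and -."""
    by_coord = defaultdict(dict)
    for chrom, start, end, gene, strand in records:
        by_coord[(chrom, start, end)][strand] = gene
    return {
        coord: genes
        for coord, genes in by_coord.items()
        if {"+", "-"}.issubset(genes)
    }


rows = [
    "chr1 1223244 1223968 SDF4 2 - exon-exon transcript,gene,exon,CDS",
    "chr1 1223244 1223968 not_annotated 1 + intergenic-intergenic not_annotated",
]
print(both_strand_calls(parse_coordinates(rows)))
# {('chr1', 1223244, 1223968): {'-': 'SDF4', '+': 'not_annotated'}}
```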
Thank you, Tobias, for the clarification. So it is possible for the same circRNA, with exactly the same start and end positions, to be reported on both strands. I didn't check many, but I saw this situation in a few cases. The issue is that more than 50% of the detected circRNAs are classified as "not_annotated", which concerns me: out of 9,158 circRNA candidates, only 3,720 were annotated. Any thoughts on why this happened? I tried "-ss" with only two samples and saw the same example as the one I included above, but I didn't run "-ss" on everything. Do you think I should use the "-ss" parameter and run it on everything? I am not sure whether my library was first-strand or second-strand, but it is definitely strand-specific. Thank you.
-Abdallah
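The share of not_annotated candidates can be computed directly from the coordinate table. A minimal Python sketch, assuming the same layout as the example row in this thread (gene label in the fourth column) and an optional header row starting with "Chr"; the function name is hypothetical.

```python
def not_annotated_fraction(lines):
    """Fraction of rows whose gene column (4th field, assumed) is "not_annotated"."""
    total = 0
    unannotated = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and an assumed header row starting with "Chr".
        if not fields or fields[0] == "Chr":
            continue
        total += 1
        if fields[3] == "not_annotated":
            unannotated += 1
    return unannotated / total if total else 0.0


# Cross-check with the totals reported above: 9,158 candidates, 3,720 annotated.
print(round((9158 - 3720) / 9158, 3))  # 0.594
```

So roughly 59% of the candidates are not_annotated, consistent with the "more than 50%" reported.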
Hi @Eteleeb,
I'd give it a try and rerun everything with -ss. If that does not work, we will see what else can be done.
Cheers, Tobias
I know that we used the TruSeq Stranded Total RNA Sample Prep with Ribo-Zero Gold kit (Illumina) for our library, and according to this:
The following list gives an overview of common sequencing kits and the respective parameter choice:
First-strand kits (default):
● All dUTP methods, NSR, NNSR
● TruSeq Stranded Total RNA Sample Prep Kit
● TruSeq Stranded mRNA Sample Prep Kit
● NEB Ultra Directional RNA Library Prep Kit
● Agilent SureSelect Strand-Specific
Second-strand kits (second-strand parameter -ss has to be used):
● Directional Illumina (Ligation), Standard SOLiD
● ScriptSeq v2 RNA-Seq Library Preparation Kit
● SMARTer Stranded Total RNA
● Encore Complete RNA-Seq Library Systems
So I probably shouldn't use "-ss", right?
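If the library kit is recorded as pipeline metadata, the kit list quoted above can be turned into a small lookup that decides whether to pass -ss. This is a hypothetical Python helper: it only covers the kits and protocols named in that excerpt, matches names exactly, and refuses to guess for anything else.

```python
# Names copied from the manual excerpt quoted above; matching is exact.
FIRST_STRAND = {  # DCC default: no extra flag
    "dUTP",
    "NSR",
    "NNSR",
    "TruSeq Stranded Total RNA Sample Prep Kit",
    "TruSeq Stranded mRNA Sample Prep Kit",
    "NEB Ultra Directional RNA Library Prep Kit",
    "Agilent SureSelect Strand-Specific",
}
SECOND_STRAND = {  # requires -ss
    "Directional Illumina (Ligation)",
    "Standard SOLiD",
    "ScriptSeq v2 RNA-Seq Library Preparation Kit",
    "SMARTer Stranded Total RNA",
    "Encore Complete RNA-Seq Library Systems",
}


def strand_flags(kit):
    """Return the extra DCC strandedness flags for a library kit name."""
    if kit in FIRST_STRAND:
        return []
    if kit in SECOND_STRAND:
        return ["-ss"]
    raise ValueError(f"unknown kit {kit!r}: check the library protocol by hand")


print(strand_flags("TruSeq Stranded Total RNA Sample Prep Kit"))  # []
```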
Yeah, in that case, -ss should not be used. But it does seem to be stranded data, and you are not using -N, so everything looks okay from the parameter side. Anyway, I would still do a second run with -ss, just to be able to compare.
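One way to do that comparison is to load both coordinate tables and count shared calls and annotated calls per run. A minimal Python sketch, assuming the column layout of the example row in this thread (gene in column 4, strand in column 6) and an optional "Chr" header row; the function names and the sample rows are made up for illustration.

```python
def load_calls(lines):
    """Map (chr, start, end, strand) -> gene label; layout assumed as in the thread."""
    calls = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chr":
            continue
        chrom, start, end, gene, _junction, strand = fields[:6]
        calls[(chrom, int(start), int(end), strand)] = gene
    return calls


def compare_runs(default_lines, ss_lines):
    """Summarize how a default run and an -ss run differ."""
    default_calls = load_calls(default_lines)
    ss_calls = load_calls(ss_lines)

    def annotated(calls):
        # Count calls that carry a gene label rather than "not_annotated".
        return sum(gene != "not_annotated" for gene in calls.values())

    return {
        "shared": len(default_calls.keys() & ss_calls.keys()),
        "default_only": len(default_calls.keys() - ss_calls.keys()),
        "ss_only": len(ss_calls.keys() - default_calls.keys()),
        "annotated_default": annotated(default_calls),
        "annotated_ss": annotated(ss_calls),
    }


# Made-up rows: the same junction is called on opposite strands in the two runs.
default_run = ["chr1 100 200 GENEA 1 + exon-exon exon"]
ss_run = ["chr1 100 200 not_annotated 1 - intergenic-intergenic not_annotated"]
print(compare_runs(default_run, ss_run))
# {'shared': 0, 'default_only': 1, 'ss_only': 1, 'annotated_default': 1, 'annotated_ss': 0}
```

Going by the explanation earlier in the thread, the run that yields far more not_annotated calls is the likelier misconfiguration.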
Thank you, Tobias. One final question: we are planning to include DCC in our pipeline, and I was wondering whether it is possible to run DCC sample by sample (with -Nr 1 1) and then combine and filter the results. Our pipeline is sample-specific and runs sample by sample. This would give us two advantages: (1) DCC would run immediately on each sample as it is processed within the pipeline, and (2) we think it would significantly reduce the time DCC takes to process all samples compared with the combined mode. Is this something that can be done? Thank you.
-A
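A sketch of the combine-and-filter step described here, in Python. It assumes each per-sample run yields a count table whose rows are chr, start, end, strand, count (with an optional header row starting with "Chr"), and it applies a threshold of the kind -Nr controls, as we understand it (at least min_count reads in at least min_samples samples). The function names are hypothetical, and this is not DCC's own filtering code, so results need not match a combined run exactly.

```python
from collections import defaultdict


def read_single_sample_counts(lines):
    """Map (chr, start, end, strand) -> read count from a one-sample count table.

    Assumed row layout: chr, start, end, strand, count; a header row starting
    with "Chr" is skipped.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chr":
            continue
        chrom, start, end, strand, count = fields[:5]
        counts[(chrom, int(start), int(end), strand)] = int(count)
    return counts


def merge_and_filter(per_sample, min_count, min_samples):
    """Merge {sample: counts} into one matrix (missing -> 0) and keep rows with
    >= min_count reads in >= min_samples samples."""
    samples = sorted(per_sample)
    matrix = defaultdict(lambda: [0] * len(samples))
    for i, sample in enumerate(samples):
        for key, count in per_sample[sample].items():
            matrix[key][i] = count
    kept = {
        key: row
        for key, row in matrix.items()
        if sum(c >= min_count for c in row) >= min_samples
    }
    return samples, kept


per_sample = {
    "s1": read_single_sample_counts(["Chr Start End Strand s1", "chr1 100 200 + 3", "chr2 50 80 - 1"]),
    "s2": read_single_sample_counts(["chr1 100 200 + 2"]),
}
print(merge_and_filter(per_sample, min_count=2, min_samples=2))
# (['s1', 's2'], {('chr1', 100, 200, '+'): [3, 2]})
```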
While this is not directly supported, I have done it myself from time to time. There might be small differences between running N instances instead of 1, related to some filtering steps, but in general you should get a similar picture.
However, if you deploy this as a pipeline, it would be good to test it once to get a direct comparison between N runs and 1 run, then run a diff to see where the differences are.
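Tobias suggests a plain diff; because row order can differ between runs, comparing the sets of circRNA coordinates may be easier to read. A minimal Python sketch, assuming each table starts with chr, start, end, strand columns and has an optional "Chr" header; the function names and rows are illustrative only.

```python
def circ_keys(lines):
    """Collect (chr, start, end, strand) keys from a count table (layout assumed)."""
    keys = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chr":
            continue
        chrom, start, end, strand = fields[:4]
        keys.add((chrom, int(start), int(end), strand))
    return keys


def diff_runs(per_sample_tables, combined_table):
    """Compare the union of per-sample runs against one combined run."""
    merged = set()
    for table in per_sample_tables:
        merged |= circ_keys(table)
    combined = circ_keys(combined_table)
    return {
        "both": sorted(merged & combined),
        "per_sample_only": sorted(merged - combined),
        "combined_only": sorted(combined - merged),
    }


# Illustrative rows only.
runs = [["chr1 100 200 + 3"], ["chr2 50 80 - 1"]]
combined = ["Chr Start End Strand s1 s2", "chr1 100 200 + 3 0"]
print(diff_runs(runs, combined))
# {'both': [('chr1', 100, 200, '+')], 'per_sample_only': [('chr2', 50, 80, '-')], 'combined_only': []}
```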
Thank you for the information. Yes, I have run DCC on five separate samples and then run it combined. I am trying to write my own scripts to combine the results, but I was wondering whether I can use "CombineCounts.py" from your scripts directly. Thank you.
-A
I haven't used that script in a while, but you should give it a try.
Hi,
I have run DCC successfully on my paired-end stranded data, but I noticed that some circRNAs are reported twice: once annotated with the host gene and once as "not_annotated". Here is an example:
chr1 1223244 1223968 SDF4 2 - exon-exon transcript,gene,exon,CDS
chr1 1223244 1223968 not_annotated 1 + intergenic-intergenic not_annotated
I thought the problem might be that I had not enabled the "-ss" parameter. I added it, but the result is still the same. First, how should the "-ss" parameter be used with first-strand data? Second, why am I getting these duplicated results with the same coordinates?
Here is my command:
Thank you.
-Abdallah