Aufiero / circRNAprofiler


AnnotateBSJs producing NAs in the transcript columns for certain circRNAs #11

Closed prisca399 closed 1 year ago

prisca399 commented 1 year ago

Hi @Aufiero,

Thank you for creating and sharing such a great tool with the scientific community! I am currently using it to extract BSJ sequences for specific circRNAs. However, I am stuck at Module 7 when trying to use a custom transcripts.txt file to annotate circRNA structure.

When I do not have a transcripts.txt file in the appropriate directory, I get a warning that transcripts.txt is empty or does not exist. In this case, circRNAprofiler chooses the transcript itself for each circRNA. When it does this, it prints NAs in the transcript columns for some circRNAs, and this lack of information impedes some of my downstream work. When I do provide a custom transcripts.txt file indicating the transcript I want to use for each circRNA, I do not get the same warning (suggesting that the custom file is recognized), but there are still NAs for the same circRNAs.

I assume the issue is that, for some reason, circRNAprofiler is not able to "match" any of the transcripts annotated to a certain gene to that circRNA. I am using the same GTF that I used for the mapping procedure, so I do not think that is the issue. You wrote somewhere that the NAs could be an issue related to the circRNA identification pipeline. Could you elaborate on this? I am using the output of CIRCexplorer2, which is an annotation-dependent circRNA identification pipeline that already matches each circRNA to a transcript ID and lists the exons within that circRNA. So I don't think there should be any ambiguity around such annotation, but apparently there might be? I am unsure what else to try and would appreciate any advice you have!

Please let me know if I should share any files that would be helpful.

simoauf commented 1 year ago

Hi, prisca. Can you check whether the back-spliced junctions match any exon coordinates reported in the transcript? Is the transcript ID present in the annotation file? If so, does the gene name of the transcript match the gene name of the circRNA?

If you cannot solve the problem, please send me the CIRCexplorer2 output file containing only the circRNA of interest, along with the name of the annotation file you used, and I will do some tests.

prisca399 commented 1 year ago

Thanks for the quick reply! I checked one of the circRNAs that is giving NAs, and the answer is yes to all three questions. I have attached an example of my circRNAprofiler annotation table vs. my GTF for one of the problematic circRNAs, circCTBP1.

circRNAprofiler table: image

GTF: image

The only discrepancy I notice is that the start coordinate of a circRNA listed in the CE2 output is always one less (n-1) than the start coordinate of its upstream back-spliced exon as annotated in the GTF. But I think this is also true for circRNAs that circRNAprofiler is able to annotate, so it is unlikely to be the reason for the failed annotation observed for certain circRNAs. I will send the requested files to your email address for you to review.

simoauf commented 1 year ago

Hi Prisca, I had a look at the issue, and the problem is with this coordinate: 1225359, which does not match any exon coordinate.

The annotateBSJs function uses the back-spliced junction coordinates to detect the matching exon. Because the coordinates of the detected back-spliced junctions might not correspond exactly to annotated exon coordinates, the match is performed on the back-spliced junction coordinate with its last digit removed. In this case, the coordinate is 1225359, so annotateBSJs uses 122535 to detect the exon. The algorithm is still unable to find a matching exon because the exon coordinate is 1225360, and 122535 does not match it.
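For readers following along, the matching rule described above can be illustrated with a small Python sketch. This is not the circRNAprofiler code (the package is written in R); the function names are hypothetical, and it assumes that both the junction and the exon coordinate are compared after dropping their last digit. It shows why an off-by-one difference is usually tolerated but fails when the exon coordinate ends in 0.

```python
def drop_last_digit(coord: int) -> int:
    """Remove the last decimal digit of a coordinate (hypothetical helper)."""
    return coord // 10


def matches_exon(bsj_coord: int, exon_coord: int) -> bool:
    """Assumed rule: coordinates match if they agree once the last digit is dropped."""
    return drop_last_digit(bsj_coord) == drop_last_digit(exon_coord)


# Coordinates from this thread: junction 1225359 vs. exon 1225360.
print(drop_last_digit(1225359))        # 122535
print(drop_last_digit(1225360))        # 122536
print(matches_exon(1225359, 1225360))  # False: the off-by-one crosses a tens boundary

# An illustrative (made-up) off-by-one pair that stays within the same ten.
print(matches_exon(1225364, 1225365))  # True
```

Under this assumption, an n-1 start coordinate only breaks the match when the annotated exon start ends in 0, which would explain why most circRNAs are annotated and a few are not.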

To fix the problem, please run mergeBSJunctions() with fixBSJsWithGTF = TRUE. Setting this argument adjusts the back-spliced junction coordinates so that they exactly match the exon coordinates in the GTF file.
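As a rough illustration of what "fixing" a junction coordinate to the annotation could mean, here is a minimal Python sketch that snaps a coordinate to the nearest annotated exon boundary. It is only a conceptual sketch, not how mergeBSJunctions() is implemented: the function name snap_to_exon, the max_shift tolerance, and every exon boundary other than 1225360 are assumptions made for the example.

```python
from typing import Iterable


def snap_to_exon(bsj_coord: int, exon_coords: Iterable[int], max_shift: int = 1) -> int:
    """Return the nearest exon boundary if it lies within max_shift bases.

    Hypothetical helper: if no boundary is close enough (or none are given),
    the original coordinate is returned unchanged.
    """
    exon_coords = list(exon_coords)
    if not exon_coords:
        return bsj_coord
    nearest = min(exon_coords, key=lambda e: abs(e - bsj_coord))
    return nearest if abs(nearest - bsj_coord) <= max_shift else bsj_coord


# 1225360 is the exon coordinate from this thread; the other values are made up.
exon_boundaries = [1212345, 1225360, 1230001]

print(snap_to_exon(1225359, exon_boundaries))  # 1225360
print(snap_to_exon(1225300, exon_boundaries))  # 1225300 (too far from any boundary)
```

Once the junction coordinate equals the annotated exon coordinate, a match on the truncated coordinate succeeds trivially.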

Best, S

prisca399 commented 1 year ago

Thank you, this solved my issue! You can close this issue.