Aufiero / circRNAprofiler


getBackSplicedJunctions() Function Issue #5

Closed adenotp closed 4 years ago

adenotp commented 4 years ago

Hi,

I have been using circRNAprofiler to perform downstream analysis on output data from CIRCExplorer2.

I was able to run the full pipeline as shown on Bioconductor until the last week of June 2020. Yesterday (July 20th), while I was working on some new data, an error occurred at the getBackSplicedJunctions() step (the error report follows):

[Three screenshots of the error report, 2020-07-21]

I used the same script as for the previous analysis. I only moved the old project directory (the one with the June data, which I will call OLD from now on) out of the working directory (WD from now on) and initialized a new one (NEW from now on) in the same working directory with initCircRNAprofiler(projectFolderName = "circRNAprofiler", detectionTools = "circexplorer2").

Over the past day I tried several things to fix the issue, which I list here as precisely as possible:

  1. I tried to repeat the analysis on the old data (OLD), but in their new position:
    • setwd("old/dir/new/position");
    • check <- checkProjectFolder(), check returned 0;
    • gtf <- formatGTF("genes.gtf"), head showed that gtf file was loaded and fine;
    • backSplicedJunctions <- getBackSplicedJunctions("circexplorer2/"), here the error occurs again.
  2. Deleted the new directory (NEW) and moved the old data directory back into the original working directory (WD); the error continued to occur.
  3. Initialized circRNAprofiler in other directories and downloaded the data (CIRCExplorer2 output and GTF file) from the server again; the error occurred again.
  4. Updated my system (Platform: x86_64-apple-darwin17.0 (64-bit), Running under: macOS Catalina 10.15.6, with a new version of Xcode) and restarted. I repeated the other trials both before and after the update and restart, but this didn't fix the issue.
  5. Since I read that another user had an issue with the BSJ function that was related to the version of dplyr, I tried the whole procedure with different versions of dplyr, from 0.8.3 to 1.0.0, and then did the same with the vctrs package. I also removed all these packages, installed them again and tried again. Finally, I updated R from 3.6.3 to the latest 4.0.2, reinstalled all the packages (not just these three) and tried again, but the issue still occurred.
  6. Resigned to the idea that my PC might be the issue, I ran the analysis on the server where I keep my data and do the heavy-duty analysis (Platform: x86_64-pc-linux-gnu (64-bit), Running under: CentOS Linux 7 (Core)), this time using the old data (the same data I called OLD above). The whole pipeline (the same one I used on my PC) worked flawlessly.

I'm sorry for the long post, I just tried to be as precise as possible; if you think it could help you have a clearer understanding of the issue I also have the session information (sessionInfo()) of both the server run (the one that worked) and my latest PC run (which didn't work) of the script.

Thank you in advance!

Best Regards, Enrico

Aufiero commented 4 years ago

Hi Enrico,

The package builds correctly on my PC and also on the Bioconductor servers. As you said, there was an update of the dplyr package that caused problems, but I fixed it and the latest release now builds correctly. I think it is something with your PC; something may have been messed up. I suggest that you delete all the packages in the 4.0.2 library and run:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("circRNAprofiler")

You should always use BiocManager::install to install packages. It takes care of all the dependencies and installs the correct versions of the packages for your Bioconductor version.

Furthermore, I see that you ran the following command: backSplicedJunctions <- getBackSplicedJunctions("circexplorer2/")

The function getBackSplicedJunctions() takes as input the gtf object created with formatGTF() and the path to experiment.txt (see the vignettes). In your case, I see that you formatted the genes.gtf file and created the gtf object, and I assume that you put experiment.txt in the working directory circRNAprofiler. If so, you should run:

getBackSplicedJunctions(gtf)

but you wrote "circexplorer2/" instead of gtf.

Let me know if you solved the problem.

Best, Simona

adenotp commented 4 years ago

Hi Simona,

First of all, thank you for your prompt response. As you suggested, I removed all the user-installed packages (I kept the base and recommended ones) and ran the BiocManager install command, but when I run the script (command by command) the getBackSplicedJunctions() function still returns the same error. I also tried using the gtf object as the function argument, but the result doesn't change. When I run the command on the server, it runs flawlessly with both gtf and "circexplorer2/".

I welcome any new idea and if you need me to produce any kind of information for you please ask me.

Again, thank you very much.

Best, Enrico

Aufiero commented 4 years ago

Hi Enrico, can you run sessionInfo() and send it to me?
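The session information can also be shared as plain text rather than as a screenshot, which makes package versions easier to search. A minimal sketch in base R; the output file name and the package pattern are only examples:

```r
# Capture the output of sessionInfo() as a character vector, one element per line.
info <- capture.output(sessionInfo())

# Write it to a text file that can be attached to the issue.
# "sessionInfo_mac.txt" is a hypothetical file name.
out_file <- file.path(tempdir(), "sessionInfo_mac.txt")
writeLines(info, out_file)

# Show only the lines that mention the packages of interest (example pattern).
grep("circRNAprofiler|dplyr|vctrs|S4Vectors", info, value = TRUE)
```

Running the same lines on both machines gives two text files that can be compared directly.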

adenotp commented 4 years ago

Sure, here is the session info on my Mac: [two screenshots, 2020-07-21]

And on the server:

[two screenshots, 2020-07-21]

Thanks, -E

Aufiero commented 4 years ago

Hi Enrico,

I can see that you installed the devel version of circRNAprofiler (v 1.3.7). There should not be any problem with that, since it also builds correctly on my computer and on the Bioconductor servers.

On your server, I see that there is an older version of circRNAprofiler, 1.0.0. You should not use that one, but the release 1.2.1 or the devel version 1.3.7.

I cannot reproduce the error that you get, so I suggest that you uninstall R, and maybe RStudio too, and delete the whole 4.0 folder, which is usually in e.g. your user/enrico/R/win-library (it will be created again when you install R). Then reinstall R (>= 4.0.0) and RStudio, and choose and run 1) OR 2):

1) This is for the current release 1.2.1. This is recommended since it is a stable version:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("circRNAprofiler")

2) This is for the devel version (it includes some improvements):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc devel
BiocManager::install(version='devel')

BiocManager::install("circRNAprofiler")

Once you have done that and your project folder is correctly set up, run:

gtf <- formatGTF("genes.gtf")
backSplicedJunctions <- getBackSplicedJunctions(gtf)

Let me know how it goes.

S

adenotp commented 4 years ago

Hi Simona,

I tried what you suggested. I completely uninstalled both R and RStudio from my Mac and, once everything was reinstalled, ran just your command 1), but I ran into the same problem again. I then did the same thing with the devel version and got the same error. So I decided to install R and RStudio on my personal Windows laptop (which never had R installed before) and, curiously, it had the same issue:

Error: Can't combine '..1$gene' <character> and '..2$gene' <integer>.
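An error of this kind typically appears when two tables are row-bound and the same column holds text in one and integers in the other. The sketch below reproduces that situation with dplyr::bind_rows on made-up data. It assumes dplyr is installed, and it is only an assumption that getBackSplicedJunctions() combines per-sample tables in a similar way; the thread does not confirm the internals.

```r
# Two toy per-sample tables (made-up values): the "gene" column is a
# character symbol in one and an integer identifier in the other.
sample_a <- data.frame(gene = "TTN", stringsAsFactors = FALSE)
sample_b <- data.frame(gene = 12345L)

# Row-binding the two tables fails because the column types are incompatible.
err <- tryCatch(
  dplyr::bind_rows(sample_a, sample_b),
  error = function(e) e
)

inherits(err, "error")   # TRUE when the bind failed
conditionMessage(err)    # the type-mismatch message raised by dplyr/vctrs
```

With dplyr 1.0.0 or later, the message is expected to resemble the one quoted above; older versions word it differently.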

I'm posting here the first columns of my circexplorer2 output file (circularRNA_known.txt) so you can see whether there is an error in the input I use. Could one of the updates have changed something in the circexplorer2 part of the pipeline? Do you perhaps have any toy circexplorer2 data I could use to test the error?

[Screenshot of the first columns of the circexplorer2 output file, 2020-07-22]

Column V18 is cut: it's the coordinates of the flanking intron (data format: chr1:1234-1235|chr1:1236-1237, before or after | you could also have None).

Thank you very much.

Best, -E

Aufiero commented 4 years ago

Hi Enrico,

It seems OK. Could you attach a sample of one circexplorer2 output file (circularRNA_known.txt) so that I can test directly with that? Which annotation did you use?

S

adenotp commented 4 years ago

Sure, here's one of the samples:

CAL851_24H_A1_S16circularRNA_known.txt

As for the annotation, I used the UCSC hg38 annotation files.

Thanks, -E

Aufiero commented 4 years ago

OK, thanks Enrico.

BTW, I could not find the S4Vectors package in the sessionInfo() from your Mac that you posted last time. Maybe I am missing it. Could you check whether it is installed? If it is not, use BiocManager to install it.
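One way to check whether a package is installed, and in which version, without loading it is sketched below. The helper installed_version() is a hypothetical name introduced for illustration, and the package list is only an example.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package cannot be found in the current library paths.
installed_version <- function(pkg) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

# Example: check the packages mentioned in this thread.
pkgs <- c("circRNAprofiler", "S4Vectors", "dplyr", "vctrs")
vapply(pkgs, installed_version, character(1))
```

Packages reported as NA can then be installed with BiocManager::install().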

In the meantime, I'll check your file.

S

Aufiero commented 4 years ago

Hi Enrico,

I got the same error, so there is something wrong with the file. I'll see how I can fix it as soon as possible and I'll let you know.

S

adenotp commented 4 years ago

Hi Simona,

Great, really thank you very much!

Best, Enrico

Aufiero commented 4 years ago

Another thing: circRNAprofiler has been tested on the circExplorer2 v2.3.4 output file (circularRNA_full.txt). The file that you attached ends with ...circularRNA_known.txt. Do you also have a file that ends with circularRNA_full.txt?

S

Aufiero commented 4 years ago

Hi Enrico,

Just an update: as I said, the problem is in the file. In the file that I used for testing, the gene name (HGNC symbol) is reported in V15, while your file has a number there. That is why you get the error: the function cannot find the gene name (HGNC symbol).

Could you run and show me this:

gtf <- formatGTF("genes.gtf")
head(gtf)

It might be that the GTF file that you used for circRNA detection does not report the gene name as an HGNC symbol, so it is also missing from the circExplorer2 output file.

Check that and let me know.

S

adenotp commented 4 years ago

Hi Simona,

Regarding your first message: I use circExplorer2 v2.3.5. The annotate command returns ...circularRNA_known.txt as its output, while the denovo command returns a circularRNA_full.txt.

Regarding the second: I think that V15 is a number because it reports the Entrez ID, but correct me if I'm wrong. Here is head(gtf):

[Screenshot of the head(gtf) output, 2020-07-22]

Thank you very much!

-E

Aufiero commented 4 years ago

Hi Enrico,

The GTF file is correctly formatted and it has the gene name, so it is correct. Did you use a different annotation file in circExplorer2? The problem is your file ...circularRNA_known.txt: the gene name is missing. Column V15 should contain the geneName field, as the circExplorer2 documentation also reports.

Try to see why you do not have it.
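A quick way to inspect a circExplorer2 output file for this problem is to read it without a header and look at column 15. The sketch below builds a toy two-row file with made-up values so that it runs on its own; numeric_fraction() is a hypothetical helper, not part of circRNAprofiler.

```r
# Build one tab-separated row with 18 columns, as in a circExplorer2 output
# file. All values are made up; only column 15 (geneName) matters here.
make_row <- function(gene) {
  paste(c("chr1", "100", "200", "circular_RNA/3", "0", "+", "100", "200",
          "0,0,0", "1", "100", "0", "3", "circRNA", gene, "NM_000000",
          "1", "chr1:50-100|chr1:200-250"), collapse = "\t")
}

# Toy file: one row with a gene symbol, one row with a numeric identifier.
toy_file <- tempfile(fileext = ".txt")
writeLines(c(make_row("TTN"), make_row("12345")), toy_file)

# Hypothetical helper: fraction of entries in a column that are purely numeric.
numeric_fraction <- function(path, col = 15) {
  tab <- read.delim(path, header = FALSE, stringsAsFactors = FALSE,
                    colClasses = "character")
  mean(grepl("^[0-9]+$", tab[[col]]))
}

numeric_fraction(toy_file)
```

On a real file, a value close to 1 would suggest that column 15 holds numeric identifiers instead of gene symbols, which points to the annotation used during detection.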

S

adenotp commented 4 years ago

Hi Simona,

I checked, and it seems that some of my data were processed with a different annotation file because of a server modules update during the last month (which I had absolutely no clue about). I'm trying to fix the issue right now.

Thank you very much for your time and attention!

Best, Enrico

Aufiero commented 4 years ago

Hi Enrico,

No problem. Hope that you will be able to use circRNAprofiler after fixing that.

I'll now close this issue.

Best, Simona