Aufiero / circRNAprofiler

function error for BSJs #2

Closed marvel479 closed 4 years ago

marvel479 commented 4 years ago

Hi! I have been trying to do some downstream analysis of my circRNAs, and reached as far as module 12 with it. This was as on Friday, 28th May. Now when I am using the same R script as the package suggested and that was working before, it doesn't work anymore.

> backSplicedJunctions <- getBackSplicedJunctions(gen, pathToExperiment = NULL)

Here I get: [image not preserved]

This is what my data looks like after reading the GTF file:

> head(gen)
  chrom   start     end width strand type     gene_name        transcript_id exon_number
1  chr1 3073253 3074322  1070      + exon 4933401J01Rik ENSMUST00000193812.1           1
2  chr1 3102016 3102125   110      + exon       Gm26206 ENSMUST00000082908.1           1
3  chr1 3213609 3216344  2736      - exon          Xkr4 ENSMUST00000162897.1           1
4  chr1 3205901 3207317  1417      - exon          Xkr4 ENSMUST00000162897.1           2
5  chr1 3213439 3215632  2194      - exon          Xkr4 ENSMUST00000159265.1           1
6  chr1 3206523 3207317   795      - exon          Xkr4 ENSMUST00000159265.1           2

Aufiero commented 4 years ago

Hi, from what you reported, I don't think the problem is with the GTF file; it is read correctly. The function getBackSplicedJunctions is trying to read the files containing the circRNA prediction results (circRNA_X.txt, see the vignettes). Check that you have set your working directory correctly (folder projectFolderName) and that the subfolders (mapsplice, nclscan, knife, circexplorer2, uroborus, circmarker, or other) contain the circRNA_X.txt files. Also check that experiment.txt reports the correct file names. Let me know if that solves the problem.
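
The folder checks described above can be sketched in base R. This is only an illustration and not part of circRNAprofiler: the helper name `findMissingPredictionFiles` is hypothetical, and the sketch assumes that experiment.txt is tab-delimited with a header row and a `fileName` column, and that each detection tool has its own subfolder inside the project folder. The demo builds a throwaway project folder so that it runs on its own.

```r
# Hypothetical helper (not part of circRNAprofiler): list the prediction files
# that experiment.txt mentions but that are absent from a tool subfolder.
findMissingPredictionFiles <- function(projectDir, toolDirs) {
  experimentPath <- file.path(projectDir, "experiment.txt")
  if (!file.exists(experimentPath)) {
    stop("experiment.txt not found in ", projectDir)
  }
  # Assumption: tab-delimited file with a header row and a 'fileName' column.
  experiment <- read.delim(experimentPath, stringsAsFactors = FALSE)
  # Build every expected tool/file combination and keep those not on disk.
  expected <- as.vector(outer(toolDirs, experiment$fileName, file.path))
  expected[!file.exists(file.path(projectDir, expected))]
}

# Self-contained demo on a throwaway project folder.
projectDir <- file.path(tempdir(), "projectFolderName")
dir.create(file.path(projectDir, "circexplorer2"),
           recursive = TRUE, showWarnings = FALSE)
write.table(
  data.frame(label = c("C1", "C2"),
             fileName = c("circRNA_001.txt", "circRNA_002.txt"),
             condition = c("A", "B")),
  file.path(projectDir, "experiment.txt"),
  sep = "\t", quote = FALSE, row.names = FALSE
)
# Only the first prediction file is created, so the second should be reported.
file.create(file.path(projectDir, "circexplorer2", "circRNA_001.txt"))

missingFiles <- findMissingPredictionFiles(projectDir, toolDirs = "circexplorer2")
print(missingFiles)
```

An empty result would suggest that the folder layout is consistent with experiment.txt, in which case the cause is more likely to be elsewhere (for example, a package update).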

S

marvel479 commented 4 years ago

Hi, I am using the same folder and directory location as on Friday. The check also passes; it only fails when I call the getBackSplicedJunctions function. I never saw this error before, so I was wondering whether an update over the weekend could have caused this issue.

-- Regards, Aayushi

Aufiero commented 4 years ago

Hi Aayushi,

It might be that a recent update of the dplyr package is causing problems.

I'll look into it in the next few days, and if any changes are needed I'll push them so that you can install the latest version of circRNAprofiler.
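
When a dependency update is suspected, it can help to record which versions are actually installed before and after reinstalling. A minimal sketch in base R follows; the helper name `reportVersions` is hypothetical and is not a circRNAprofiler function.

```r
# Hypothetical helper: installed version of each package, NA if not installed.
reportVersions <- function(pkgs) {
  vapply(pkgs, function(pkg) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  }, character(1))
}

# Packages discussed in this thread; either may be missing on a given machine.
versions <- reportVersions(c("circRNAprofiler", "dplyr"))
print(versions)
```

Posting this output together with `sessionInfo()` in a bug report usually makes version-related problems easier to reproduce.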

marvel479 commented 4 years ago

Great. Thanks for letting me know!

-- Regards, Aayushi

Aufiero commented 4 years ago

Hi Aayushi,

I fixed the bugs. Now the release version of circRNAprofiler builds correctly on Bioconductor. You can install the release version of circRNAprofiler using:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("circRNAprofiler")

For the documentation see: browseVignettes("circRNAprofiler")

Let me know if you can proceed with the analysis.

Best, Simona

marvel479 commented 4 years ago

Hi Simona, I will check if I can proceed with the analysis now and let you know. Thanks for fixing the bugs so soon.

-- Regards, Aayushi

marvel479 commented 4 years ago

Hi Simona, it looks like the BSJ function is working, but I get an error when using the motif functions.

> targetsBTS_gr <-
+   getSeqsFromGRs(
+     annotatedBackgroundCircs,
+     genome,
+     lIntron = 200,
+     lExon = 9,
+     type = "ie")
Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i], start[i],  : 
  sequence chrMT not found

I had no circRNA candidates from chrMT, as I filtered those out, and I still get this error when I delete chrM entries from the gtf data frame manually. It looks like the problem is with genome, I am using mm10, any ideas?

Aufiero commented 4 years ago

Hi Aayushi,

thanks for letting me know. I'll do a test and I'll let you know.

S

Aufiero commented 4 years ago

You ran this to get the mm10 genome, right?

if (!requireNamespace("BSgenome.Mmusculus.UCSC.mm10", quietly = TRUE)){
  BiocManager::install("BSgenome.Mmusculus.UCSC.mm10")
}
genome <- BSgenome::getBSgenome("BSgenome.Mmusculus.UCSC.mm10")

And which genome annotation (GTF) did you use: GENCODE, UCSC, NCBI or Ensembl?

marvel479 commented 4 years ago

Yes. It seems the problem may stem from the fact that "genome$chrMT" is not found; the genome has "genome$chrM" instead. Could that be what causes the issue?

-- Regards, Aayushi
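
A naming mismatch like the chrMT/chrM one described above can be spotted by comparing the chromosome names in the annotation or circRNA table with the sequence names of the genome. The sketch below uses toy character vectors so that it runs on its own. The helper name `findUnknownChroms` is hypothetical, and the comment about obtaining the names with `seqnames(genome)` is an assumption about how the real BSgenome object would be queried.

```r
# Hypothetical helper: chromosome names that occur in the table but that the
# genome does not provide.
findUnknownChroms <- function(chroms, genomeSeqnames) {
  setdiff(unique(as.character(chroms)), genomeSeqnames)
}

# Toy stand-ins. With the real objects one would pass the chrom column of the
# data frame and the genome's sequence names
# (assumption: available via seqnames(genome)).
annotationChroms <- c("chr1", "chr1", "chr2", "chrMT")
genomeSeqnames <- c("chr1", "chr2", "chrM")

unknownChroms <- findUnknownChroms(annotationChroms, genomeSeqnames)
print(unknownChroms)
```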

Aufiero commented 4 years ago

It might be, and if you remove the circRNAs arising from chrMT, it should work. So in your case, before running getSeqsFromGRs(), you should fix the chrom column:

library(dplyr)  # provides %>% and the .data pronoun
# Rename chrMT to chrM so that the name matches the one used by the genome.
annotatedBackgroundCircsFixed <- annotatedBackgroundCircs %>%
  dplyr::mutate(chrom = ifelse(.data$chrom == 'chrMT', 'chrM', .data$chrom))

BTW I will now introduce this check for chrMT in the code of circRNAprofiler. I'll notify you when it's done.
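
For readers who prefer not to depend on dplyr for this step, the same renaming can be sketched in base R. The helper name `renameMitoChrom` is hypothetical, and the toy data frame only mimics the `chrom` column of the real table. Converting to character first is a precaution in case the column is stored as a factor.

```r
# Hypothetical helper: rename one chromosome label in a data frame's chrom column.
renameMitoChrom <- function(df, from = "chrMT", to = "chrM") {
  chrom <- as.character(df$chrom)  # avoids factor-level surprises
  chrom[!is.na(chrom) & chrom == from] <- to
  df$chrom <- chrom
  df
}

# Toy table with the chrom column deliberately stored as a factor.
toyCircs <- data.frame(
  id = c("a", "b", "c"),
  chrom = c("chr1", "chrMT", "chr2"),
  stringsAsFactors = TRUE
)

fixedCircs <- renameMitoChrom(toyCircs)
print(fixedCircs$chrom)
```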

S

marvel479 commented 4 years ago

yep, that was it. It's working now. Thanks, Simona!!

-- Regards, Aayushi

marvel479 commented 4 years ago

Hey Simona, sorry to bother you again. I have a new problem now, this time with the getMotifs function.

> motifsFTS_gr <-
    getMotifs(targetsFTS_gr,
              width = 6,
              database = 'ATtRACT',
              species = "Mmusculus",
              rbp = TRUE,
              reverse = FALSE)

I am getting this:

Error in if (targetsToAnalyze$type[1] == "circ") { : 
  missing value where TRUE/FALSE needed

I do not believe I was seeing this before, and as I understand it, it has something to do with a default not being read correctly. Any idea what can be changed?

Aufiero commented 4 years ago

No problem. Can you run this command and show me the output: head(targetsFTS_gr)

marvel479 commented 4 years ago

This is what I get:

> head(targetsFTS_gr)
$upGR
                                id  gene           transcript strand chrom   startGR     endGR length
1 Syt14:-:chr1:192986882:192980345 Syt14 ENSMUST00000215093.1      -  chr1 192986873 192987082    210
  seq
1 AUACUUAUUUAGAAACGUUUUGAAAAAUUUUGAAUGAAUGCUGUUGAAUAUAUUGAAAAUAUAUCUUUACACUUUGUGGGGUUUCUUACAAACUAUUUUAAAUUAUGACAUUUUAAAAGUAGUUGACAUUAGAUAGCAAACUGUCUAGCGUAUACUGGGAAAUUCCUUCUUAUUCUCAGCCUUGACUUUCUUUUCUCUAGUCUCUCCAGA
  type
1   ie

$downGR
                                id  gene           transcript strand chrom   startGR     endGR length
1 Syt14:-:chr1:192986882:192980345 Syt14 ENSMUST00000215093.1      -  chr1 192980146 192980355    210
  seq
1 AUGGAGGACGGUAAGAACGCUAUUAUUUUAGAUGAUUUAUACCUAAAAUUCUAGGUAUGUAUGGUUGCACACUGACAGAGAGAGUGCAGAAUUCUGUUUCUAAGUGCAGUCCAAAUAAAAAGUUUGAGCUUGUGAUCAGCUCCAAAAUACCCCAAAUGGAAAGACAAAGUGUUGGACUCAGUGUGAUACUGGGAUUCUCACUCACAGUCU
  type
1   ie

Aufiero commented 4 years ago

It seems OK. You should not get any error, since targetsFTS_gr$upGR$type[1] and targetsFTS_gr$downGR$type[1] are both equal to ie.

I ran a test: if targetsFTS_gr$upGR$type[1] or targetsFTS_gr$downGR$type[1] is NA, you get that error message, so for the error to occur there would have to be an NA value in there. Could you rerun targetsFTS_gr <- getSeqsFromGRs(...) and check again?
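
The check for NA values can be done over all rows rather than only the first one. The sketch below is an illustration under assumptions: it takes the list layout seen in the head() output above (elements such as upGR and downGR, each a data frame with `seq` and `type` columns), uses toy data, and the helper name `findIncompleteTargets` is hypothetical.

```r
# Hypothetical helper: for each element of a targets list (e.g. upGR, downGR),
# return the row indices where 'type' or 'seq' is NA.
findIncompleteTargets <- function(targets) {
  lapply(targets, function(df) which(is.na(df$type) | is.na(df$seq)))
}

# Toy data: the second row of upGR has no sequence and no type.
toyTargets <- list(
  upGR = data.frame(id = c("circ1", "circ2"),
                    seq = c("AUAC", NA),
                    type = c("ie", NA),
                    stringsAsFactors = FALSE),
  downGR = data.frame(id = c("circ1", "circ2"),
                      seq = c("AUGG", "GGCA"),
                      type = c("ie", "ie"),
                      stringsAsFactors = FALSE)
)

incomplete <- findIncompleteTargets(toyTargets)
print(incomplete)
```

Rows reported here would be candidates for coordinates whose sequences could not be retrieved.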

marvel479 commented 4 years ago

Hi Simona, yes, it's working now. The error seems to have come from some genes/genomic coordinates whose sequences were not fetched as expected; the function works otherwise. Thanks for getting back to me. I love the package, it is really well thought out and useful.

-- Regards, Aayushi

Aufiero commented 4 years ago

Hi Aayushi, thank you, and thanks for reporting these issues; it helps me improve the code.

About the chrM/chrMT bug: it should be solved in the development version of circRNAprofiler (not in the release).

You can install the devel version of circRNAprofiler with:

BiocManager::install(version='devel')
BiocManager::install("circRNAprofiler")

I'll now close the issue that you opened on GitHub.