Open ChrissiKalk97 opened 1 week ago
Hi @ChrissiKalk97, please rerun the system.file()
function as such:
system.file("extdata/pb_custom.gtf.gz", package = "factR2")
Thank you @fursham-h for your quick reply. I now ran:
gtf.file <- system.file("extdata/pb_custom.gtf.gz", package = "factR2") fobj <- createfactRObject(gtf.file, reference = "vM33") 🡆 Checking inputs 🡆 Checking factRObject 🡆 Adding custom transcriptome ℹ Importing from local directory Registered S3 method overwritten by 'data.table': method from print.data.table
🡆 Adding annotation ℹ Importing from URL 🡆 Adding genome sequence ℹ Using BSgenome object Warning: 4 seqlevel(s) inyour custom GTF
are not found inreference annotation
🡆 Matching chromosome names 🡆 Matching gene information Number of mismatched gene_ids found: 83520 --> Attempting to match ensembl gene_ids... --> No ensembl gene ids found in query ---> Attempting to match gene_ids by finding overlapping coordinates... ---> 65203 gene_id matched Total gene_ids corrected: 65203 Remaining number of mismatched gene_ids: 18317 🡆 Creating factRset objects ℹ Adding gene information ℹ Adding transcript information ℹ Adding alternative splicing information Warning: Expected 3 pieces. Additional pieces discarded in 9 rows [1, 2, 3, 4, 5, 6, 7, 8, 9]. Warning in .merge_two_Seqinfo_objects(x, y) : Each of the 2 combined objects has sequence levels not in the other:
- in 'x': MU069434.1, MU069435.1, GL456233.2, GL456370.1, JH584304.1
- in 'y': JH584304 Make sure to always combine/compare objects based on the same reference genome (use suppressWarnings() to suppress this warning). 🡆 Annotating novel transcripts 🡆 Annotating novel AS events Error in
dplyr::mutate()
: ℹ In argument:coord = paste0(seqnames, ":", start, "-", end)
. Caused by error inas.vector()
: ! cannot coerce type 'closure' to vector of type 'character' Backtrace:
- factR2::createfactRObject(gtf.file, reference = "vM33")
- base::paste0(seqnames, ":", start, "-", end)
- base::as.character.default(
<stndrdGn>
)
I think this is now the same problem as described in https://github.com/f-hamidlab/factR2/issues/7. You did not come to find a fix for this yet?
Best,
Christina
Hi @ChrissiKalk97
Thank you for pointing that out. We have fixed that issue in the commit https://github.com/f-hamidlab/factR2/commit/1c6ad2e321d5dda8512f65cd845285d131904643, as well as other bugs that we found along the way. Do test it out and let us know if it still crops up.
Thank you @fursham-h
It is now working for the example data and also for my custom transcriptome. I ran:
seqlevels(ref.gtf) <-unique(c(seqlevels(custom.gtf), seqlevels(ref.gtf))) extended_Ens_110.gtf <-c(ref.gtf, custom.gtf) custfactRobj <- createfactRObject(extended_Ens_110.gtf, "v44")
I get a warning to match chromosome names, but I guess this should be fine as the reference and the custom file have the same chromosome naming.
I still have some more questions on the usage and output of factR2:
I can create a factRobject and build the CDS as well as determine the NMD status. When I try to run the example of the vignette for getting the number of NMD sensitive and insensitive transcripts, I obtain the following error: `features(custfactRobj) %>%
show_more to TRUE to show more info
Error in group_by()
:
! Must group by variables found in .data
.
✖ Column nmd
is not found.
Run rlang::last_trace()
to see where the error occurred.`This is not an issue as I can simply count the occurrence of "yes" and "no" in the custfactRobj@sets$transcript@rowData$nmd
column.
What I would be interested in is to obtain the coordinates of the defined CDS. Is this somehow possible? The custfactRobj@sets$transcript@rowData$cds
column simply states "yes" or "no". Does "yes" mean that a CDS could be determined and "no" that it was not possible?
I tried to export a GTF of the factRobject for which the CDS was determined. But this only contained the CDS information of the reference assembly and not for the transcripts of the custom assembly.
As the 3' UTR length is reported I assume, that I could calculate the transcript length and substract the 3'UTR length to obtain the CDS end coordinate. Is it also possible to obtain the CDS start coordinate?
Sorry to bother you with so many questions...Thanks in advance :)
Hey. So there are several things here.
features(custfactRobj) %>% group_by(nmd) %>% tally()
will only work if you set activeSet()
to "transcript". You can use txs(custfactRobj)
instead.custfactRobj@transcriptome
. Let me take a look and fix this.
Hello,
I would like to run factR2 on a custom assembly where I ran in the issue described in https://github.com/f-hamidlab/factR2/issues/7. So I thought I try it on the example data you provided pb_custom.gtf.gz. I run the following code in R4.3.2:
Is this an issue with the R version? Does it work if I update it, or is it another problem?
Thanks in advance for the help!
Best regards,
Christina Kalk