Closed JC-therea closed 6 months ago
I Made this function that is safer, it is supposed to catch this error:
ORFik::makeTxdbFromGenome(gtf = ..., genome = ..., organism = ..., optimize = TRUE, pseudo_5UTRS_if_needed = 50) # 50nt pseudo UTR, do this once only!
txdb <- loadTxdb(paste0(gtf, ".db")) # Use this to load
Note this is also possible to create on download of new species with:
getGenomeAndAnnotation(..., optimize = TRUE, pseudo_5UTRS_if_needed = 50)
Let me know if it worked
Hello again,
I m having the same error:
> ORFik::makeTxdbFromGenome(gtf = gtf, genome = genome, organism = "Saccharomyces paradoxus", optimize = TRUE, pseudo_5UTRS_if_needed = 50) # 50nt pseudo UTR, do this once only!
Making txdb of GTF
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
---- Adding Pseudo 5' UTRs
- Total leaders: 0
- Leader length statistics:
Min. 1st Qu. Median Mean 3rd Qu. Max.
- Percentage of CDS' with leaders: 0%
- Adding pseudo
--------------------------
Txdb stored at: Outputs/QC/Annotation/Saccharomyces_paradoxus_CBS432.gtf.db
--------------------------
Optimizing annotation, saving to: Outputs/QC/Annotation/ORFik_optimized
Creating fst speedup file for transcript lengths, at location:
Outputs/QC/Annotation/ORFik_optimized/Saccharomyces_paradoxus_CBS432_2024-05-23175255+0200_txLengths.fst
fstcore package v0.9.14
(OpenMP detected, using 20 threads)
Creating rds speedup files for transcript regions
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
> txdb <- loadTxdb(paste0(gtf, ".db")) # Use this to load
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Attaching package: ‘AnnotationDbi’
The following object is masked from ‘package:dplyr’:
select
>
> ##############
> #txdb <- assignTSSByCage(gtf, cage = NULL, pseudoLength = 50)
> shifts <- suppressWarnings(detectRibosomeShifts(footprints, txdb, verbose = F , accepted.lengths = 25:35))
Error in mapSeqlevels(x_seqlevels, value, drop = FALSE) :
supplied seqname style "RefSeq" is not supported
In addition, I would like to ask you regarding the arguments of the function... How relevant is the argument organism for making work the function? I ask because I would like to do something generalistic without the need of including each time the organism.
If you want to check the files are here
Hm, will check it out, Organism is only required for gene symbol fetching, so if you do not need that then skip it :)
Downloaded your data, and found the error.
I have pushed a fix now, give it a go :)
Nice commit hahaha. Now it works like a charm.
Thank you very much!
Hi,
First of all, thank you for your tool. I developed an internal pipeline that uses ORFik to estimate the offset of ribosome profiling samples and normally it works perfectly.
However, I'm facing a problem with my non-canonical organism. Normally what I do is the following:
And normally works fine if I have UTRs in my annotation. However when I don't have UTRs as you probably know I get the following error:
Which is normal I get that I have to add pseudo-UTRs to make it work and I tried with some command that I saw in your tool:
txdb <- assignTSSByCage(gtf, cage = NULL, pseudoLength = 50)
However, that makes me an additional error which is the following:Checking the seqlevels from the bam file and the txdb I obtained the same chromosomes:
So Im a little bit lost and I don't know if its better to artificially create a gtf with "artificial" UTRs or if I should use another command.
Thanks in advance for your help
Carlos