Closed Shiywa closed 8 months ago
Will take a look later today, most likely just another package that changed API behavior.
fine, thank you very much for your effort. you are a surper man.
So, I tested on dummy data. It works for me, so lets test 2 things:
df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
output_dir <- "/home/bio/biosoft/riboseq_data/projects/dummy_uORFome/" # This works for you I guess ?
find_uORFome(output_dir, organism = "Homo sapiens",
df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
This should only fail at final step at:
Error in uORFomePipe:::makeTrainingData(df.rfp, max.artificial.length = max.artificial.length, :
Training on less than 30 valid 'active' CDS is not allowed!
Simply because the ORFik dummy set it too small, but you will see it will go past the step you were at.
If it still fails at "Creating merged leader", we know we have different software versions.
We know the final message you got before error was: "Creating merged leader". So it means most likely error happens in uORFomePipe:::allLeadersSpanningLeader.
So do this now:
debug(uORFomePipe:::allLeadersSpanningLeader)
find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
To debug: press n + enter to move next line press s + enter to step into function
Figure out which specific line it failed at and send that to me :)
well, the it still failed at Creating merged leader
, like :
df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
output_dir <- "/home/bio/biosoft/riboseq_data/projects/dummy_uORFome/" # This works for you I guess ?
find_uORFome(output_dir, organism = "Homo sapiens",
+ df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+ startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders
--------------------------------------
Registered organism is: Homo sapiens
Project location: /home/bio/biosoft/riboseq_data/projects/dummy_uORFome/
Run mode: uORF
--------------------------------------
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
Mutant WT
2 2
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
Searching for candidate uORFs
starting to filter out ourfs...
finished filtering ourfs
starting to filter out ourfs...
finished filtering ourfs
Creating uORF ID's
Creating GRanges objects
Creating merged leader
错误: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.
Hm, just pushed a new version, could you do:
devtools::install_github("Roleren/uORFomePipe") # Then restart R and try dummy data again
well, after re-installl uORFomePipe
, the version is 0.2.2. Then I restart the R but it still failed like before:
> df <- ORFik.template.experiment()
> df.rfp <- df[df$libtype == "RFP",]
> df.rna <- df[df$libtype == "RNA",]
> output_dir <- "/home/bio/biosoft/riboseq_data/projects/dummy_uORFome/" # This works for you I guess ?
> find_uORFome(output_dir, organism = "Homo sapiens",
+ df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+ startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders
--------------------------------------
Registered organism is: Homo sapiens
Project location: /home/bio/biosoft/riboseq_data/projects/dummy_uORFome/
Run mode: uORF
--------------------------------------
载入需要的程辑包:GenomicFeatures
载入需要的程辑包:AnnotationDbi
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
Mutant WT
2 2
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
finished new 5' UTRs and uORF search regions (already exist)
Searching for candidate uORFs
Creating uORF ID's
uniqueIDs already exist, skipping remake of them
Creating GRanges objects
Creating merged leader
错误: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.
my ORFik
is v1.23.5.
actually I suspect that function createCatalogueDB
's dependences need update or degradation ?
See this: https://github.com/HenrikBengtsson/matrixStats/issues/242
Looks like some change matrixStats, I tried to update to newest matrixStats for me it is 1.2.0
It works for me in that version, I supsect for you the line that fail is:
maxWidths <- rowMaxs(widths) # in the function allLeadersSpanningLeader
Could you run with debug as I sent above in earlier comment to confirm which line fails, if it is that line, updating matrixStats should do it.
Well, actually my matrixStats
is 1.2.0 now. Based on (your suggestions)[https://github.com/HenrikBengtsson/matrixStats/issues/242], I guess the error is due to my old R version. my R is 4.2.0, but the new matrixStats
has make useNames = NA
derunct. mybe I need a old matrixStats
< 1.1.0 ?
I don't understand why it works for me then, I have R 4.3.0, it should not matter.
Can you first do this:
rowMaxs(matrix(c(1,2,3))) # Does this work ?
Secondly verify it is matrixStats that fails, by running debug above.
If you can verify it, then try to downgrade matrixStats to 1.0.0
well, thanks for your idea. i have solved it. actually it is due to a biomanager packages MatrixGenerics
. this packages dependent on matrixStats
. In the package MatrixGenerics
, a same function rowMaxs
existed. so when we used rowMaxs
, it is conflict. my old MatrixGenerics
is 1.10.0 in R 4.2.0. when I updated it to 1.14.0 manually, it worked.
Then, the new error come. by testing that dummy data, I got the error as you said before :
Error in uORFomePipe:::makeTrainingData(df.rfp, max.artificial.length = max.artificial.length, :
Training on less than 30 valid 'active' CDS is not allowed!
when I used the test data PRJNA591214
, I got :
> find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
+ df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+ startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders
--------------------------------------
Registered organism is: Homo sapiens
Project location: /home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/
Run mode: uORF
--------------------------------------
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
Mutant WT
2 2
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
finished new 5' UTRs and uORF search regions (already exist)
Searching for candidate uORFs
Creating uORF ID's
uniqueIDs already exist, skipping remake of them
GRObjects already exist, skipping remake of them
UORFAtlas already exist, skipping remake of them
--------------------------------------
Making training data from CDS and trailers (3' UTRs):
[1] "Ribo-seq features exists in DB for: cds delete and run again if you want new"
[1] "Ribo-seq features exists in DB for: three delete and run again if you want new"
Training data on CDS and 3' UTRs finished
--------------------------------------
[1] "sequence features exist, delete and run again if you want remake!"
--------------------------------------
Finding Ribo-seq features for uORFs
/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/RFP_Mutant_rep2.ofst
/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/RFP_WT_rep1.ofst
/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/ofst/RFP_Mutant_rep1.ofst
/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/RFP_WT_rep2.ofst
错误: BiocParallel errors
4 remote errors, element index: 1, 2, 3, 4
0 unevaluated and other errors
first remote error:
Error in `[.data.table`(scores, , `:=`(disengagementScores, disengagementScore(grl, : 矢量分配的种类/长度(closure/632205)不对
is it due to BiocParallel
?
Ok, so we note that they have a bug in MatrixGenerics, I will do what I can to enforce the higher version, will push a fix shortly.
Ah, this is a bit of a scary thing, you still have not flushed all the variables (as by default outputs are stored to .GlobalEnv).
Simply do:
rm(list=ls()) # Now rerun find_uORFome :)
after rm(list = ls()), the error change:
> find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
+ df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+ startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders
--------------------------------------
Registered organism is: Homo sapiens
Project location: /home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/
Run mode: uORF
--------------------------------------
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
CO WT
3 3
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
finished new 5' UTRs and uORF search regions (already exist)
Searching for candidate uORFs
Creating uORF ID's
uniqueIDs already exist, skipping remake of them
GRObjects already exist, skipping remake of them
UORFAtlas already exist, skipping remake of them
--------------------------------------
Making training data from CDS and trailers (3' UTRs):
[1] "Ribo-seq features exists in DB for: cds delete and run again if you want new"
[1] "Ribo-seq features exists in DB for: three delete and run again if you want new"
Training data on CDS and 3' UTRs finished
--------------------------------------
[1] "sequence features exist, delete and run again if you want remake!"
--------------------------------------
Finding Ribo-seq features for uORFs
/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_CO1_Aligned.sortedByCoord.out_pshifted.ofst
/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_WT2_Aligned.sortedByCoord.out_pshifted.ofst
/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_CO2_Aligned.sortedByCoord.out_pshifted.ofst
/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_WT3_Aligned.sortedByCoord.out_pshifted.ofst
Stop worker failed with the error: 环境下赋值的参数不对
错误: BiocParallel errors
2 remote errors, element index: 1, 2
4 unevaluated and other errors
first remote error:
Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand): Seqlengths of x contains NA values!
此外: Warning message:
call dbDisconnect() when finished working with a connection
Im running the pipeline for alexai data now, to see if I can replicate this.
I can verify the new bug, will push a fix tomorrow morning.
Fixed and pushed, it was a new seqlevels bug for uorfs, related to new GenomicFeatures.
Let me know that it works :)
emmm, should I reinstall uORFomePipe
? or install a new version of GenomicFeatures
? my GenomicFeatures
is 1.50.4.
devtools::install_github("Roleren/uORFomePipe") # then restart R
waooooo, all of jobs work successfually. thank you very much for your help. the issue is over.
Thank you very much for your help again, but I have two concerns regarding the use of this software now, and I hope to get your advice.
First, I may encounter the need to customize my own reference genome and gtf file for analysis. It seems that the countTable_regions process relies on the gtf.db file for quantification. I tried to use the GenomeFeatures package to build the txdb file today, but it failed. Can you provide some advice on this?
Secondly, as a beginner, I still cannot fully understand the meaning of all the results from this pipeline. I have looked at your BMC Bioinformatics article, but I’m still not clear. Do you have any recommended articles to help me understand all the results?
sincerely !
# To load predicted uORFs as GRangesList in R:
output_dir <- "/media/roler/S/data/Bio_data/projects/Alexaki_uORFome2/" # The output dir for find_uORFome
uorfs_wt <- readRDS(file.path(output_dir, "results/candidate_uORFs/_WT.uorf.rds")) # The GRangesList
uorfs_wt_pred <- readRDS(file.path(output_dir, "prediction_model/prediction_WT.rds")) # The prediction state per uorf
uorfs_wt_pred_subset <- uorfs[uorfs_wt_pred$predict == 1] # Subset the GRL
# To load features to understand why they were predicted TRUE
uorfs_wt_features <- readRDS(file.path(output_dir, "features/PredictionData_WT.rds")) # The data.table of features
uorfs_wt_features_pred <- uorfs_wt_features[uorfs_wt_pred$predict == 1,] # Feature subset for predicted
That should be enough to start, if you have more questions I suggest you open a new issue with a specific new issue or question :)
hello again, thanks for your help in
ORFik
, I met another problem in the final step of ...uORF analysis... .When I run :
I got the error
after checking, I found it was due to the function
createCatalogueDB
.the version of
uORFomePipe
is 0.2.1.do you have any suggestions ?
sincerely !