Roleren / uORFomePipe

MIT License
3 stars 0 forks source link

error for find_uORFome #2

Closed Shiywa closed 8 months ago

Shiywa commented 9 months ago

hello again, thanks for your help in ORFik, I met another problem in the final step of ...uORF analysis... .

When I run :

find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
             df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
             startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())

I got the error

useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.

after checking, I found it was due to the function createCatalogueDB.

createCatalogueDB(df.cage)
uniqueIDs already exist, skipping remake of them
Creating GRanges objects
Creating merged leader
错误: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.

the version of uORFomePipe is 0.2.1.

do you have any suggestions ?

sincerely !

Roleren commented 9 months ago

Will take a look later today, most likely just another package that changed API behavior.

Shiywa commented 9 months ago

fine, thank you very much for your effort. you are a surper man.

Roleren commented 9 months ago

So, I tested on dummy data. It works for me, so lets test 2 things:

  1. Test that dummy data runs for you:
df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
output_dir <- "/home/bio/biosoft/riboseq_data/projects/dummy_uORFome/" # This works for you I guess ?
find_uORFome(output_dir, organism = "Homo sapiens",
             df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
             startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())

This should only fail at final step at:

Error in uORFomePipe:::makeTrainingData(df.rfp, max.artificial.length = max.artificial.length, :
Training on less than 30 valid 'active' CDS is not allowed!

Simply because the ORFik dummy set it too small, but you will see it will go past the step you were at.

If it still fails at "Creating merged leader", we know we have different software versions.

  1. If this works for you lets debug:

We know the final message you got before error was: "Creating merged leader". So it means most likely error happens in uORFomePipe:::allLeadersSpanningLeader.

So do this now:


debug(uORFomePipe:::allLeadersSpanningLeader)
find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
             df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
             startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())

To debug: press n + enter to move next line press s + enter to step into function

Figure out which specific line it failed at and send that to me :)

Shiywa commented 9 months ago

well, the it still failed at Creating merged leader, like :

df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
output_dir <- "/home/bio/biosoft/riboseq_data/projects/dummy_uORFome/" # This works for you I guess ?
 find_uORFome(output_dir, organism = "Homo sapiens",
+              df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+              startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders

--------------------------------------
Registered organism is: Homo sapiens
Project location:  /home/bio/biosoft/riboseq_data/projects/dummy_uORFome/
Run mode: uORF
--------------------------------------
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
Mutant     WT 
     2      2 
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
Searching for candidate uORFs
starting to filter out ourfs...
finished filtering ourfs

starting to filter out ourfs...
finished filtering ourfs

Creating uORF ID's
Creating GRanges objects
Creating merged leader
错误: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.
Roleren commented 9 months ago

Hm, just pushed a new version, could you do: devtools::install_github("Roleren/uORFomePipe") # Then restart R and try dummy data again

Shiywa commented 9 months ago

well, after re-installl uORFomePipe, the version is 0.2.2. Then I restart the R but it still failed like before:

> df <- ORFik.template.experiment()
> df.rfp <- df[df$libtype == "RFP",]
> df.rna <- df[df$libtype == "RNA",]
> output_dir <- "/home/bio/biosoft/riboseq_data/projects/dummy_uORFome/" # This works for you I guess ?
> find_uORFome(output_dir, organism = "Homo sapiens",
+              df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+              startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders

--------------------------------------
Registered organism is: Homo sapiens
Project location:  /home/bio/biosoft/riboseq_data/projects/dummy_uORFome/
Run mode: uORF
--------------------------------------
载入需要的程辑包:GenomicFeatures
载入需要的程辑包:AnnotationDbi
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
Mutant     WT 
     2      2 
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
finished new 5' UTRs and uORF search regions (already exist)
Searching for candidate uORFs
Creating uORF ID's
uniqueIDs already exist, skipping remake of them
Creating GRanges objects
Creating merged leader
错误: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.

my ORFik is v1.23.5. actually I suspect that function createCatalogueDB's dependences need update or degradation ?

Roleren commented 9 months ago

See this: https://github.com/HenrikBengtsson/matrixStats/issues/242

Looks like some change matrixStats, I tried to update to newest matrixStats for me it is 1.2.0

It works for me in that version, I supsect for you the line that fail is:

maxWidths <- rowMaxs(widths) # in the function allLeadersSpanningLeader

Could you run with debug as I sent above in earlier comment to confirm which line fails, if it is that line, updating matrixStats should do it.

Shiywa commented 9 months ago

Well, actually my matrixStats is 1.2.0 now. Based on (your suggestions)[https://github.com/HenrikBengtsson/matrixStats/issues/242], I guess the error is due to my old R version. my R is 4.2.0, but the new matrixStats has make useNames = NA derunct. mybe I need a old matrixStats < 1.1.0 ?

Roleren commented 9 months ago

I don't understand why it works for me then, I have R 4.3.0, it should not matter.

Can you first do this: rowMaxs(matrix(c(1,2,3))) # Does this work ?

Secondly verify it is matrixStats that fails, by running debug above.

If you can verify it, then try to downgrade matrixStats to 1.0.0

Shiywa commented 9 months ago

well, thanks for your idea. i have solved it. actually it is due to a biomanager packages MatrixGenerics. this packages dependent on matrixStats. In the package MatrixGenerics, a same function rowMaxs existed. so when we used rowMaxs, it is conflict. my old MatrixGenerics is 1.10.0 in R 4.2.0. when I updated it to 1.14.0 manually, it worked.

Shiywa commented 9 months ago

Then, the new error come. by testing that dummy data, I got the error as you said before :

Error in uORFomePipe:::makeTrainingData(df.rfp, max.artificial.length = max.artificial.length,  : 
  Training on less than 30 valid 'active' CDS is not allowed!

when I used the test data PRJNA591214, I got :

> find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
+              df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+              startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders

--------------------------------------
Registered organism is: Homo sapiens
Project location:  /home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/
Run mode: uORF
--------------------------------------
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
Mutant     WT 
     2      2 
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
finished new 5' UTRs and uORF search regions (already exist)
Searching for candidate uORFs
Creating uORF ID's
uniqueIDs already exist, skipping remake of them
GRObjects already exist, skipping remake of them
UORFAtlas already exist, skipping remake of them
--------------------------------------
Making training data from CDS and trailers (3' UTRs):
[1] "Ribo-seq features exists in DB for: cds delete and run again if you want new"
[1] "Ribo-seq features exists in DB for: three delete and run again if you want new"
Training data on CDS and 3' UTRs finished
--------------------------------------
[1] "sequence features exist, delete and run again if you want remake!"
--------------------------------------
Finding Ribo-seq features for uORFs
/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/RFP_Mutant_rep2.ofst

/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/RFP_WT_rep1.ofst

/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/ofst/RFP_Mutant_rep1.ofst

/home/bio/R/x86_64-pc-linux-gnu-library/4.2/ORFik/extdata/Homo_sapiens_sample/RFP_WT_rep2.ofst

错误: BiocParallel errors
  4 remote errors, element index: 1, 2, 3, 4
  0 unevaluated and other errors
  first remote error:
Error in `[.data.table`(scores, , `:=`(disengagementScores, disengagementScore(grl, : 矢量分配的种类/长度(closure/632205)不对

is it due to BiocParallel ?

Roleren commented 9 months ago

Ok, so we note that they have a bug in MatrixGenerics, I will do what I can to enforce the higher version, will push a fix shortly.

Roleren commented 9 months ago

Ah, this is a bit of a scary thing, you still have not flushed all the variables (as by default outputs are stored to .GlobalEnv).

Simply do: rm(list=ls()) # Now rerun find_uORFome :)

Shiywa commented 9 months ago

after rm(list = ls()), the error change:

> find_uORFome("/home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/",organism = "Homo sapiens",
+              df.rfp = df.rfp, df.rna = df.rna, df.cage = NULL, biomart = NULL,
+              startCodons = "ATG|CTG|TTG|GTG", BPPARAM = BiocParallel::MulticoreParam())
Welcome, setting up uORFome folders

--------------------------------------
Registered organism is: Homo sapiens
Project location:  /home/bio/biosoft/riboseq_data/projects/Alexaki_uORFome/
Run mode: uORF
--------------------------------------
Running without CAGE
Cancel if this was wrong, and input correct ORFik experiment!
Tissues validated, will run for 2 groups:
Replicates per group:
groups
CO WT 
 3  3 
--------------------------------------
This is default backend:
Using 125 threads from CPU
Example on how to register default backend to 10 cores:
[1] "register(MulticoreParam(workers = 10), default=TRUE)\n"
Initialization finished
--------------------------------------
Starting to find uORF search spaces
Running pipeline without CAGE data
finished new 5' UTRs and uORF search regions (already exist)
Searching for candidate uORFs
Creating uORF ID's
uniqueIDs already exist, skipping remake of them
GRObjects already exist, skipping remake of them
UORFAtlas already exist, skipping remake of them
--------------------------------------
Making training data from CDS and trailers (3' UTRs):
[1] "Ribo-seq features exists in DB for: cds delete and run again if you want new"
[1] "Ribo-seq features exists in DB for: three delete and run again if you want new"
Training data on CDS and 3' UTRs finished
--------------------------------------
[1] "sequence features exist, delete and run again if you want remake!"
--------------------------------------
Finding Ribo-seq features for uORFs
/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_CO1_Aligned.sortedByCoord.out_pshifted.ofst

/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_WT2_Aligned.sortedByCoord.out_pshifted.ofst

/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_CO2_Aligned.sortedByCoord.out_pshifted.ofst

/home/bio/biosoft/riboseq_data/processed_data/Ribo-seq/Alexaki_Human/aligned/pshifted/Ribo-Seq_WT3_Aligned.sortedByCoord.out_pshifted.ofst

Stop worker failed with the error: 环境下赋值的参数不对
错误: BiocParallel errors
  2 remote errors, element index: 1, 2
  4 unevaluated and other errors
  first remote error:
Error in covRleFromGR(x, weight = weight, ignore.strand = ignore.strand): Seqlengths of x contains NA values!
此外: Warning message:
call dbDisconnect() when finished working with a connection 
Roleren commented 9 months ago

Im running the pipeline for alexai data now, to see if I can replicate this.

Roleren commented 9 months ago

I can verify the new bug, will push a fix tomorrow morning.

Roleren commented 9 months ago

Fixed and pushed, it was a new seqlevels bug for uorfs, related to new GenomicFeatures.

Let me know that it works :)

Shiywa commented 9 months ago

emmm, should I reinstall uORFomePipe ? or install a new version of GenomicFeatures? my GenomicFeatures is 1.50.4.

Roleren commented 9 months ago

devtools::install_github("Roleren/uORFomePipe") # then restart R

Shiywa commented 9 months ago

waooooo, all of jobs work successfually. thank you very much for your help. the issue is over.

Thank you very much for your help again, but I have two concerns regarding the use of this software now, and I hope to get your advice.

First, I may encounter the need to customize my own reference genome and gtf file for analysis. It seems that the countTable_regions process relies on the gtf.db file for quantification. I tried to use the GenomeFeatures package to build the txdb file today, but it failed. Can you provide some advice on this?

Secondly, as a beginner, I still cannot fully understand the meaning of all the results from this pipeline. I have looked at your BMC Bioinformatics article, but I’m still not clear. Do you have any recommended articles to help me understand all the results?

sincerely !

Roleren commented 8 months ago
  1. First to make proper txdb, check out ORFik:::makeTxdbFromGenome() Check out chapter 3 of ORFik import tutorial
  2. The main output files you want to check out are:
    • Plots for analysis results: differential_uORF_usage.png
    • candidate_and_predicted_uORFs_total.bed (This file you can open in IGV and load Ribo-seq data as bigwig to inspect predicted results)
# To load predicted uORFs as GRangesList in R:
output_dir <- "/media/roler/S/data/Bio_data/projects/Alexaki_uORFome2/" # The output dir for find_uORFome
uorfs_wt <- readRDS(file.path(output_dir, "results/candidate_uORFs/_WT.uorf.rds")) # The GRangesList
uorfs_wt_pred <- readRDS(file.path(output_dir, "prediction_model/prediction_WT.rds")) # The prediction state per uorf
uorfs_wt_pred_subset <- uorfs[uorfs_wt_pred$predict == 1] # Subset the GRL
# To load features to understand why they were predicted TRUE
uorfs_wt_features <- readRDS(file.path(output_dir, "features/PredictionData_WT.rds")) # The data.table of features
uorfs_wt_features_pred <- uorfs_wt_features[uorfs_wt_pred$predict == 1,] # Feature subset for predicted

That should be enough to start, if you have more questions I suggest you open a new issue with a specific new issue or question :)