cnobles / iGUIDE

Bioinformatic pipeline for identifying dsDNA breaks by marker-based incorporation, such as breaks induced by designer nucleases like Cas9.
https://iguide.readthedocs.io/en/latest/
GNU General Public License v3.0

running iguide with mouse datasets: Error in if (grepl(".rds$", ref$file)) { : argument is of length zero #80

Status: Open. Opened by weihongqi 3 years ago.

weihongqi commented 3 years ago

Dear Team,

We have used iGUIDE successfully in my lab on human datasets. Thank you; it is a nice and useful tool.

Now we are testing iGUIDE on mouse datasets. The mm10 reference genome was installed from Bioconductor as a BSgenome package. The analysis ran until the evaluation step and then failed with: Error in if (grepl(".rds$", ref$file)) { : argument is of length zero

In the config file, the refGene entry and the other two gene lists were commented out, since I don't know where to get the rds and tsv files for mm10. The error message seems to suggest that refGene.rds is required for the analysis. Is this correct? If so, do you have a mouse refGene.rds available for download? Or which input file should be used to generate such a GenomicRanges object?

[screenshot]

If the problem is not due to refGene.rds, do you have any further suggestions for debugging? Your input would be highly appreciated.

Kind regards,

Weihong

cnobles commented 3 years ago

Hi Weihong,

You'll want to download the relevant data sets from UCSC or NCBI. The default formats for these files come from the UCSC Genome Browser tables. If you look at them, they contain range information (chromosome, start, and stop positions) as well as a symbol column that you can specify. So all you really need is a mouse reference gene annotation set built on the same reference genome, formatted into one of the accepted formats (rds, csv, or tsv). The documentation has a section on this, pasted below:

refGenes / oncoGeneList / specialGeneList: These are special reference files, given either as text tables or as Bioconductor GenomicRanges objects. They can be in an ‘.rds’ format or a table format (‘.csv’ or ‘.tsv’). The file parameter should give the path to the file (relative paths should be relative to the Snakefile), and the symbolCol parameter should name the column in the data object that contains the reference names to be used in the analysis.
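As a rough illustration of trimming a UCSC table down to range columns plus a symbol column, here is a minimal Python sketch. It assumes a refGene-style Table Browser export (header line starting with `#`, columns `chrom`, `strand`, `txStart`, `txEnd`, `name2`) and assumes that keeping the UCSC column names and 0-based coordinates unchanged matches what iGUIDE expects, since its defaults follow UCSC tables; check the documentation before relying on that. The function `trim_ucsc_table` is hypothetical, not part of iGUIDE.

```python
import csv
import io

# Range columns kept from a UCSC Table Browser export (assumed refGene-style schema).
RANGE_COLS = ["chrom", "strand", "txStart", "txEnd"]


def trim_ucsc_table(src, dst, symbol_col="name2"):
    """Copy the range columns plus the symbol column from a UCSC table.

    src and dst are open text streams. UCSC exports start the header line
    with '#', which is stripped so the first column name is usable.
    Coordinates are copied unchanged (UCSC tables use 0-based starts).
    Returns the number of data rows written.
    """
    reader = csv.reader(src, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")
    wanted = RANGE_COLS + [symbol_col]
    missing = [c for c in wanted if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    idx = [header.index(c) for c in wanted]
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(wanted)
    n = 0
    for row in reader:
        if not row:  # skip blank lines
            continue
        writer.writerow([row[i] for i in idx])
        n += 1
    return n


if __name__ == "__main__":
    # Tiny made-up example with a reduced set of refGene-style columns.
    src = io.StringIO(
        "#bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tname2\n"
        "0\tNM_001\tchr11\t+\t100\t500\tTrp53\n"
    )
    dst = io.StringIO()
    trim_ucsc_table(src, dst)
    print(dst.getvalue(), end="")
```

With real files, open the downloaded table and the output `.tsv` with `open(path, newline="")` and pass the streams in; `symbolCol` in the config would then be `name2`.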

The oncoGeneList and the specialGeneList are simply lists that should be subsets of the refGene list you choose. Double-check that the names in these lists are present in your refGene list; otherwise you won't get accurate annotations.
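One way to do that double-check is to compare each gene list against the symbol column of the refGene table. This is a minimal sketch assuming tab-delimited tables with a header row; `read_symbols` and `missing_symbols` are hypothetical helpers, not iGUIDE functions. Matching is exact and case-sensitive, so a human symbol such as `MYC` will not match the mouse `Myc`.

```python
import csv
import io


def read_symbols(stream, symbol_col):
    """Collect the values of symbol_col from a tab-delimited table with a header."""
    return [row[symbol_col].strip() for row in csv.DictReader(stream, delimiter="\t")]


def missing_symbols(ref_symbols, gene_list):
    """Return gene-list entries absent from the reference symbols, sorted.

    Comparison is exact and case-sensitive after trimming whitespace.
    """
    ref = {s.strip() for s in ref_symbols}
    return sorted({g.strip() for g in gene_list if g.strip() not in ref})


if __name__ == "__main__":
    ref = read_symbols(io.StringIO("chrom\tname2\nchr11\tTrp53\nchr15\tMyc\n"), "name2")
    print(missing_symbols(ref, ["Trp53", "MYC", "Brca1"]))
```

An empty result means every name in the list can be annotated from the refGene table.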

Best, Chris

weihongqi commented 3 years ago

Dear Chris,

Thanks.

I downloaded the mm10 knownGene table from the UCSC Genome Browser tables as a tab-delimited text file, converted it to a GRanges object, and saved it as an .rds file. Then I subset the downloaded text file to the mouse gene names corresponding to those in the human oncoGeneList and specialGeneList.
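The subsetting step described above can be sketched in Python as a row filter on the symbol column that also reports requested names that matched nothing. This assumes the table has a header row and that you know which column holds the gene symbols; `subset_by_symbols` is a hypothetical helper. Genes with several transcripts keep all of their rows.

```python
import csv
import io


def subset_by_symbols(src, dst, symbol_col, keep):
    """Write only rows whose symbol_col value is in keep, preserving the header.

    Returns the sorted requested symbols that matched no row; these are
    worth checking against the reference gene table.
    """
    keep = {k.strip() for k in keep}
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        sym = row[symbol_col].strip()
        if sym in keep:
            writer.writerow(row)
            seen.add(sym)
    return sorted(keep - seen)


if __name__ == "__main__":
    src = io.StringIO(
        "chrom\ttxStart\ttxEnd\tname2\n"
        "chr11\t1\t9\tTrp53\n"
        "chr15\t2\t8\tMyc\n"
    )
    dst = io.StringIO()
    print(subset_by_symbols(src, dst, "name2", ["Trp53", "Foo1"]))
    print(dst.getvalue(), end="")
```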

The evaluation R script could read in these files properly:

[screenshot]

The "Importing experimental data and configurations" step succeeded and the analysis continued, but the Fisher's exact tests failed with the following error:

Error in fisher.test(mat) : all entries of 'x' must be nonnegative and finite
Calls: p.adjust -> sapply -> sapply -> lapply -> FUN -> fisher.test
Execution halted
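This error comes from R's `fisher.test`, which refuses a table whose entries are not all nonnegative and finite (negative or NA cells are the usual culprits). The matrix iGUIDE builds is not shown in this thread, so the Python sketch below uses a hypothetical 2x2 layout (in-list vs. not-in-list counts for two groups, with the second column obtained by subtraction) to show how such a cell can arise: if an in-list count exceeds the total it is subtracted from, the complementary cell goes negative. `two_by_two` and `invalid_cells` are illustrative names, not iGUIDE functions.

```python
import math


def two_by_two(in_list_a, total_a, in_list_b, total_b):
    """Build [[in, out], [in, out]] counts for two groups by subtraction (hypothetical layout)."""
    return [[in_list_a, total_a - in_list_a],
            [in_list_b, total_b - in_list_b]]


def invalid_cells(mat):
    """Return (row, col, value) for cells that are missing, NaN, infinite, or negative.

    These are the kinds of entries the error message says are not allowed.
    """
    bad = []
    for i, row in enumerate(mat):
        for j, v in enumerate(row):
            if v is None or math.isnan(v) or math.isinf(v) or v < 0:
                bad.append((i, j, v))
    return bad


if __name__ == "__main__":
    print(invalid_cells(two_by_two(5, 100, 3, 80)))   # a valid table
    print(invalid_cells(two_by_two(12, 10, 3, 80)))   # in-list count exceeds its total
```

Running an equivalent check on each matrix just before the test (in R, something like `any(mat < 0)` and `anyNA(mat)`) would show which gene list produces the bad table.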

My geneList looks like this:

[screenshot]

Do you know how to debug this? Alternatively, is it possible to turn off the gene set enrichment analysis?

Many thanks in advance,

Weihong