Bioconductor / BSgenome

Software infrastructure for efficient representation of full genomes and their SNPs
https://bioconductor.org/packages/BSgenome

Issue with forgeBSgenomeDataPkg() #69

Closed criveralopez closed 1 year ago

criveralopez commented 1 year ago

When I try to forge a BSgenome object, I get an error saying that the 2bit file cannot be found, even though it's in the same directory as the seed file.

> library(BSgenome)
> forgeBSgenomeDataPkg("/Documents/BSgenome_directory/BSGenome_Hmia_seed.txt")
Creating package in ./BSgenome.CRL.Hmiamia.Harvard.Hm2 
Error in .copyTwobitFile(seqfile_name, seqs_srcdir, seqs_destdir, verbose = verbose) : 
  File not found: /Documents/BSgenome_directory/hmia.2bit

Any thoughts on how to troubleshoot this?

hpages commented 1 year ago

"even though it's in the same directory as the seed file"

It doesn't matter. You still have to specify the location of the 2bit file in your seed file. What does your seed file look like? (please copy-paste it here)

criveralopez commented 1 year ago

Here is what the seed file looks like:

Package: BSgenome.CRL.Hmiamia.Hm2
Title: Genome sequence for the three banded panther worm
Description: Full genome sequence for Hofstenia miamia.
Version: 2.0
organism: Hofstenia miamia
common_name: Three banded panther worm
provider: Harvard
provider_version: Hm2
release_date: Unreleased
release_name: Unreleased
source_url:
organism_biocview: Hofstenia miamia
BSgenomeObjname: Hmiamia
circ_seqs:
SrcDataFiles:
PkgExamples:
seqs_srcdir: /Documents/BSgenome_directory
seqfile_name: hmia.2bit

hpages commented 1 year ago

All sequence data files must be located in the seqs_srcdir folder, as explained in the BSgenomeForge.Rnw vignette. In other words, you either need to put the hmia.2bit file in the /Documents/BSgenome_directory/ folder, or you need to set seqs_srcdir to the path of the folder where the hmia.2bit file is actually located.
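One way to see which path the seed file actually resolves to is to read the two relevant fields back and join them, as in the rough sketch below. It assumes the seed file is in DCF format and can be parsed with base R's `read.dcf()`. The helper name `check_seed_paths()` and the temporary demo files are hypothetical and not part of BSgenome.

```r
# Hypothetical helper (not part of BSgenome): report the path that the
# 'seqs_srcdir' and 'seqfile_name' fields of a seed file point to.
check_seed_paths <- function(seed_path) {
  # Assumption: the seed file is DCF-formatted, so read.dcf() can parse it.
  seed <- read.dcf(seed_path, fields = c("seqs_srcdir", "seqfile_name"))
  srcdir <- unname(seed[1, "seqs_srcdir"])
  seqfile <- unname(seed[1, "seqfile_name"])
  full_path <- file.path(srcdir, seqfile)
  list(
    seqs_srcdir  = srcdir,
    seqfile_name = seqfile,
    full_path    = full_path,
    dir_exists   = dir.exists(srcdir),
    file_exists  = file.exists(full_path)
  )
}

# Self-contained demo: a throwaway folder with an empty stand-in 2bit file
# and a minimal seed file that points at it.
demo_dir <- tempfile("seed_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "hmia.2bit"))
seed_path <- file.path(demo_dir, "seed.txt")
writeLines(c("Package: BSgenome.Demo",
             paste("seqs_srcdir:", demo_dir),
             "seqfile_name: hmia.2bit"),
           seed_path)

res <- check_seed_paths(seed_path)
str(res)
```

Running `check_seed_paths()` on the real seed file would show whether the joined path matches the one in the error message.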

criveralopez commented 1 year ago

The hmia.2bit file is in the /Documents/BSgenome_directory/ folder, though. That's why I am confused as to why it isn't working...

hpages commented 1 year ago

You're getting this error:

File not found: /Documents/BSgenome_directory/hmia.2bit

which means that either the file is not there, or is not readable, or that somehow you're not specifying the path correctly. Note that it is very unusual, and probably not a good setup, to have a Documents folder in the root of the file system.

What does the following return?

file.exists("/Documents/BSgenome_directory/hmia.2bit")

Also what system are you on? (please share the output of your sessionInfo())
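If it helps, the three possibilities above (file missing, file not readable, path not specified correctly) can be checked in one go with base R. This is only a sketch: `diagnose_path()` is a hypothetical helper, and the demo runs on a temporary file rather than on the real hmia.2bit.

```r
# Hypothetical helper: collect basic facts about a path in one list.
diagnose_path <- function(path) {
  list(
    exists     = file.exists(path),
    # file.access() returns 0 when the requested access (4 = read) is allowed
    readable   = unname(file.access(path, mode = 4) == 0),
    normalized = normalizePath(path, mustWork = FALSE),
    size_bytes = file.info(path)$size,
    # leading or trailing whitespace in a path is easy to miss by eye
    has_outer_whitespace = !identical(path, trimws(path))
  )
}

# Demo on a small temporary file standing in for the 2bit file.
tmp <- tempfile(fileext = ".2bit")
writeBin(as.raw(1:4), tmp)

diag <- diagnose_path(tmp)
str(diag)

# The same path with a stray trailing space no longer resolves.
str(diagnose_path(paste0(tmp, " ")))
```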

criveralopez commented 1 year ago

Thanks for taking the time to look into this!

> file.exists("/Documents/BSgenome_directory/hmia.2bit")
[1] TRUE
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 13.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocManager_1.30.20  BSgenome_1.62.0      rtracklayer_1.54.0  
 [4] Biostrings_2.62.0    XVector_0.34.0       GenomicRanges_1.46.1
 [7] GenomeInfoDb_1.30.1  IRanges_2.28.0       S4Vectors_0.32.4    
[10] BiocGenerics_0.40.0  Matrix_1.5-1        

loaded via a namespace (and not attached):
 [1] rstudioapi_0.14             GenomicAlignments_1.30.0   
 [3] zlibbioc_1.40.0             BiocParallel_1.28.3        
 [5] lattice_0.20-45             rjson_0.2.21               
 [7] rlang_1.1.0                 tools_4.1.1                
 [9] SummarizedExperiment_1.24.0 grid_4.1.1                 
[11] Biobase_2.54.0              cli_3.6.0                  
[13] matrixStats_0.63.0          yaml_2.3.7                 
[15] crayon_1.5.2                BiocIO_1.4.0               
[17] GenomeInfoDbData_1.2.7      restfulr_0.0.15            
[19] bitops_1.0-7                RCurl_1.98-1.10            
[21] DelayedArray_0.20.0         compiler_4.1.1             
[23] MatrixGenerics_1.6.0        Rsamtools_2.10.0           
[25] XML_3.99-0.14

hpages commented 1 year ago

The code that raises the error also calls file.exists() (https://github.com/Bioconductor/BSgenome/blob/7417e8410524432a51eda7e980b2b69d91dc73f6/R/BSgenomeForge.R#L338-L340), but there file.exists() returns FALSE. Go figure! :shrug:

Note that you're using BioC 3.14, which is old and no longer supported. I suggest that you update your installation to the current release, BioC 3.16, which requires R 4.2.
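A possible way to do the update is sketched below. It assumes R 4.2.x is already installed and uses `BiocManager::install(version = "3.16")`. The two helper functions are hypothetical names introduced for illustration. The upgrade function is only defined, not called, because it needs network access and modifies the package library.

```r
# Hypothetical helper: per the comment above, BioC 3.16 goes with R 4.2,
# so check that the running R is in the 4.2.x series.
r_matches_bioc_3_16 <- function(r_version = getRversion()) {
  r_version >= "4.2.0" && r_version < "4.3.0"
}

# Hypothetical wrapper around the upgrade; call it manually when ready.
upgrade_to_bioc_3_16 <- function() {
  if (!r_matches_bioc_3_16())
    stop("BioC 3.16 expects R 4.2.x; this session runs R ", getRversion())
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(version = "3.16")
}

# The R version reported in the sessionInfo() output earlier in this thread:
r_matches_bioc_3_16(numeric_version("4.1.1"))
```

With R 4.1.1, as in the session above, the check fails, so R itself would need to be updated first.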

hpages commented 1 year ago

@criveralopez Were you able to solve your problem? Can we close this?