Closed yangshichen0713 closed 11 months ago
What does dir("/media/AnalysisTempDisk2/Yangshichen/0_HIV/ATAC-genome/fa") return?
It should return one FASTA file per seqname, and the names of the FASTA files should be the seqnames with the .fa suffix added to them, e.g. chr1.fa, chr2.fa, etc.
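The naming rule above can also be checked from R. This is only a minimal sketch: missing_fasta_files() is a hypothetical helper, not part of BSgenome, and the demo directory and seqnames are made up for illustration.

```r
# Hypothetical helper (not part of BSgenome): report which seqnames lack
# a matching "<seqname>.fa" file in the source directory.
missing_fasta_files <- function(seqnames, seqs_srcdir, suffix = ".fa") {
  expected <- paste0(seqnames, suffix)
  expected[!file.exists(file.path(seqs_srcdir, expected))]
}

# Self-contained demo in a throwaway directory holding chr1.fa and chrM.fa.
demo_dir <- file.path(tempdir(), "fa_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("chr1.fa", "chrM.fa"))))

# chr2 has no file in the demo directory, so chr2.fa is reported.
demo_missing <- missing_fasta_files(c("chr1", "chr2", "chrM"), demo_dir)
print(demo_missing)
```

With a real seed file, the seqnames and directory would be the values of its seqnames and seqs_srcdir fields. Extra files in the directory are not flagged by this sketch.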
yes
> dir("/media/AnalysisTempDisk2/Yangshichen/0_HIV/ATAC-genome/fa")
[1] "AF033819.3_cds_AAC82591.1_5.fa" "AF033819.3_cds_AAC82592.1_6.fa" "AF033819.3_cds_AAC82593.1_1.fa" "AF033819.3_cds_AAC82594.1_3.fa"
[5] "AF033819.3_cds_AAC82595.1_4.fa" "AF033819.3_cds_AAC82596.1_8.fa" "AF033819.3_cds_AAC82597.1_9.fa" "AF033819.3_cds_AAC82598.2_2.fa"
[9] "AF033819.3_cds_AAD20388.1_7.fa" "chr1.fa" "chr10.fa" "chr11.fa"
[13] "chr12.fa" "chr13.fa" "chr14.fa" "chr15.fa"
[17] "chr16.fa" "chr17.fa" "chr18.fa" "chr19.fa"
[21] "chr2.fa" "chr20.fa" "chr21.fa" "chr22.fa"
[25] "chr3.fa" "chr4.fa" "chr5.fa" "chr6.fa"
[29] "chr7.fa" "chr8.fa" "chr9.fa" "chrM.fa"
Thanks. I was able to reproduce.
This should be fixed in BSgenome 1.70.1 (BioC 3.18, current release) and 1.71.1 (BioC 3.19, current devel). The fix will become available to BioC 3.18 and 3.19 users via BiocManager::install() in the next 48 hrs or so.
To install now, you can do remotes::install_github("Bioconductor/BSgenome@RELEASE_3_18") if you're using BioC 3.18, or remotes::install_github("Bioconductor/BSgenome") if you're using BioC 3.19. Note that we usually strongly advise against installing directly from GitHub, so please consider this an exception to the rule.
Let me know if you still run into trouble after updating.
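To confirm that an installed BSgenome already carries the fix, the version numbers quoted above can be compared in R. This is a rough sketch: has_bsgenome_fix() is a hypothetical helper, and it assumes that 1.70.x belongs to the release branch and 1.71.x to devel, as stated in this comment.

```r
# Hypothetical helper: TRUE if the version is at or past the fixed version
# quoted in this thread (1.70.1 for BioC 3.18, 1.71.1 for BioC 3.19).
has_bsgenome_fix <- function(installed) {
  v <- as.package_version(installed)
  if (v < "1.71.0") v >= "1.70.1" else v >= "1.71.1"
}

# Illustrative version strings.
fix_check <- vapply(c("1.70.0", "1.70.1", "1.71.0", "1.71.1"),
                    has_bsgenome_fix, logical(1))
print(fix_check)

# Check the installed copy only if BSgenome is actually available.
if (requireNamespace("BSgenome", quietly = TRUE)) {
  print(has_bsgenome_fix(utils::packageVersion("BSgenome")))
}
```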
My BSgenome has been upgraded to 1.71.1, but this error still occurs. :(
> remotes::install_github("Bioconductor/BSgenome")
Downloading GitHub repo Bioconductor/BSgenome@HEAD
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: GenomicRa... (1.54.0 -> 1.54.1) [CRAN]
Enter one or more numbers, or an empty line to skip updates: 1
GenomicRa... (1.54.0 -> 1.54.1) [CRAN]
Installing 1 packages: GenomicRanges
trying URL 'https://bioconductor.org/packages/3.18/bioc/bin/macosx/big-sur-arm64/contrib/4.3/GenomicRanges_1.54.0.tgz'
Content type 'application/x-gzip' length 2390706 bytes (2.3 MB)
==================================================
downloaded 2.3 MB
The downloaded binary packages are in
/var/folders/9n/kyl7_ss54n33sk160tgb2brw0000gn/T//Rtmpolh7d7/downloaded_packages
── R CMD build ──────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/private/var/folders/9n/kyl7_ss54n33sk160tgb2brw0000gn/T/Rtmpolh7d7/remotese15c289d3be2/Bioconductor-BSgenome-8810bf9/DESCRIPTION’ ...
─ preparing ‘BSgenome’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
Removed empty directory ‘BSgenome/inst/pkgtemplates/BSgenome_datapkg/inst/extdata’
Removed empty directory ‘BSgenome/inst/pkgtemplates/BSgenome_datapkg/inst’
Removed empty directory ‘BSgenome/inst/pkgtemplates/MaskedBSgenome_datapkg/inst/extdata’
Removed empty directory ‘BSgenome/inst/pkgtemplates/MaskedBSgenome_datapkg/inst’
─ building ‘BSgenome_1.71.1.tar.gz’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘BSgenome/inst/extdata/GentlemanLab/BSgenome.Gmellonella.NCBI.ASM364042v2-tools/fasta_to_sorted_2bit.R’
* installing *source* package ‘BSgenome’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (BSgenome)
> setwd('/Users/mac/Desktop/ATAC-genome/')
> library("BSgenome")
> library("Biostrings")
> library("BSgenomeForge")
> seed <- "HIV-1_hg38-seed.txt"
> forgeBSgenomeDataPkg(seed, verbose=TRUE)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'seqlevels': "HIV-1_hg38" is not a registered NCBI assembly or UCSC genome (use registered_NCBI_assemblies() or
registered_UCSC_genomes() to list the NCBI or UCSC assemblies/genomes currently registered in the
GenomeInfoDb package)
Hmm.. now the error seems to be happening earlier because forgeBSgenomeDataPkg() does not display the "Creating package in ./BSgenome.Hhiv1.NCBI.test" line anymore.
Did you modify your seed file? In particular, note that if the seqnames or circ_seqs fields are missing then you will still get the error about the genome not being registered, so make sure these 2 fields are present.
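A quick way to check a seed file for these two fields is sketched below. It assumes the seed file parses as DCF with base R's read.dcf(); missing_seed_fields() is a hypothetical helper, and the demo seed is a cut-down, made-up example that deliberately omits seqnames.

```r
# Hypothetical helper: list the required seed fields that are absent.
missing_seed_fields <- function(seed_path,
                                required = c("seqnames", "circ_seqs")) {
  seed <- read.dcf(seed_path)  # character matrix, one column per field
  setdiff(required, colnames(seed))
}

# Demo seed file written to a temporary location (seqnames left out on purpose).
seed_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "Package: BSgenome.Hhiv1.NCBI.test",
  "genome: HIV-1_hg38",
  "circ_seqs: \"chrM\""
), seed_path)

seed_missing <- missing_seed_fields(seed_path)
print(seed_missing)
```

An empty result means both fields are present; it says nothing about whether their values are valid.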
Also I just realized that installing with remotes::install_github() does not work properly with the BSgenome package. Can you please install again using the following 2-step method?
git clone https://github.com/Bioconductor/BSgenome --branch RELEASE_3_18
R CMD INSTALL BSgenome
This will install BSgenome 1.70.1.
Here is what I get when using your seed file (only slightly modified) with BSgenome 1.70.1:
library(BSgenome)
forgeBSgenomeDataPkg("my_seed.dcf", verbose=TRUE)
# Creating package in ./BSgenome.Hhiv1.NCBI.test
# Saving 'seqlengths' object to compressed data file './BSgenome.Hhiv1.NCBI.test/inst/extdata/seqlengths.rda' ... DONE
# Loading FASTA file '/home/hpages/sandbox/tmp/chr1.fa' in 'chr1' object ... DONE
# Saving 'chr1' object to compressed data file './BSgenome.Hhiv1.NCBI.test/inst/extdata/chr1.rda' ... DONE
# Loading FASTA file '/home/hpages/sandbox/tmp/chrM.fa' in 'chrM' object ... DONE
# Saving 'chrM' object to compressed data file './BSgenome.Hhiv1.NCBI.test/inst/extdata/chrM.rda' ... DONE
# Warning messages:
# 1: In FUN(X[[i]], ...) :
# In file '/home/hpages/sandbox/tmp/chr1.fa': sequence description "" doesn't match user-specified sequence name "chr1"
# 2: In FUN(X[[i]], ...) :
# In file '/home/hpages/sandbox/tmp/chrM.fa': sequence description "" doesn't match user-specified sequence name "chrM"
# 3: In getSeqlengths(seqnames, prefix = prefix, suffix = suffix, seqs_srcdir = seqs_srcdir, :
# genome is unknown ('Seqinfo(genome="HIV-1_hg38")' failed) ==> unable to
# check the lengths of the sequences in the files
3 warnings but no error.
My seed file:
Package: BSgenome.Hhiv1.NCBI.test
Title: Full genome sequences for HIV-1_hg38 by ysc.
Description: Full genome sequences for HIV-1 and hg38 by ysc.
Version: 1.0.0
organism: Homo sapiens
common_name: Human
genome: HIV-1_hg38
provider: NCBI
release_date: 2023/10/30
organism_biocview: Homo_sapiens
BSgenomeObjname: Hhiv1
seqnames: c("chr1", "chrM")
circ_seqs: "chrM"
seqs_srcdir: /home/hpages/sandbox/tmp
ondisk_seq_format: rda
I have chr1.fa and chrM.fa in /home/hpages/sandbox/tmp/:
dir("/home/hpages/sandbox/tmp")
# [1] "chr1.fa" "chrM.fa"
sessionInfo():
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 23.10
Matrix products: default
BLAS: /home/hpages/R/R-4.3.0/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.3.0/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BSgenome_1.70.1 rtracklayer_1.62.0 BiocIO_1.12.0
[4] Biostrings_2.70.1 XVector_0.42.0 GenomicRanges_1.54.1
[7] GenomeInfoDb_1.38.0 IRanges_2.36.0 S4Vectors_0.40.1
[10] BiocGenerics_0.48.0
loaded via a namespace (and not attached):
[1] crayon_1.5.2 DelayedArray_0.28.0
[3] SummarizedExperiment_1.32.0 GenomicAlignments_1.38.0
[5] rjson_0.2.21 RCurl_1.98-1.12
[7] XML_3.99-0.14 MatrixGenerics_1.14.0
[9] Biobase_2.62.0 grid_4.3.0
[11] restfulr_0.0.15 abind_1.4-5
[13] bitops_1.0-7 yaml_2.3.7
[15] compiler_4.3.0 codetools_0.2-19
[17] BiocParallel_1.36.0 lattice_0.22-5
[19] SparseArray_1.2.0 parallel_4.3.0
[21] GenomeInfoDbData_1.2.11 Matrix_1.6-1.1
[23] tools_4.3.0 matrixStats_1.0.0
[25] Rsamtools_2.18.0 zlibbioc_1.48.0
[27] S4Arrays_1.2.0
Thank you very much! I modified my seed file and updated BSgenome, but a new problem appeared.
forgeBSgenomeDataPkg(seed,destdir="/Users/mac/Desktop/ATAC-genome/",verbose=TRUE)
Creating package in /Users/mac/Desktop/ATAC-genome//BSgenome.Hhiv1.NCBI.test
Saving 'seqlengths' object to compressed data file '/Users/mac/Desktop/ATAC-genome//BSgenome.Hhiv1.NCBI.test/inst/extdata/seqlengths.rda' ... Error in gzfile(file, "wb") : cannot open the connection
In addition: There were 37 warnings (use warnings() to see them)
That's because remotes::install_github() does not work properly with the BSgenome package, as explained earlier. Did you reinstall as I said? Alternatively you can wait another 26 hrs or so to get the new version via BiocManager::install().
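The thread does not spell out why a remotes::install_github() install misbehaves. One possible explanation, which is an assumption drawn from the "Removed empty directory" lines in the build log above and not something confirmed here, is that the data-package template loses its inst/extdata directory during the build, so there is no directory to write seqlengths.rda into. A hedged diagnostic sketch, with template_has_extdata() as a hypothetical helper:

```r
# Hypothetical diagnostic: does a data-package template directory still
# contain inst/extdata? The template path is taken from the build log.
template_has_extdata <- function(template_dir) {
  dir.exists(file.path(template_dir, "inst", "extdata"))
}

# Demo on two throwaway template directories, one intact and one stripped.
ok_tpl <- file.path(tempdir(), "tpl_ok")
bad_tpl <- file.path(tempdir(), "tpl_bad")
dir.create(file.path(ok_tpl, "inst", "extdata"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(bad_tpl, showWarnings = FALSE)
tpl_check <- c(ok = template_has_extdata(ok_tpl),
               bad = template_has_extdata(bad_tpl))
print(tpl_check)

# Inspect the installed BSgenome only if it is available.
if (requireNamespace("BSgenome", quietly = TRUE)) {
  tpl <- system.file("pkgtemplates", "BSgenome_datapkg", package = "BSgenome")
  print(template_has_extdata(tpl))
}
```

If the last check prints FALSE, reinstalling with the git clone and R CMD INSTALL steps given earlier is the route recommended in this thread.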
Yes, I have updated BSgenome (remotes::install_github("Bioconductor/BSgenome@RELEASE_3_18")) and modified my seed file. The error is still: "Error in gzfile(file, "wb") : cannot open the connection". Thanks.
> forgeBSgenomeDataPkg("HIV_1_hg38_seed.txt",seqs_srcdir="/Users/mac/Desktop/ATAC_genome/fa",
+ destdir="/Users/mac/Desktop/ATAC_genome",verbose=TRUE,replace=TRUE)
Creating package in /Users/mac/Desktop/ATAC_genome/BSgenome.Hhiv1.NCBI.ysc
Saving 'seqlengths' object to compressed data file '/Users/mac/Desktop/ATAC_genome/BSgenome.Hhiv1.NCBI.ysc/inst/extdata/seqlengths.rda' ... Error in gzfile(file, "wb") : cannot open the connection
In addition: There were 37 warnings (use warnings() to see them)
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome_1.70.1 rtracklayer_1.62.0 BiocIO_1.12.0 Biostrings_2.70.1 XVector_0.42.0
[6] GenomicRanges_1.54.0 GenomeInfoDb_1.38.0 IRanges_2.36.0 S4Vectors_0.40.1 BiocGenerics_0.48.0
loaded via a namespace (and not attached):
[1] Matrix_1.6-1.1 compiler_4.3.1 BiocManager_1.30.22 rjson_0.2.21
[5] crayon_1.5.2 SummarizedExperiment_1.32.0 Biobase_2.62.0 Rsamtools_2.18.0
[9] bitops_1.0-7 GenomicAlignments_1.38.0 parallel_4.3.1 callr_3.7.3
[13] BiocParallel_1.36.0 yaml_2.3.7 lattice_0.22-5 R6_2.5.1
[17] S4Arrays_1.2.0 curl_5.1.0 XML_3.99-0.14 DelayedArray_0.28.0
[21] desc_1.4.2 MatrixGenerics_1.14.0 rprojroot_2.0.3 GenomeInfoDbData_1.2.11
[25] cli_3.6.1 SparseArray_1.2.0 ps_1.7.5 zlibbioc_1.48.0
[29] processx_3.8.2 grid_4.3.1 rstudioapi_0.15.0 remotes_2.4.2.1
[33] prettyunits_1.2.0 codetools_0.2-19 pkgbuild_1.4.2 abind_1.4-5
[37] RCurl_1.98-1.12 restfulr_0.0.15 matrixStats_1.0.0 tools_4.3.1
seed file:
Package: BSgenome.Hhiv1.NCBI.ysc
Title: Full genome sequences for HIV_1_hg38 by ysc.
Description: Full genome sequences for HIV_1 and hg38 by ysc.
Version: 1.0.0
organism: Homo sapiens
common_name: Human
genome: HIV_1_hg38
provider: NCBI
release_date: 2023/10/30
organism_biocview: Homo_sapiens
BSgenomeObjname: Hhiv1
seqnames: c(as.character((read.table("/Users/mac/Desktop/ATAC_genome/HIV_1_hg38.chromName.txt",header=F)$V1)[-c(26:194)]))
circ_seqs: "chrM"
seqs_srcdir: /Users/mac/Desktop/ATAC_genome/fa
ondisk_seq_format: rda
Yes, I have updated BSgenome (remotes::install_github("Bioconductor/BSgenome@RELEASE_3_18"))
As I said twice earlier already, don't do that. Please carefully re-read my previous comments.
Thank you very much, it has been successful, thank you!
Glad it worked. Cheers!
Hello, I now need to generate a BSgenome for a fusion genome of human and the HIV-1 virus. I already have the FASTA files of hg38 and HIV-1 from NCBI. I keep getting errors when running the seed file. I don't know what to use for the genome field. In addition, can you help me find out what's wrong with this seed file? Thanks in advance!
The error message is as follows: