Open hpages opened 5 months ago
Hello, I am having problems with the getChromInfoFromNCBI function. When I run
chrominfo <- getChromInfoFromNCBI("T2T-CHM13v2.0")
I get this:
Error in download.file(url, destfile, method, quiet = TRUE): it was not possible to open the following URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/'
Warning: In download.file(url, destfile, method, quiet = TRUE) : URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/': Timeout of 60 seconds was reached
Is this some trouble with my connection or with something else?
Thanks in advance
What's your sessionInfo()?
Is the following code working for you?
library(GenomeInfoDb)
list_ftp_dir("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/")
Please show the output you get.
If it doesn't work, then yes, the problem is with your connection. If you are behind an HTTP proxy (a common setup at many institutions), the proxy configuration could also be the cause, in which case you would need to talk to your institution's IT department.
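As a first check on the connection side, the sketch below raises R's download timeout and prints any proxy-related environment variables. The 60 seconds in the error message matches the default value of R's timeout option. The value 300 and the list of variable names are assumptions chosen for illustration, not settings recommended by GenomeInfoDb.

```r
## download.file() gives up after getOption("timeout") seconds (60 by default).
old_timeout <- getOption("timeout")

## Assumption: 300 seconds is an arbitrary, more generous limit for a slow link.
options(timeout = max(300, old_timeout))

## Environment variables commonly used to configure a proxy (assumed list).
## Unset variables are reported as NA.
proxy_vars <- c("http_proxy", "https_proxy", "ftp_proxy", "no_proxy",
                "HTTP_PROXY", "HTTPS_PROXY", "FTP_PROXY", "NO_PROXY")
proxy_settings <- Sys.getenv(proxy_vars, unset = NA)
print(proxy_settings)
```

If the call still times out with the larger limit, or if the proxy variables point at a server you do not recognize, that is useful information to include when you report back or contact IT.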
FWIW I just tried this again with BioC 3.19 (the latest Bioconductor release, requires R 4.4), and it works fine for me:
library(GenomeInfoDb)
list_ftp_dir("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/")
# [1] "GCA_009914755.1_T2T-CHM13v0.7" "GCA_009914755.2_T2T-CHM13v1.0"
# [3] "GCA_009914755.3_T2T-CHM13v1.1" "GCA_009914755.4_T2T-CHM13v2.0"
My sessionInfo():
R version 4.4.0 alpha (2024-04-03 r86327)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 23.10
Matrix products: default
BLAS: /home/hpages/R/R-4.4.r86327/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomeInfoDb_1.40.1 IRanges_2.38.1 S4Vectors_0.42.1
[4] BiocGenerics_0.50.0
loaded via a namespace (and not attached):
[1] httr_1.4.7 compiler_4.4.0 R6_2.5.1
[4] tools_4.4.0 GenomeInfoDbData_1.2.12 UCSC.utils_1.0.0
[7] jsonlite_1.8.8
[Moved from https://github.com/Bioconductor/GenomicFeatures/issues/65 on March 22, 2024]
Question: How to make a TxDb object for the T2T-CHM13v2.0 genome (the telomere-to-telomere human genome), a.k.a. the hs1 genome at UCSC.
Answer: Unfortunately, makeTxDbFromUCSC() doesn't support hs1 at the moment, so we're going to use the GFF file provided by NCBI.
Download GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/
Import the GFF file as a GRanges object:
Note that the sequence names in the GRanges object are RefSeq accessions:
Let's change them to the official chromosome names:
Add the complete sequence info to the GRanges object:
Use makeTxDbFromGRanges() to make a TxDb object from the GRanges object:
Note that if you need the UCSC chromosome names instead of the NCBI ones, you can switch them with seqlevelsStyle():
H.
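The steps in the answer above can be sketched as follows. This is a minimal sketch, not necessarily the exact code from the original answer. It assumes network access to NCBI, that the GFF is imported with rtracklayer, that every sequence name in the GFF appears in the RefSeqAccn column returned by getChromInfoFromNCBI(), and that makeTxDbFromGRanges() is loaded from txdbmaker (BioC 3.19 or later; earlier releases provide it in GenomicFeatures). The variable names are illustrative.

```r
library(rtracklayer)   # import()
library(GenomeInfoDb)  # getChromInfoFromNCBI(), Seqinfo(), seqlevels(), seqlevelsStyle()
library(txdbmaker)     # makeTxDbFromGRanges(); use library(GenomicFeatures) before BioC 3.19

## Download the GFF file (directory and file name as given in the answer).
gff_dir <- "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/"
gff_file <- "GCF_009914755.1_T2T-CHM13v2.0_genomic.gff.gz"
if (!file.exists(gff_file))
    download.file(paste0(gff_dir, gff_file), gff_file, mode = "wb")

## Import the GFF file as a GRanges object.
gr <- import(gff_file, format = "gff3")

## The sequence names are RefSeq accessions.
head(seqlevels(gr))

## Change them to the official chromosome names.
chrominfo <- getChromInfoFromNCBI("T2T-CHM13v2.0")
idx <- match(seqlevels(gr), chrominfo$RefSeqAccn)
stopifnot(!anyNA(idx))  # assumption: every accession in the GFF is listed
seqlevels(gr) <- chrominfo$SequenceName[idx]

## Add the complete sequence info. The seqlevels are first aligned with those
## of the Seqinfo object so that the replacement is unambiguous.
si <- Seqinfo(genome = "T2T-CHM13v2.0")
seqlevels(gr) <- seqlevels(si)
seqinfo(gr) <- si

## Make the TxDb object from the GRanges object.
txdb <- makeTxDbFromGRanges(gr)

## Optional: switch from NCBI to UCSC chromosome names.
seqlevelsStyle(txdb) <- "UCSC"
```

If the stopifnot() check fails, the GFF contains a sequence whose accession is missing from the chromosome table, and the renaming step would need to be adapted before going further.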