Open igrabski opened 2 months ago
this is what i see when i click on the vcf source link
can you snip out the first 500 lines of one of the problematic files as text and attach? Also provide sessionInfo() result after your error is triggered
Thanks for the fast response!
sessionInfo()
is below:
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantAnnotation_1.48.1 Rsamtools_2.18.0
[3] Biostrings_2.70.1 XVector_0.42.0
[5] SummarizedExperiment_1.32.0 Biobase_2.62.0
[7] GenomicRanges_1.54.1 GenomeInfoDb_1.38.1
[9] IRanges_2.36.0 S4Vectors_0.40.1
[11] MatrixGenerics_1.14.0 matrixStats_1.1.0
[13] BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] KEGGREST_1.42.0 rjson_0.2.21 lattice_0.22-5
[4] vctrs_0.6.4 tools_4.3.1 bitops_1.0-7
[7] generics_0.1.3 curl_5.1.0 parallel_4.3.1
[10] tibble_3.2.1 fansi_1.0.5 AnnotationDbi_1.64.1
[13] RSQLite_2.3.3 blob_1.2.4 pkgconfig_2.0.3
[16] Matrix_1.6-5 BSgenome_1.70.2 dbplyr_2.4.0
[19] lifecycle_1.0.4 GenomeInfoDbData_1.2.11 compiler_4.3.1
[22] stringr_1.5.0 progress_1.2.2 codetools_0.2-19
[25] yaml_2.3.7 RCurl_1.98-1.13 pillar_1.9.0
[28] crayon_1.5.2 BiocParallel_1.36.0 DelayedArray_0.28.0
[31] cachem_1.0.8 abind_1.4-5 tidyselect_1.2.0
[34] digest_0.6.33 stringi_1.8.1 restfulr_0.0.15
[37] dplyr_1.1.3 biomaRt_2.58.0 fastmap_1.1.1
[40] grid_4.3.1 cli_3.6.1 SparseArray_1.2.2
[43] magrittr_2.0.3 S4Arrays_1.2.0 GenomicFeatures_1.54.1
[46] XML_3.99-0.15 utf8_1.2.4 rappdirs_0.3.3
[49] filelock_1.0.2 prettyunits_1.2.0 bit64_4.0.5
[52] httr_1.4.7 bit_4.0.5 png_0.1-8
[55] hms_1.1.3 memoise_2.0.1 BiocIO_1.12.0
[58] BiocFileCache_2.10.1 rtracklayer_1.62.0 rlang_1.1.2
[61] glue_1.6.2 DBI_1.1.3 xml2_1.3.5
[64] R6_2.5.1 GenomicAlignments_1.38.0 zlibbioc_1.48.0
The VCF files are actually very short (maybe that's also weird...) so I just attached the whole file here. I read it into R using read.table
and then saved it using write.table
.
I think the sample file is missing a line
##fileformat=VCFv4.1
or similar, which is a required field
Thanks, that helps it run without errors but then I get an empty object?
> vcf <- readVcf('../../Downloads/dkfz_groebner_2018_exonic(1)/snvs_ICGC_MB64_somatic.vcf','hg19')
> vcf
class: CollapsedVCF
dim: 16 0
rowRanges(vcf):
GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 1 column: INFO
Fields with no header: INFO
geno(vcf):
List of length 0:
I guess the files are also missing the ##INFO
fields; I'm just parsing the specification and the example file system.file(package="VariantAnnotation", "extdata", "ex2.vcf")
. I think the bottom line is that the file is malformed.
Hi, when I use the
readVcf
command, I encounter an error that my input file is not recognized as a VCF. The specific command I am using and error is as follows:The VCF files I am trying to read are downloaded from here; the error occurs with any of the files beginning with
snvs_
that I have tested.I am wondering if these VCF files are older / in a non-standard format that
readVcf
does not recognize? Please let me know what the issue may be or how I can solve it. Thanks!