PhoebeGuo97 closed this issue 2 years ago
Thanks for the information @PhoebeGuo97
The error occurs because your sumstats are not sorted by genomic coordinates, which tabix requires:
[ti_index_core] the file out of order at line 1253647
Use the argument tabix_index=TRUE in MungeSumstats::format_sumstats when processing your GWAS. This will automatically ensure that the file is sorted before indexing.
Alternatively, you can sort the file yourself and resave it. The sort_coordinates() function in the echotabix package is one way to do this (the package is under development but should still work).
https://github.com/RajLabMSSM/echotabix
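If you would rather do the sort by hand, here is a minimal sketch using data.table. The helper name sort_sumstats and the file paths are made up for illustration and are not part of MungeSumstats or echotabix; the CHR and BP column names are assumed from the MungeSumstats output format. As far as I know, tabix only needs rows for the same chromosome to be contiguous and ordered by position, so a plain sort on CHR then BP should be enough.

```r
library(data.table)

# Hypothetical helper (not part of MungeSumstats or echotabix):
# sort a summary-statistics table by chromosome, then base-pair position.
sort_sumstats <- function(dt, chrom_col = "CHR", pos_col = "BP") {
  dt <- data.table::copy(dt)  # leave the caller's table untouched
  # Make sure positions sort numerically rather than as text.
  dt[, (pos_col) := as.integer(get(pos_col))]
  data.table::setorderv(dt, cols = c(chrom_col, pos_col))
  dt
}

# Toy example with out-of-order rows.
toy <- data.table(
  SNP = c("rs3", "rs1", "rs2"),
  CHR = c(1, 1, 1),
  BP  = c(717485, 715265, 715367)
)
sorted <- sort_sumstats(toy)
print(sorted)

# For a real file (paths are placeholders):
# dt <- data.table::fread("sumstats.tsv.gz")
# data.table::fwrite(sort_sumstats(dt), "sumstats_sorted.tsv", sep = "\t")
```

After resaving, the sorted file still has to be bgzip-compressed and indexed before it can be queried with tabix.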
Hello, I tried option 1, i.e.
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path, ref_genome = "GRCH37", tabix_index = TRUE)
but it did not solve the issue.
Here is what I saw in Console after re-running MungeSumstats:
Tabix-indexing file.
[ti_index_core] the file out of order at line 1253647
Create tabix index failed for [ /var/folders/g9/f5z6vsbx7q3_wmk06jz0j9zm0000gn/T//RtmpUeD8qB/filec8a3e5725dc.tsv.bgz ]!
Summary statistics report:
[.data.table(sumstats_dt, , :=((names(empty_cols)), NULL)) :
Column 'NA' does not exist to remove
Tabix sorting/indexing is now generally more stable in the echoverse branch. Could you try using this updated version? (It will be pushed to master soon as well):
remotes::install_github("RajLabMSSM/echolocatoR@echoverse")
remotes::install_github("RajLabMSSM/echolocatoR")
I wonder if I need to change any scripts to use the updated version? Thanks!
Hi @PhoebeGuo97, yes quite a bit has changed. I've just uploaded some details about these changes here: https://github.com/RajLabMSSM/echolocatoR#echolocator-v10-vsv20
1. Bug description
I have munged my summary statistics. I ran finemap_loci() and got errors.
Console output
Expected behaviour
(A clear and concise description of what you expected to happen.)
2. Reproducible example
Code
This is how I munged my summary statistics:
fullSS_path <- "/Users/fguo/Desktop/GWAS/Jansen2019/AD_sumstats_Jansenetal_2019sept.txt.gz"
fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path, ref_genome = "GRCH37")
Data
This is the output of the MungeSumstats step:
Successfully finished preparing sumstats file, preview:
Reading header.
           SNP CHR     BP A1 A2  UNIQID.A1A2         Z       P   NSUM      N DIRECTION       FRQ        BETA          SE
1:  rs12184267   1 715265  C  T 1:715265_T_C -2.121973 0.03384 359856 359856      ??+? 0.9591931 -0.01264265 0.005957967
2:  rs12184277   1 715367  A  G 1:715367_G_A -1.957915 0.05024 360230 360230      ??+? 0.9589313 -0.01162351 0.005936678
3:  rs12184279   1 717485  C  A 1:717485_A_C -1.912438 0.05582 360257 360257      ??+? 0.9594241 -0.01141891 0.005970864
4: rs116801199   1 720381  G  T 1:720381_T_G -2.295404 0.02171 360980 360980      ??+? 0.9578380 -0.01344289 0.005856439
Returning path to saved data.
3. Session info
(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)