mkoromina closed this issue 2 years ago.
I'm not sure how the bug label was added, but please feel free to remove it and keep this open as an issue to be solved. Thanks a lot!
It seems like there are several things going on here.
Most importantly, echolocatoR doesn't support sumstats provided in parquet format (yet). When you write "/path/to/my/parquet/file", do you literally mean a parquet file? Because that won't work. As I mentioned here, you need to convert it to a format that data.table can read (e.g. tsv.gz). There are a number of ways to do this, but if you prefer to stay in R, you can use arrow:
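```r
# Read the parquet file and write it back out as tab-separated text
dat <- arrow::read_parquet(filepath)
data.table::fwrite(dat, newfilepath, sep = "\t")  # a ".gz" suffix on newfilepath makes fwrite gzip the output
```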
PS: I set the bug label to be added automatically, so don't worry about it!
Hi @bschilder, thank you very much for this! However, I ran into the same issue when using .tsv files created by MungeSumstats. I will try your approach with the arrow package and get back to you if needed. Thanks a lot once again!
OK, keep me posted! I should also mention that I'm looking into some issues with munging and tabix-indexing PGC files, such as the BD2021 dataset: https://github.com/neurogenomics/MungeSumstats/issues/91
Hey, just a quick update to let you know that the .parquet data still doesn't work, even after converting it to a .tsv.gz file via arrow::read_parquet. The main problem is the error messages that occur downstream, which seem to be related to tabix indexing of the file.
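Since tabix indexing is where things seem to break, here is a minimal sketch of what a tabix-ready sumstats file generally needs, assuming the data.table and Rsamtools packages are installed. The toy rows, temporary file names, and column positions (CHR in column 2, BP in column 3, following the SNP | CHR | BP | P | OR | SE layout) are illustrative assumptions, and this is not echolocatoR's own indexing code. The two requirements it illustrates are that rows must be sorted by chromosome and then position, and that the file must be BGZF (block gzip) compressed; a plain gzip file, such as a .gz written by fwrite, cannot be indexed by tabix.

```r
library(data.table)

# Toy sumstats for illustration only (assumed SNP | CHR | BP | P | OR | SE layout)
dat <- data.table(
  SNP = c("rs3", "rs1", "rs2"),
  CHR = c(2L, 1L, 1L),
  BP  = c(500L, 300L, 100L),
  P   = c(0.01, 1e-8, 0.5),
  OR  = c(1.1, 0.9, 1.0),
  SE  = c(0.02, 0.03, 0.05)
)

# tabix requires rows sorted by chromosome, then by position
setorderv(dat, c("CHR", "BP"))

# Write uncompressed TSV first (hypothetical temporary path)
tsv_file <- tempfile(fileext = ".tsv")
fwrite(dat, tsv_file, sep = "\t")

# Compress with BGZF; bgzip() returns the path of the compressed file
bgz_file <- Rsamtools::bgzip(tsv_file, dest = paste0(tsv_file, ".bgz"), overwrite = TRUE)

# Build the index using 1-based column indices (CHR = 2, BP = 3),
# skipping the header line; indexTabix() returns the index file path
tbi_file <- Rsamtools::indexTabix(bgz_file, seq = 2L, start = 3L, end = 3L, skip = 1L)
```

Sorting has to happen before compression because tabix indexes the BGZF blocks in file order and rejects input whose positions are out of order.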
## 1. Bug description
When running the finemap_loci() function, the above-mentioned error message occurs (this is, in a way, a continuation of the thread in issue #74).
### Console output
### Expected behaviour
I'd expect the finemap_loci() function to run without errors. Note also that when running the script below, the munged sumstats (.parquet) are further compressed (i.e., .parquet.bgz); I am not sure whether this is related to the error message.
Update: the same error message is produced when using .tsv files from the MungeSumstats package.
## 2. Reproducible example
### Code
### Data

Unfortunately, I cannot upload the data, but here is a short description:

- My loci to be fine-mapped are saved in a standard Excel file with the following header: SNP | CHR | BP | P | OR | SE.
- My GWAS sumstats have been munged with munge_polyfun_sumstats.py and are saved in .parquet format.
## 3. Session info