Closed AMCalejandro closed 2 years ago
Hi, I also receive the same error message even when trying to run the vignette. Did you manage to find a workaround for this issue?
Hi @AMCalejandro and @mkoromina, thanks for bringing this to my attention. I haven't had much time to work on this project in a while but I will try to get back to it soon (possibly this weekend). In the meantime, PRs are more than welcome!
I should also note that I'm working on a long-term project to modularize echolocatoR into different subpackages (with proper unit tests) to help minimize errors.
Thank you for your patience.
Hi @bschilder, may I also note that, apart from the above-mentioned error message (`Error in [.data.table(x, r, vars, with = FALSE) : column(s) not found: SNP`), I also get this one: `Error in cDict[[chrom_col]] : subscript out of bounds`. However, the stats were munged beforehand (.parquet format). Any advice as to why this error message pops up? Thank you very much in advance.
Ok, so I think I've fixed this issue with reading in .cred files, as well as with reading in .snp files (when .cred is not available). See here: https://github.com/RajLabMSSM/echolocatoR/issues/72#issuecomment-1059423642
@mkoromina are you trying to feed a parquet file into echolocatoR? It currently only supports the formats supported by `data.table::fread`, e.g. `.tsv.gz`.
Hi @bschilder, I am loading .gz files which are actually munged sumstats produced by ldsc. Do you suggest doing any amendments to it? Thanks a lot!
@mkoromina I don't recommend using LDSC's python script for munging sumstats, since it makes a lot of assumptions about column identities (e.g. A1/A2), doesn't have as many colname mappings, doesn't perform any QC or genome build validation, and doesn't map SNP RSIDs to a standard nomenclature (amongst other limitations).
Please use MungeSumstats instead, which is much more robust. This is what the `munged=TRUE` flag in `echolocatoR::finemap_loci` refers to specifically.
Here are the docs on `finemap_loci`:
https://rajlabmssm.github.io/echolocatoR/reference/finemap_loci.html
@bschilder, thanks so much for this. May I ask whether sumstats munged via polyfun's respective python script will work with echolocatoR? If yes, is there a way of converting them to .tsv.gz files? I'll try your MungeSumstats recommendation as well. Thanks a lot!
Sure, it can still potentially work. You just use pandas to read the parquet files into Python and then write them out as tab-delimited files:

```python
import pandas as pd

# Read the munged sumstats and write them as a tab-delimited file
# (index=False prevents pandas from adding an extra index column)
dat = pd.read_parquet("<file_path>")
dat.to_csv("<new_path>.tsv.gz", sep="\t", index=False)
```
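A minimal round-trip sketch (toy data, with illustrative column names) to confirm that the resulting `.tsv.gz` reads back with the expected columns. Note that passing `index=False` keeps pandas from prepending an unnamed index column, which downstream column mapping could mistake for data:

```python
import pandas as pd

# Toy sumstats frame; column names are illustrative only
df = pd.DataFrame({"SNP": ["rs1", "rs2"], "CHR": [1, 2], "P": [1e-8, 0.05]})

# Write tab-delimited; gzip compression is inferred from the .gz extension
df.to_csv("toy.tsv.gz", sep="\t", index=False)

# Read back and verify the column set survived intact
check = pd.read_csv("toy.tsv.gz", sep="\t")
assert list(check.columns) == ["SNP", "CHR", "P"]
assert len(check) == 2
```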
You could also try out the new `read_parquet`/`write_parquet` functions I've added to echodata, though these depend on a functioning echoR conda environment, so it might be simpler to just use Python directly.
This issue is somehow a continuation of #72. I am showing an example in which we have SNPs with a `prob_col` value higher than the threshold, but echolocatoR fails to assign CS and PP to the SNPs.
We see in the error message that the `prob` column cannot be found.
Even though the .cred5 file is present, and the tool should have used `FINEMAP.import_data.cred()`, I am going through the code to see why.
When I run this manually, we see that the `prob` column is present and the code works.
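For what it's worth, a quick way to confirm that a results file actually contains a probability column before the tool parses it. The filename, delimiter, and column names below are placeholders I made up for illustration, not the actual FINEMAP output schema:

```python
import pandas as pd

# Write a toy whitespace-delimited results file resembling fine-mapping
# output (the "cred1"/"prob1" column names are illustrative only)
with open("toy.cred", "w") as f:
    f.write("index cred1 prob1\n1 rs123 0.98\n2 rs456 0.72\n")

# Read it back and check that the expected probability column exists
dat = pd.read_csv("toy.cred", sep=r"\s+")
assert "prob1" in dat.columns
```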