RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License

Requesting Clarification on User-Supplied LD Matrices #138

Status: Open. dym22 opened this issue 6 months ago.

dym22 commented 6 months ago

Is your feature request related to a problem? Please describe.
The sample data I am working with is relatively homogeneous but not well characterized by any of the populations in 1000G. Additionally, since I have the genotype data anyway, I would like to provide my own LD matrices rather than rely on those from a reference panel. I have tried doing this in a few ways: supplying the path to a text file containing the full paths to LD matrices pre-computed with plink (with and without .txt.gz copies, and with and without SNP IDs added as row/column names), supplying a list of .vcf.gz files, etc. The documentation does not give much guidance on exactly what the LD_reference argument expects when it is not 1000G or UKBB. Some clarification on this would be appreciated.

Describe the solution you'd like
Ideally, a worked example; if that is not feasible, more clarification would still help.

Describe alternatives you've considered
Ideally, there would be an additional feature that lets the user simply provide the path to a text file listing the paths of the bfiles (and maybe some filtering parameters), similar to GCTA's --mbfile flag, and that takes care of everything else on the back end. Understandably, that would involve quite a lot of work. If I could just get further clarification on how the matrices need to be formatted and how to point the LD_reference argument to them, that would be great.

Additional context
Great package! Preliminary results (using reference panels for the LD matrices) look good, and the whole procedure is relatively hassle-free compared to most fine-mapping software out there.

Edit: wording

bschilder commented 4 months ago

So I realize now that I haven't documented this feature well, but certain finemap_loci args can take a list whose length equals the number of loci being fine-mapped (one element per locus).

In the case of the LD_reference arg, the value for each locus is then passed to echoLD::get_LD, which infers how to import the respective LD file (or computes LD from a VCF file).

So this might look something like:

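```r
loci <- c("A", "B", "C")
# One LD file per locus, in the same order as `loci`
LD_reference <- list("./filepath.A.csv.gz", "./filepath.B.csv.gz", "./filepath.C.csv.gz")
echolocatoR::finemap_loci(
  ...,  # other finemap_loci() arguments
  loci = loci,
  LD_reference = LD_reference
)
```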
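If the per-locus LD files all live in one directory, the list can be built programmatically instead of by hand. This is a minimal sketch that assumes a hypothetical naming scheme `<ld_dir>/<locus>.csv.gz` and a hypothetical helper `build_ld_reference()` (neither is part of echolocatoR or echoLD). It only checks that every file exists and returns one path per locus, in the same order as `loci`, as a plain list like the one passed to LD_reference above.

```r
# Hypothetical helper (not part of echolocatoR): one LD file path per locus,
# assuming files are named <ld_dir>/<locus><suffix>.
build_ld_reference <- function(loci, ld_dir, suffix = ".csv.gz") {
  paths <- file.path(ld_dir, paste0(loci, suffix))
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing LD files: ", paste(absent, collapse = ", "))
  }
  # Unnamed list, same order and length as `loci`
  as.list(paths)
}

# Example with empty placeholder files in a temporary directory
ld_dir <- file.path(tempdir(), "ld_matrices")
dir.create(ld_dir, showWarnings = FALSE)
loci <- c("A", "B", "C")
invisible(file.create(file.path(ld_dir, paste0(loci, ".csv.gz"))))
LD_reference <- build_ld_reference(loci, ld_dir)
```

Failing early on a missing file is cheaper than discovering it partway through a multi-locus fine-mapping run.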

Here are the different file types that LD_reference can accept:
[Screenshot, 2024-05-10: file types accepted by LD_reference; image not available]
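For matrices pre-computed with plink, as in the original question, one option is to attach SNP IDs to the square matrix and save it as a gzipped CSV. This is a sketch under assumptions: `label_plink_ld()` is a hypothetical helper, it assumes the matrix came from `plink --r square` with the matching SNP IDs written by `--write-snplist` (same variant order), and whether echoLD::get_LD accepts this exact layout should be checked against the accepted file types.

```r
# Hypothetical helper (not part of echolocatoR/echoLD): label a square LD
# matrix from `plink --r square` with SNP IDs and save it as a gzipped CSV.
label_plink_ld <- function(ld_path, snplist_path, out_path) {
  # plink's square matrix has no header; values are whitespace-delimited
  ld <- as.matrix(read.table(ld_path, header = FALSE))
  snps <- readLines(snplist_path)
  # Must be square, with one row/column per SNP in the list
  stopifnot(nrow(ld) == ncol(ld), nrow(ld) == length(snps))
  dimnames(ld) <- list(snps, snps)
  con <- gzfile(out_path, "w")
  on.exit(close(con))
  write.csv(ld, con, quote = FALSE)  # first column holds the row SNP IDs
  invisible(out_path)
}

# Example with a toy 2-SNP matrix written to a temporary directory
tmp <- tempdir()
ld_file <- file.path(tmp, "locusA.ld")
snp_file <- file.path(tmp, "locusA.snplist")
writeLines(c("1\t0.5", "0.5\t1"), ld_file)
writeLines(c("rs1", "rs2"), snp_file)
out <- label_plink_ld(ld_file, snp_file, file.path(tmp, "locusA.csv.gz"))
```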

Let me know if this helps.