cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium
https://cumc.github.io/xqtl-protocol/
MIT License

pipeline/extract_effects.ipynb file not found #1024

Open neeteshpandey opened 3 months ago

neeteshpandey commented 3 months ago

Hi, I am trying to run the workflow "A multivariate EBNM approach for mixture multivariate distribution estimate" using the minimal working example, but the first step, the command below, fails with the error "ERROR: Failed to locate pipeline/extract_effects.ipynb.sos".

sos run pipeline/extract_effects.ipynb extract_effects \
    --name protocol_example.mashr_input \
    --analysis-units <(cat protocol_example.mashr.list | cut -f 2 ) \
    --need-genename TRUE \
    --sum-stat protocol_example.mashr.list

On further checking, the file "extract_effects.ipynb" is not present in the pipeline folder.

Could you please tell me where I can get this notebook?

Thanks!

gaow commented 3 months ago

For your purpose, which is multi-trait GWAS rather than molecular QTL analysis, you don't need to run that pipeline, which is designed for QTL. The final data format should be the same, though. You can see the format in this example: https://stephenslab.github.io/mashr/articles/intro_mash_dd.html (the simdata data set). It should contain "strong", "random" and "null" summary statistics:

  1. "strong" can be obtained by LD clumping, ideally jointly on all traits. It is also okay to clump per trait and combine the resulting index variants into one list, then create a table whose rows are the union of the index variants and whose columns are the traits.
  2. "null" can be a random set of variants, after LD pruning, that have absolute z-scores < 2.
  3. "random" can be a random set of variants after LD pruning.

For the null and random sets, I believe about 10K variants in each category is sufficient. For "strong", we should hope to get as many as possible, depending on the signals in the data.
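As a rough illustration of the three sets described above, here is a minimal Python sketch. It assumes clumping and pruning have already been done elsewhere and that z-scores are available per variant across traits. The function name `build_mash_sets`, its parameters, and the toy variant IDs are all hypothetical, and requiring |z| < 2 in every trait for the "null" set is one possible reading of the rule, not something stated in this thread.

```python
import random

def build_mash_sets(z, clumped_by_trait, pruned, n_sample=10_000,
                    null_cutoff=2.0, seed=0):
    """Assemble "strong", "random" and "null" z-score tables.

    z                : dict variant_id -> list of z-scores (one per trait)
    clumped_by_trait : dict trait -> list of index variants from LD clumping
    pruned           : iterable of variant IDs that survived LD pruning
    """
    rng = random.Random(seed)

    # "strong": union of per-trait index variants that have summary statistics
    index_variants = set()
    for variants in clumped_by_trait.values():
        index_variants.update(variants)
    strong = sorted(v for v in index_variants if v in z)

    # Candidate pool for "random" and "null": LD-pruned variants
    candidates = sorted(v for v in set(pruned) if v in z)

    # "random": a random subset of the pruned variants
    random_ids = rng.sample(candidates, min(n_sample, len(candidates)))

    # "null": pruned variants with |z| below the cutoff
    # (assumption: the cutoff must hold in every trait)
    null_pool = [v for v in candidates
                 if all(abs(x) < null_cutoff for x in z[v])]
    null_ids = rng.sample(null_pool, min(n_sample, len(null_pool)))

    def as_table(ids):
        return {v: z[v] for v in ids}

    return {"strong": as_table(strong),
            "random": as_table(random_ids),
            "null": as_table(null_ids)}

# Toy data: five variants, two traits (values are made up for illustration)
z_scores = {
    "rs1": [5.1, 0.3],
    "rs2": [0.5, -1.2],
    "rs3": [-0.4, 6.0],
    "rs4": [1.9, 0.1],
    "rs5": [2.5, 0.2],
}
sets = build_mash_sets(
    z_scores,
    clumped_by_trait={"traitA": ["rs1"], "traitB": ["rs3"]},
    pruned=["rs2", "rs4", "rs5"],
    n_sample=2,
)
```

With real data, `n_sample` would stay near the suggested 10K, and the "strong" table would simply be as large as the clumping results allow.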

Once these are extracted, please format them the same way as shown in the vignette on the mashr package website (linked at the beginning of my response) and continue from there.
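The mashr vignette works with an effect-estimate matrix and a standard-error matrix (Bhat and Shat), with variants as rows and conditions as columns. A minimal Python sketch of writing one such table as tab-separated text, ready to be read into R, might look like the following. The helper name `to_matrix_tsv`, the "variant" header, and the example values are assumptions made for illustration; the thread does not prescribe a file layout.

```python
import csv
import io

def to_matrix_tsv(table, traits):
    """Serialize a dict variant_id -> list of per-trait values as a TSV matrix."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Header row: hypothetical "variant" column followed by one column per trait
    writer.writerow(["variant"] + list(traits))
    for variant, values in table.items():
        if len(values) != len(traits):
            raise ValueError(f"{variant}: expected {len(traits)} values")
        writer.writerow([variant] + [f"{v:g}" for v in values])
    return buf.getvalue()

# Made-up effect estimates for one variant in two traits
bhat_tsv = to_matrix_tsv({"rs1": [0.12, -0.03]}, ["traitA", "traitB"])
```

The same helper can be called once for the effect estimates and once for the standard errors, for each of the "strong", "random" and "null" sets, so that every pair of files shares the same row and column order.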

