RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License

Rgraphviz #21

Closed samkleeman1 closed 3 years ago

samkleeman1 commented 3 years ago

Hi, many thanks for putting this amazing package together. We are trying to adopt it in all our GWAS workflows. Rgraphviz cannot be installed on our cluster environment (CentOS); I have tried everything I can think of to get it to install, but no joy. Can this package work without it? It appears to be a dependency of XGR, so for now I have modified the DESCRIPTION file to remove XGR so that echolocatoR can be installed.

I am also getting loads of error messages with the vignette workflow, e.g.:

Error in calculate.tstat(finemap_dat = finemap_dat) : 
  could not find function "calculate.tstat"
[1] "+ FINEMAP:: Importing conditional probabilties (.cred)..."
Error in t(rsids)[, 1] : subscript out of bounds

Kind regards,

Sam Kleeman PhD Student, Cold Spring Harbor Laboratory, NY

samkleeman1 commented 3 years ago

Another error message:

[1] "PolyFun:: Preparing SNP input file..."
[1] "+ PolyFun:: 180 SNPs identified."
[1] "+ PolyFun:: Writing SNP file ==> /grid/wsbs/home_norepl/skleeman/results/GWAS/cortiscore/CPS1/PolyFun/snps_to_finemap.txt.gz"
[1] "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR3/bin/python /grid/wsbs/home_norepl/skleeman/R/x86_64-pc-linux-gnu-library/4.0/echolocatoR/tools/polyfun//extract_snpvar.py --snps /grid/wsbs/home_norepl/skleeman/results/GWAS/cortiscore/CPS1/PolyFun/snps_to_finemap.txt.gz --out /grid/wsbs/home_norepl/skleeman/results/GWAS/cortiscore/CPS1/PolyFun/snps_with_priors.snpvar.tsv.gz"
[1] "++ Remove tmp file."
Error in data.table::fread(snp_w_priors.file, nThread = nThread) : 
  File '/grid/wsbs/home_norepl/skleeman/results/GWAS/cortiscore/CPS1/PolyFun/snps_with_priors.snpvar.tsv.gz' does not exist or is non-readable. getwd()=='/grid/wsbs/home_norepl/skleeman'

bschilder commented 3 years ago

Hi @samkleeman1 , please accept my sincerest apologies for the super delayed response. For some reason I've not been getting notifications sent to my email when new issues are added. I think I've fixed this now.

Thank you so much for bringing these to my attention. I'm working on them right now and will update you as soon as I have some fixes.

bschilder commented 3 years ago

1. Installation

Apologies for this being so difficult to install. I've made a number of changes to improve this, including removing XGR (and thus Rgraphviz) and some other tricky packages from the dependencies. Instead, I added a new function called extra_installs() that helps users install some of these optional packages. Do note, though, that you'll need XGR if:

echolocatoR also used to depend on XGR's quite useful liftover function, but I've since extracted the code for this so as not to depend on XGR for just this feature.

2. calculate.tstat

Error in calculate.tstat(finemap_dat = finemap_dat) : 
  could not find function "calculate.tstat"

This was a silly error on my part: at some point I renamed this function to calculate_tstat() but missed renaming it in a couple of spots. I've since made the change in all parts of the code.
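For renames like this, one common safeguard is to keep the old name around as a thin deprecated wrapper for a release or two, so that any missed call site warns instead of failing. A minimal sketch, assuming a stand-in body for calculate_tstat(): the real function's internals are not shown in this thread, and the Effect and StdErr column names are hypothetical.

```r
# Stand-in for the renamed function. The body and the column names
# (Effect, StdErr) are hypothetical, used only to make the sketch runnable.
calculate_tstat <- function(finemap_dat) {
  finemap_dat$Effect / finemap_dat$StdErr
}

# Deprecated alias: emits a deprecation warning, then forwards
# all arguments to the new name.
calculate.tstat <- function(...) {
  .Deprecated("calculate_tstat")
  calculate_tstat(...)
}

# Old call sites keep working (with a warning) until they are updated.
dat <- data.frame(Effect = c(0.2, -0.1), StdErr = c(0.05, 0.04))
tstat <- suppressWarnings(calculate.tstat(finemap_dat = dat))
```

The alias can be dropped once no warnings show up in the vignettes or tests.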

Items 2 and 3 have been pushed to the dev branch, and will merge with the main branch after I figure out # 3 as well as some other improvements I'm working on at the moment.

3. Importing conditional probabilities

[1] "+ FINEMAP:: Importing conditional probabilties (.cred)..."
Error in t(rsids)[, 1] : subscript out of bounds

A couple things would be helpful to figure out what's going on here:

  1. Can you check if this is because the .config file being produced by FINEMAP is empty?
  2. Also, which FINEMAP version is being used by your machine? Certain versions of FINEMAP can have issues running on certain machines, so I programmed echolocatoR to try the latest version and then switch to an earlier version if needed. These different versions can have different file output formats, and it's possible I may not have fully accounted for one of these formats.
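A quick way to run the first check from R, as a rough sketch: describe_output() is a hypothetical helper (not an echolocatoR function), and the temporary file below only stands in for the real .config path in your results folder.

```r
# Hypothetical helper: report whether an output file is missing,
# zero bytes, header-only, or actually contains data rows.
describe_output <- function(path) {
  if (!file.exists(path)) return("missing")
  if (file.size(path) == 0) return("empty")
  n_lines <- length(readLines(path, warn = FALSE))
  if (n_lines <= 1) return("header only")
  "has data"
}

# Demo on a temporary file; replace `tmp` with the path to the real file.
tmp <- tempfile(fileext = ".config")
state_before <- describe_output(tmp)   # file has not been created yet
file.create(tmp)
state_empty <- describe_output(tmp)    # exists, but zero bytes
writeLines(c("rank config prob", "1 rs123 0.9"), tmp)
state_filled <- describe_output(tmp)   # header plus one data row
```

The same helper can be pointed at the .cred and .snp files to see which outputs FINEMAP actually wrote.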

I'll rerun on my end as well to see if I can replicate.

Thanks!
Brian

bschilder commented 3 years ago

Quick update: I've added some more safeguards to the FINEMAP subfunctions to check that the .cred file (conditional probabilities) is available and, if it is not, to try to find and import the .snp file (marginal probabilities) instead. In the latter case, a note is printed to let the user know.
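The fallback logic is roughly as follows. This is only an illustrative sketch, not the code in the package: pick_finemap_results() is a hypothetical name, and it assumes the outputs are named <prefix>.cred and <prefix>.snp, which may not hold for every FINEMAP version.

```r
# Hypothetical helper: choose which FINEMAP result file to import.
# Prefers <prefix>.cred (conditional probabilities) and falls back to
# <prefix>.snp (marginal probabilities), printing a note when it does.
pick_finemap_results <- function(prefix) {
  usable <- function(p) file.exists(p) && file.size(p) > 0
  cred <- paste0(prefix, ".cred")
  snp  <- paste0(prefix, ".snp")
  if (usable(cred)) {
    return(list(path = cred, type = "conditional"))
  }
  if (usable(snp)) {
    message("Note: .cred file not available; ",
            "importing marginal probabilities from the .snp file instead.")
    return(list(path = snp, type = "marginal"))
  }
  stop("Neither a .cred nor a .snp file was found for prefix: ", prefix)
}

# Demo: only a .snp file exists, so the fallback is used.
prefix <- file.path(tempdir(), "demo_locus")
writeLines(c("index rsid prob", "1 rs123 0.8"), paste0(prefix, ".snp"))
res <- suppressMessages(pick_finemap_results(prefix))
```

Treating a zero-byte file as unusable matters here, since an empty .cred file would otherwise be picked and fail later at import time.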

bschilder commented 3 years ago

Hey @samkleeman1, just wanted to check in and see if these fixes worked alright for you? Let me know if there's anything I can do to help!

tillandlauer commented 3 years ago

I could install XGR successfully manually from the repository (R 4.0.4 and macOS 11.2.3): BiocManager::install("hfang-bristol/XGR", dependencies=T)

bschilder commented 3 years ago

> I could install XGR successfully manually from the repository (R 4.0.4 and macOS 11.2.3): BiocManager::install("hfang-bristol/XGR", dependencies=T)

Thanks for the info @tillandlauer! Perhaps they've updated XGR's Bioconductor distribution since I last checked. I'll modify the extra_installs() accordingly.

bschilder commented 3 years ago

extra_installs() now uses a tiered approach that tries several strategies in turn until XGR is installed:

#### XGR ####
    ## XGR has been extra tricky to install and
    ## seems to be very sensitive to the R version and the versions of its dependencies.
    ## So employ a tiered approach to installation: Bioconductor ==> CRAN ==> archived.
    ## Several packages are no longer available on CRAN, so Attempt 3 gets their last archived versions.
    if(xgr){
        #### Attempt 1: Try via Bioconductor ####
        ## Note: BiocManager::install() treats "owner/repo" as a GitHub repository.
        if( ! "XGR" %in% row.names(installed.packages()) ){
            message("+ Attempt 1: Installing XGR and deps via Bioconductor...")
            try({
                BiocManager::install("hfang-bristol/XGR", dependencies = TRUE)
            })
        }
        #### Attempt 2: Try via CRAN ####
        ## Only announce and run this attempt if Attempt 1 did not succeed.
        if( ! "XGR" %in% row.names(installed.packages()) ){
            message("+ Attempt 2: Installing XGR and deps via CRAN...")
            try({
                install.packages("XGR", dependencies = TRUE)
            })
        }
        #### Attempt 3: Try via archived ####
        if( ! "XGR" %in% row.names(installed.packages()) ){
            message("+ Attempt 3: Installing XGR and deps via Archives...")
            ## Install the archived dependencies first, then XGR itself.
            if( ! "foreign" %in% row.names(installed.packages()) ){
                install.packages("https://cran.r-project.org/src/contrib/Archive/foreign/foreign_0.8-76.tar.gz", dependencies = TRUE)
            }
            if( ! "refGenome" %in% row.names(installed.packages()) ){
                install.packages("https://cran.r-project.org/src/contrib/Archive/refGenome/refGenome_1.7.7.tar.gz", dependencies = TRUE)
            }

            install.packages("https://cran.r-project.org/src/contrib/Archive/XGR/XGR_1.1.7.tar.gz", dependencies = TRUE)
            ### The archived packages above were previously specified in DESCRIPTION as follows.
            ### They were removed from DESCRIPTION because they were causing serious issues with users' ability to install echolocatoR at all.
            # url::https://cran.r-project.org/src/contrib/Archive/foreign/foreign_0.8-76.tar.gz,
            # url::https://cran.r-project.org/src/contrib/Archive/refGenome/refGenome_1.7.7.tar.gz,
            # url::https://cran.r-project.org/src/contrib/Archive/XGR/XGR_1.1.7.tar.gz
        }
    }

samkleeman1 commented 2 years ago

Most of these error messages are still occurring:

+++ Multi-finemap:: POLYFUN_SUSIE +++
[1] "PolyFun:: Preparing SNP input file..."
[1] "+ PolyFun:: 34 SNPs identified."
[1] "+ PolyFun:: Writing SNP file ==> /mnt/grid/janowitz/rdata/varicose/finemap/GWAS/ukb_wiberg-vv/FAM13A/PolyFun/snps_to_finemap.txt.gz"
[1] "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/bin/python /grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/lib/R/library/echolocatoR/tools/polyfun//extract_snpvar.py --snps /mnt/grid/janowitz/rdata/varicose/finemap/GWAS/ukb_wiberg-vv/FAM13A/PolyFun/snps_to_finemap.txt.gz --out /mnt/grid/janowitz/rdata/varicose/finemap/GWAS/ukb_wiberg-vv/FAM13A/PolyFun/snps_with_priors.snpvar.tsv.gz"
Traceback (most recent call last):
  File "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/lib/R/library/echolocatoR/tools/polyfun//extract_snpvar.py", line 32, in <module>
    df_snps = pd.read_parquet(args.snps)
  File "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/lib/python3.9/site-packages/pandas/io/parquet.py", line 495, in read_parquet
    return impl.read(
  File "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/lib/python3.9/site-packages/pandas/io/parquet.py", line 239, in read
    result = self.api.parquet.read_table(
  File "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/lib/python3.9/site-packages/pyarrow/parquet.py", line 1905, in read_table
    dataset = _ParquetDatasetV2(
  File "/grid/wsbs/home_norepl/skleeman/miniconda3/envs/echoR/lib/python3.9/site-packages/pyarrow/parquet.py", line 1711, in __init__
    [fragment], schema=fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 978, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
[1] "+ FINEMAP:: Importing conditional probabilities (.cred)..."
Error in t(rsids)[, 1] : subscript out of bounds
In addition: Warning message:
In data.table::fread(cred_path, na.strings = c("<NA>", "NA"), nThread = 1) :
  Detected 9 column names but the data has 3 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
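
The traceback shows extract_snpvar.py calling pd.read_parquet() on snps_to_finemap.txt.gz, and pyarrow reporting that the Parquet magic bytes are missing, which suggests the file being written is not in the format the script expects. One way to confirm this is to look at the file's leading bytes. A rough sketch: sniff_format() is a hypothetical helper, not part of echolocatoR or PolyFun, and the temporary files below stand in for the real SNP file.

```r
# Hypothetical helper: guess a file's format from its first bytes.
# Parquet files begin with the ASCII bytes "PAR1"; gzip files begin
# with the two bytes 0x1f 0x8b.
sniff_format <- function(path) {
  # Passing a file name to readBin() reads the raw, undecompressed bytes.
  head_bytes <- readBin(path, what = "raw", n = 4L)
  if (length(head_bytes) == 4L &&
      identical(head_bytes, charToRaw("PAR1"))) {
    return("parquet")
  }
  if (length(head_bytes) >= 2L &&
      identical(head_bytes[1:2], as.raw(c(0x1f, 0x8b)))) {
    return("gzip")
  }
  "other"
}

# Demo: a gzip-compressed text file, like a .txt.gz SNP list.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "w")
writeLines(c("SNP", "rs123"), con)
close(con)
gz_format <- sniff_format(gz_path)
```

If the real snps_to_finemap.txt.gz comes back as gzip rather than parquet, the mismatch is between the writer and the bundled script version rather than a corrupted file.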