RajLabMSSM / echofinemap

echoverse module: Statistical and functional fine-mapping functions.
2 stars 1 forks source link

Error when running COJO - cojo.jma.cojo not found #5

Closed AMCalejandro closed 2 years ago

AMCalejandro commented 2 years ago

Hi,

I got the following error when trying to run COJO

+++ Multi-finemap:: COJO +++
[1] "+ COJO:: conditional analysis -- Conditioning on: rs4721411"
sh: 1: /home/acarrasco/.conda/envs/echoR/lib/R/library/echolocatoR/tools/gcta_1.92.1beta5_mac/bin/gcta64: Exec format error
[1] "+ COJO:: Processing results..."
Error in data.table::fread(file.path(cojo_dir, "cojo.jma.cojo")) : 
  File '/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/COJO/cojo.jma.cojo' does not exist or is non-readable. getwd()=='/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/SCRIPTS'

I was looking at the code but I am not sure in which step "cojo.jma.cojo" file is generated. Does it come from running COJO? In that case, it may be solved with RajLabMSSM/echofinemap#11

AMCalejandro commented 2 years ago

Once RajLabMSSM/echofinemap#11 is sorted out this is the log of the COJO run ( Note I am attaching the full log from the finemap_loci() run)

nohup: ignoring input
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Bioconductor version '3.12' is out-of-date; the current release version '3.14'
  is available with R version '4.1'; see https://bioconductor.org/install
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.7
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: 'data.table'

The following objects are masked from 'package:dplyr':

    between, first, last

The following object is masked from 'package:purrr':

    transpose

[1] "+ CONDA:: Activating conda env 'echoR'"
[1] "Checking for tabix installation..."
[1] "Checking for bcftools installation..."

)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
MAD1L1 (1 / 1)
)   )  ) ))))))}}}}}}}} {{{{{{{{{(((((( (  (   (
[1] "Determining chrom type from file header"
[1] "LD:: Standardizing summary statistics subset."
[1] "++ Preparing Gene col"
[1] "++ Preparing A1,A1 cols"
[1] "++ Preparing MAF,Freq cols"
[1] "++ Inferring MAF from frequency column..."
[1] "++ Preparing N_cases,N_controls cols"
[1] "++ Preparing `proportion_cases` col"
[1] "++ 'proportion_cases' not included in data subset."
[1] "++ Preparing N col"
[1] "+ Preparing sample_size (N) column"
[1] "++ Using `sample_size = 3572` TRUE"
[1] "++ Preparing t-stat col"
[1] "+ Calculating t-statistic from Effect and StdErr..."
[1] "++ Assigning lead SNP"
[1] "++ Ensuring Effect, StdErr, P are numeric"
[1] "++ Ensuring 1 SNP per row"
[1] "++ Removing extra whitespace"
[1] "++ Saving subset ==> /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/MAD1L1_axial_TryingCOJO_subset.tsv.gz"
[1] "LD:: Using UK Biobank LD reference panel."
[1] "+ UKB LD file name: chr7_1000001_4000001"
[1] "+ LD:: Downloading full .gz/.npz UKB files and saving to disk."
[1] "+ Overwriting pre-existing file."
[1] "+ CONDA:: Identified axel executable in echoR env."
[1] "+ Overwriting pre-existing file."
[1] "+ CONDA:: Identified axel executable in echoR env."
[1] "+ LD:: load_ld() python function input: /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/LD/chr7_1000001_4000001"
[1] "+ LD:: Reading LD matrix into memory. This could take some time..."
Processed URL: /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/LD/chr7_1000001_4000001
Some other message at the end
[1] "+ Full UKB LD matrix: 30690 x 30690"
[1] "+ Full UKB LD SNP data.table: 30690 x 5"
[1] "+ LD:: Saving LD matrix ==> /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/LD/MAD1L1.UKB_LD.RDS"
[1] "5872 x 5872 LD_matrix (sparse)"
vvvvv-- SUSIE --vvvvv
✅ All required columns present.
✅ All suggested columns present.
vvvvv-- COJO --vvvvv
✅ All required columns present.
✅ All suggested columns present.

+++ Multi-finemap:: SUSIE +++
For large R or large XtX, consider installing the  Rfast package for better performance.

+++ Multi-finemap:: COJO +++
[1] "+ COJO:: conditional analysis -- Conditioning on: rs4721411"
*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.2 beta Linux
* (C) 2010-present, Jian Yang, The University of Queensland
* Please report bugs to Jian Yang <jian.yang.qt@gmail.com>
*******************************************************************
Analysis started at 15:47:24 UTC on Wed Feb 23 2022.
Hostname: kronos

Error: --bfile /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/LD/plink.fam not found
An error occurs, please check the options or data
[1] "+ COJO:: Processing results..."
Error in data.table::fread(file.path(cojo_dir, "cojo.jma.cojo")) : 
  File '/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/COJO/cojo.jma.cojo' does not exist or is non-readable. getwd()=='/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/SCRIPTS'
In addition: Warning message:
In susie_func(bhat = subset_DT$Effect, shat = subset_DT$StdErr,  :
  IBSS algorithm did not converge in 100 iterations!
                  Please check consistency between summary statistics and LD matrix.
                  See https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html
Error in `[.data.table`(x, r, vars, with = FALSE) : 
  column(s) not found: Conditioned_Effect
Fine-mapping complete in:
Time difference of 1.5 hours
Error in `[.data.table`(x, r, vars, with = FALSE) : 
  column(s) not found: SNP
/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/LD/chr7_1000001_4000001.gz
/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022/mixedmodels_GWAS/axial_TryingCOJO/MAD1L1/LD/chr7_1000001_4000001.npz

And this is the code calling for calling the function


#!usr/bin/Rscript

library(echolocatoR)  
library(tidyverse)
library(data.table)                                                             

fullSS_path <- "/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/for_echolocatoR_axialOutcome_3.tsv"                                                                    
fullRS_path <- "/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_23.2.2022"

top_SNPs = fread("../topSNPs_axialMotorSymptom.txt") 
top_SNPs = top_SNPs[6, ]

earlymotor_PD = finemap_loci(top_SNPs = top_SNPs,
                                              loci = top_SNPs$Locus, 
                                              dataset_name = "axial_TryingCOJO", 
                                              dataset_type = "mixedmodels_GWAS",   
                                              force_new_subset = T, 
                                              force_new_LD = T, 
                                              force_new_finemap = T, 
                                              remove_tmps = F, 
                     # SUMMARY STATS ARGUMENTS 
                     fullSS_path = fullSS_path,
                     results_dir = fullRS_path,
                     query_by = "tabix", 
                     chrom_col = "CHR", position_col = "POS", snp_col = "#MarkerName", 
                     pval_col = "Pval", effect_col = "Effect", stderr_col = "StdErr", 
                     freq_col = "medianFreq", MAF_col = "calculate", 
                     A1_col = "Allele1", 
                     A2_col = "Allele2", 
                     #N_cases_col = "TotalSampleSize",
                     #N_controls = 0,

                     # FILTERING ARGUMENTS 
                     ## It's often desirable to use a larger window size  
                     ## (e.g. 2Mb which is bp_distance=500000*2),  
                     ## but we use a small window here to speed up the process.  
                     bp_distance = 500000*2, 
                     min_MAF = 0.001,   
                     trim_gene_limits = F, 

                     # FINE-MAPPING ARGUMENTS 
                     ## General 
                     finemap_methods = c("SUSIE","COJO"),  
                     n_causal = 5, 
                     PP_threshold = .95,  

                     # LD ARGUMENTS  
                     LD_reference = "UKB", 
                     superpopulation = "EUR", 
                     download_method = "axel", 

                     # Additional arguments - My arguments
                     case_control = F,
                     nThread = 15,
                     sample_size = 3572,

                     #COJO PARAMS
                     conditioned_snps = "rs4721411",
                     # PLOT ARGUMENTS  
                     ## general    
                     plot.types=c("fancy"), 
                     ## Generate multiple plots of different window sizes;  
                     ### all SNPs, 4x zoomed-in, and a 50000bp window 
                     plot.zoom = c("all","4x","10x"), 
                     ## XGR 
                     # plot.XGR_libnames=c("ENCODE_TFBS_ClusteredV3_CellTypes"),  
                     ## Roadmap 
                     plot.Roadmap = F, 
                     plot.Roadmap_query = NULL, 
                     # Nott et al. (2019) 
                     plot.Nott_epigenome = T,  
                     plot.Nott_show_placseq = T,  

                     verbose = F 
                     )

As well as the session info

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/acarrasco/.conda/envs/echoR/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.14.2 forcats_0.5.1     stringr_1.4.0     dplyr_1.0.7       purrr_0.3.4       readr_2.1.2       tidyr_1.2.0       tibble_3.1.6      ggplot2_3.3.5     tidyverse_1.3.1   echolocatoR_0.1.2

loaded via a namespace (and not attached):
  [1] readxl_1.3.1                backports_1.4.1             Hmisc_4.5-0                 BiocFileCache_1.14.0        plyr_1.8.6                  lazyeval_0.2.2              splines_4.0.3              
  [8] BiocParallel_1.24.1         GenomeInfoDb_1.26.7         digest_0.6.29               ensembldb_2.14.1            htmltools_0.5.2             fansi_1.0.2                 magrittr_2.0.2             
 [15] checkmate_2.0.0             memoise_2.0.1               BSgenome_1.58.0             cluster_2.1.1               tzdb_0.2.0                  Biostrings_2.58.0           modelr_0.1.8               
 [22] matrixStats_0.61.0          ggbio_1.38.0                askpass_1.1                 prettyunits_1.1.1           jpeg_0.1-8.1                colorspace_2.0-2            rvest_1.0.2                
 [29] blob_1.2.2                  rappdirs_0.3.3              haven_2.4.3                 xfun_0.29                   crayon_1.4.2                RCurl_1.98-1.5              jsonlite_1.7.3             
 [36] graph_1.68.0                Exact_2.1                   survival_3.2-11             VariantAnnotation_1.36.0    glue_1.6.1                  gtable_0.3.0                zlibbioc_1.36.0            
 [43] XVector_0.30.0              DelayedArray_0.16.3         BiocGenerics_0.36.1         scales_1.1.1                mvtnorm_1.1-1               DBI_1.1.2                   GGally_2.1.1               
 [50] Rcpp_1.0.8                  progress_1.2.2              htmlTable_2.1.0             foreign_0.8-81              bit_4.0.4                   proxy_0.4-25                OrganismDbi_1.32.0         
 [57] Formula_1.2-4               stats4_4.0.3                htmlwidgets_1.5.3           httr_1.4.2                  RColorBrewer_1.1-2          ellipsis_0.3.2              pkgconfig_2.0.3            
 [64] reshape_0.8.8               XML_3.99-0.6                nnet_7.3-15                 dbplyr_2.1.1                utf8_1.2.2                  tidyselect_1.1.1            rlang_1.0.1                
 [71] reshape2_1.4.4              AnnotationDbi_1.52.0        munsell_0.5.0               cellranger_1.1.0            tools_4.0.3                 cachem_1.0.6                cli_3.1.1                  
 [78] generics_0.1.2              RSQLite_2.2.9               broom_0.7.12                fastmap_1.1.0               fs_1.5.2                    knitr_1.37                  bit64_4.0.5                
 [85] AnnotationFilter_1.14.0     rootSolve_1.8.2.1           RBGL_1.66.0                 xml2_1.3.3                  biomaRt_2.46.3              compiler_4.0.3              rstudioapi_0.13            
 [92] curl_4.3                    png_0.1-7                   e1071_1.7-6                 reprex_2.0.1                DescTools_0.99.41           stringi_1.7.6               GenomicFeatures_1.42.3     
 [99] lattice_0.20-41             ProtGenerics_1.22.0         Matrix_1.3-2                vctrs_0.3.8                 pillar_1.7.0                lifecycle_1.0.1             BiocManager_1.30.12        
[106] bitops_1.0-7                lmom_2.8                    rtracklayer_1.50.0          GenomicRanges_1.42.0        R6_2.5.1                    latticeExtra_0.6-29         gridExtra_2.3              
[113] IRanges_2.24.1              gld_2.6.2                   dichromat_2.0-0             boot_1.3-27                 MASS_7.3-53.1               assertthat_0.2.1            SummarizedExperiment_1.20.0
[120] openssl_1.4.6               withr_2.4.3                 GenomicAlignments_1.26.0    Rsamtools_2.6.0             S4Vectors_0.28.1            GenomeInfoDbData_1.2.4      expm_0.999-6               
[127] parallel_4.0.3              hms_1.1.1                   grid_4.0.3                  rpart_4.1-15                class_7.3-18                MatrixGenerics_1.2.1        biovizBase_1.38.0          
[134] Biobase_2.50.0              lubridate_1.8.0             base64enc_0.1-3            

r$>  

bschilder commented 2 years ago

Thanks for letting me know about this! I will look into this along with RajLabMSSM/echofinemap#11. It's possible this relates to version differences in COJO. WHen I wrote this code I was using: gcta_1.92.1beta5_mac

bschilder commented 2 years ago

@AMCalejandro I've rewritten and just pushed much of the COJO code to make it easier to run and understand. Try reinstalling echofinemap and check out the documentation ?echofinemap::COJO.

It also automatically installs the GCTA-COJO executable (v1.93.2) via genetics.binaRies, so you don't have to worry about version differences.