brentp / combined-pvalues

combining p-values using modified stouffer-liptak for spatially correlated results (probes)
MIT License

Missing/broken bedtools when running on HPC #41

Open krferrier opened 4 weeks ago

krferrier commented 4 weeks ago

I do most of my work on a remote HPC where I don't have superuser privileges and am often limited in my ability to download or install software from URLs. I installed comb-p in a custom conda environment with Python 2.7, cruzdb, numpy, scipy, and toolshed. With this environment active, I ran:

comb-p pipeline -c 4 \
    --seed 5e-8 \
    --dist 1000 \
    -p dmr \
    --region-filter-p 0.05 \
    --anno hg38 \
    ewas_results.bed

and got the following output:

setting --acf-dist to 0.33 * --dist == 330
calculated stepsize as: 330
ACF:
 #lag_min       lag_max correlation     N       p
1       331     0.05996 746851  0
wrote: results_COT/dmr/COTconc_ewas.acf.txt

# original lambda: 1.10
wrote: dmr.bed.gz with lambda: 1.18
wrote: dmr.fdr.bed.gz
wrote: dmr.bed.gz (9 regions)
# read 9 regions from dmr.regions.bed.gz
# calculating ACF out to: 980
#           with 4  lags: [1, 331, 661, 991]
# Done with one-time ACF calculation
1468657 bases used as coverage for sidak correction
wrote: dmr.regions-p.bed.gz, (regions with corrected-p < 0.05: 8)
cmd was:bedtools intersect -b /tmp/tmpDa5ThY                          -a dmr.regions-p.bed.gz -wo 
return code was:255
Exception toolshed.files.ProcessException: ProcessException('bedtools intersect -b /tmp/tmpDa5ThY                          -a dmr.regions-p.bed.gz -wo ',) in <generator object process_iter at 0x7f5740c2dc80> ignored
Traceback (most recent call last):
  File "/master/kferrier/micromamba/envs/dmr/bin/comb-p", line 39, in <module>
    main()
  File "/master/kferrier/micromamba/envs/dmr/bin/comb-p", line 36, in main
    module.main()
  File "/master/kferrier/micromamba/envs/dmr/lib/python2.7/site-packages/cpv/pipeline.py", line 79, in main
    use_fdr=not args.no_fdr)
  File "/master/kferrier/micromamba/envs/dmr/lib/python2.7/site-packages/cpv/pipeline.py", line 185, in pipeline
    regions_bed, p_col_name=col_num)):
  File "/master/kferrier/micromamba/envs/dmr/lib/python2.7/site-packages/cpv/filter.py", line 95, in filter
    header=rh + ph), itemgetter('chrom','start','end')):
KeyError: 'start'

I'm guessing that bedtools is normally supposed to be installed or accessed via a URL, which the HPC restricts. That would explain why comb-p pipeline failed on the HPC but ran without problems in the same conda environment locally.

SOLUTION: I installed bedtools into the custom environment on the HPC (mamba install bioconda::bedtools), after which the command ran without errors.
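
For anyone hitting the same traceback, a quick preflight check can confirm whether a working bedtools is visible before starting a long comb-p run. The following is only a minimal sketch, not part of comb-p: the helper names `find_tool` and `check_tool` are hypothetical, and it assumes a Python 3 interpreter is available (`shutil.which` does not exist in Python 2.7) and is run with the same PATH as the activated environment.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the path of `name` as resolved on PATH, or None if it is missing."""
    return shutil.which(name)


def check_tool(name, version_flag="--version"):
    """Return (ok, message) describing whether `name` is present and runs.

    A tool can be on PATH and still be broken (for example, missing shared
    libraries), so the binary is also executed once with its version flag.
    """
    path = find_tool(name)
    if path is None:
        return False, "{} not found on PATH".format(name)
    try:
        proc = subprocess.run(
            [path, version_flag],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, "{} could not be executed: {}".format(path, exc)
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    if proc.returncode != 0:
        return False, "{} exited with code {}: {}".format(
            path, proc.returncode, first_line
        )
    return True, "{} ({})".format(path, first_line)
```

Calling `check_tool("bedtools")` returns a `(False, ...)` pair when the binary is absent or exits with a non-zero code, which is the situation the `return code was:255` line in the log above points to.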

krferrier commented 4 weeks ago

Below is the conda YAML for the custom environment I built for comb-p, in case it helps others in a similar situation. I commented out some R packages that I needed for the R script that converts my results into BED format; they aren't necessary for running comb-p.

name: dmr
channels:
  - travis
  - defaults
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - _r-mutex=1.0.1=anacondar_1
  - backports=1.1=pyhd3eb1b0_1
  - backports.functools_lru_cache=1.6.4=pyhd3eb1b0_0
  - backports_abc=0.5=py_1
  - bedtools=2.31.1=hf5e1c6e_2
  - binutils_impl_linux-64=2.36.1=h193b22a_2
  - binutils_linux-64=2.36=hf3e587d_33
  - blas=1.0=openblas
  - bwidget=1.9.14=ha770c72_1
  - bzip2=1.0.8=h5eee18b_6
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2024.8.30=hbcca054_0
  - cairo=1.16.0=hb05425b_5
  - certifi=2020.6.20=pyhd3eb1b0_3
  - combined-pvalues=0.50.6=pyhdfd78af_0
  - cruzdb=0.5.4=py27_0
  - curl=7.76.1=h979ede3_1
  - cycler=0.10.0=py27_0
  - dbus=1.13.18=hb2f20db_0
  - expat=2.6.3=h6a678d5_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=h77eed37_3
  - fontconfig=2.14.1=h52c9d5c_1
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - freetype=2.12.1=h4a9f257_0
  - fribidi=1.0.10=h36c2ea0_0
  - functools32=3.2.3.2=py27_1
  - futures=3.3.0=py27_0
  - gcc_impl_linux-64=7.5.0=habd7529_20
  - gcc_linux-64=7.5.0=h47867f9_33
  - gfortran_impl_linux-64=7.5.0=h56cb351_20
  - gfortran_linux-64=7.5.0=h78c8a43_33
  - glib=2.78.4=h6a678d5_0
  - glib-tools=2.78.4=h6a678d5_0
  - graphite2=1.3.14=h295c915_1
  - gsl=2.4=h294904e_1006
  - gst-plugins-base=1.14.1=h6a678d5_1
  - gstreamer=1.14.1=h5eee18b_1
  - gxx_impl_linux-64=7.5.0=hd0bb8aa_20
  - gxx_linux-64=7.5.0=h555fc39_33
  - harfbuzz=4.3.0=hf52aaf7_1
  - icu=58.2=he6710b0_3
  - interlap=0.2.7=pyh9f0ad1d_0
  - jbig=2.1=h7f98852_2003
  - jpeg=9e=h5eee18b_3
  - kernel-headers_linux-64=3.10.0=he073ed8_17
  - kiwisolver=1.1.0=py27he6710b0_0
  - krb5=1.17.2=h926e7f8_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=2.2.1=h9c3ff4c_0
  - libblas=3.9.0=13_linux64_openblas
  - libcblas=3.9.0=13_linux64_openblas
  - libcurl=7.76.1=hc4aaa36_1
  - libdeflate=1.7=h7f98852_5
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.4=h6a678d5_1
  - libgcc=14.2.0=h77fa898_1
  - libgcc-devel_linux-64=7.5.0=hda03d7c_20
  - libgcc-ng=14.2.0=h69a702a_1
  - libgfortran-ng=7.5.0=ha8ba4b0_17
  - libgfortran4=7.5.0=ha8ba4b0_17
  - libglib=2.78.4=hdc74915_0
  - libgomp=14.2.0=h77fa898_1
  - libiconv=1.16=h5eee18b_3
  - libnghttp2=1.43.0=h812cca2_1
  - libopenblas=0.3.18=hf726d26_0
  - libpng=1.6.39=h5eee18b_0
  - libssh2=1.10.0=ha56f1ee_2
  - libstdcxx=14.2.0=hc0a3c3a_1
  - libstdcxx-devel_linux-64=7.5.0=hb016644_20
  - libstdcxx-ng=14.2.0=h4852527_1
  - libtiff=4.3.0=hf544144_1
  - libuuid=1.41.5=h5eee18b_0
  - libwebp-base=1.2.2=h7f98852_1
  - libxcb=1.15=h7f8727e_0
  - libxml2=2.9.14=h74e7548_0
  - libzlib=1.2.13=h4ab18f5_6
  - lz4-c=1.9.3=h9c3ff4c_1
  - make=4.3=hd18ef5c_1
  - matplotlib=2.2.4=py27_0
  - matplotlib-base=2.2.4=py27hfd891ef_0
  - mysql-connector-c=6.1.11=h24aacaa_2
  - mysql-python=1.2.5=py27h7b6447c_0
  - ncurses=6.4=h6a678d5_0
  - numpy=1.16.6=py27h30dfecb_0
  - numpy-base=1.16.6=py27h2f8d375_0
  - openblas=0.3.4=h9ac9557_1000
  - openssl=1.1.1w=h7f8727e_0
  - pango=1.50.7=hbd2fdc8_0
  - pcre=8.45=h9c3ff4c_0
  - pcre2=10.42=hebb0a14_1
  - pip=19.3.1=py27_0
  - pixman=0.40.0=h36c2ea0_0
  - pyparsing=2.4.7=pyhd3eb1b0_0
  - pyqt=5.9.2=py27h05f1152_2
  - python=2.7.18=h42bf7aa_3
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - pytz=2021.3=pyhd3eb1b0_0
  - qt=5.9.7=h5867ecd_1
  #- r-argparse=2.0.3=r36h142f84f_0
  #- r-assertthat=0.2.1=r36h6115d3f_2
  #- r-base=3.6.1=haffb61f_2
  #- r-cli=2.5.0=r36hc72bb7e_0
  #- r-crayon=1.4.1=r36hc72bb7e_0
  #- r-data.table=1.14.0=r36hcfec24a_0
  #- r-dplyr=1.0.6=r36h03ef668_1
  #- r-ellipsis=0.3.2=r36hcfec24a_0
  #- r-fansi=0.4.2=r36hcfec24a_0
  #- r-findpython=1.0.7=r36hc72bb7e_0
  #- r-generics=0.1.0=r36hc72bb7e_0
  #- r-getopt=1.20.3=r36_2
  #- r-glue=1.4.2=r36hcfec24a_0
  #- r-jsonlite=1.7.2=r36hcfec24a_0
  #- r-lifecycle=1.0.0=r36hc72bb7e_0
  #- r-magrittr=2.0.1=r36hcfec24a_1
  #- r-pillar=1.6.1=r36hc72bb7e_0
  #- r-pkgconfig=2.0.3=r36h6115d3f_1
  #- r-purrr=0.3.4=r36hcfec24a_1
  #- r-r.methodss3=1.8.1=r36h6115d3f_0
  #- r-r.oo=1.24.0=r36h6115d3f_0
  #- r-r.utils=2.10.1=r36h6115d3f_0
  #- r-r6=2.5.0=r36hc72bb7e_0
  #- r-rlang=0.4.11=r36hcfec24a_0
  #- r-tibble=3.1.2=r36hcfec24a_0
  #- r-tidyselect=1.1.1=r36hc72bb7e_0
  #- r-utf8=1.2.1=r36hcfec24a_0
  #- r-vctrs=0.3.8=r36hcfec24a_1
  - readline=8.2=h5eee18b_0
  - scipy=1.2.1=py27he2b7bc3_0
  - setuptools=44.0.0=py27_0
  - singledispatch=3.7.0=pyhd3eb1b0_1001
  - sip=4.19.8=py27hf484d3e_0
  - six=1.16.0=pyhd3eb1b0_1
  - sqlalchemy=1.3.12=py27h7b6447c_0
  - sqlite=3.45.3=h5eee18b_0
  - subprocess32=3.5.4=py27h7b6447c_0
  - sysroot_linux-64=2.17=h4a8ded7_17
  - tk=8.6.14=h39e8969_0
  - tktable=2.10=hb7b940f_3
  - toolshed=0.4.6=pyh864c0ab_3
  - tornado=5.1.1=py27h7b6447c_0
  - tzdata=2024b=hc8b5060_0
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.4.6=h5eee18b_1
  - zlib=1.2.13=h4ab18f5_6
  - zstd=1.5.0=ha95c52a_0
prefix: /master/kferrier/micromamba/envs/dmr
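
If you adapt this file, it may be worth confirming that the exported environment still lists bedtools and the packages mentioned in the first comment. The sketch below is an illustration only: `parse_dependencies` and `missing_packages` are hypothetical helpers, the parser handles only the flat `- name=version=build` lines seen in this file (not nested `pip:` sections), and the list of required packages is an assumption taken from this thread rather than from comb-p's documentation.

```python
def parse_dependencies(yaml_text):
    """Map package name -> version for each active dependency line.

    Commented-out entries such as '#- r-base=...' are ignored.
    """
    packages = {}
    in_deps = False
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if line == "dependencies:":
            in_deps = True
            continue
        if not in_deps:
            continue
        # A non-indented line (e.g. 'prefix: ...') ends the dependency list.
        if raw and not raw[0].isspace():
            break
        if not line.startswith("- "):
            continue  # blank lines and commented-out packages
        parts = line[2:].strip().split("=")
        packages[parts[0]] = parts[1] if len(parts) > 1 else None
    return packages


def missing_packages(yaml_text, required):
    """Return the names in `required` that the environment file does not list."""
    found = parse_dependencies(yaml_text)
    return [name for name in required if name not in found]
```

For example, `missing_packages(text, ["bedtools", "cruzdb", "numpy", "scipy", "toolshed"])` returns an empty list when `text` holds the contents of the file above.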