AnnabelPerry / Polly

GNU General Public License v3.0

R CMD INSTALL will not run #4

Closed AnnabelPerry closed 3 years ago

AnnabelPerry commented 3 years ago

Ok, after looking into this a bit more, I think RStudio and command-line R have two distinct issues.

RStudio seems to be wiping the HTSlib Makevars flags and is thus unable to find HTSlib. Command-line R seems able to find HTSlib but unable to install Polly (maybe I don't have permission to install to Grace..?).

Here's my rationale for this diagnosis:

Issues with Command-Line R:

  1. HTSlib is "discoverable" to pkg-config (see output of two commands below)
    [annabelperry@grace1 pkgconfig]$ echo $PKG_CONFIG_PATH
    /sw/eb/sw/HTSlib/1.11-GCC-10.2.0/lib/pkgconfig:/sw/eb/sw/R/4.0.3-intel-2020b/lib/pkgconfig:/sw/eb/sw/ImageMagick/7.0.10-35-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/LittleCMS/2.11-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/JasPer/2.0.14-GCCcore-10.2.0/lib64/pkgconfig:/sw/eb/sw/GSL/2.6-iccifort-2020.4.304/lib/pkgconfig:/sw/eb/sw/HDF5/1.10.7-iimpi-2020b/lib/pkgconfig:/sw/eb/sw/ICU/67.1-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libsndfile/1.0.28-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/FFTW/3.3.8-intel-2020b/lib/pkgconfig:/sw/eb/sw/NLopt/2.6.2-GCCcore-10.2.0/lib64/pkgconfig:/sw/eb/sw/GMP/6.2.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/cURL/7.72.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/Tk/8.6.10-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/LibTIFF/4.1.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libjpeg-turbo/2.0.5-GCCcore-10.2.0/lib64/pkgconfig:/sw/eb/sw/PCRE2/10.35-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/SQLite/3.33.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/Tcl/8.6.10-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libreadline/8.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/cairo/1.16.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/GLib/2.66.1-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/PCRE/8.44-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libffi/3.3-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/pixman/0.40.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libGLU/9.0.1-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/Mesa/20.2.1-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libunwind/1.4.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libglvnd/1.3.2-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libdrm/2.4.102-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/zstd/1.4.5-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/lz4/1.9.2-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/X11/20201008-GCCcore-10.2.0/share/pkgconfig:/sw/eb/sw/X11/20201008-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/xorg-macros/1.19.2-GCCcore-10.2.0/share/pkgconfig:/sw/eb/sw/fontconfig/2.13.92-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/util-linux/2.36-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/freetype/2.10.3-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libpng/1.6.37-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/expat/2.2.9-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/bzip2/1.0.8-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/imkl/2020.4.304-iimpi-2020b/mkl/bin/pkgconfig:/sw/eb/sw/OpenMPI/4.0.5-GCC-10.2.0/lib/pkgconfig:/sw/eb/sw/libfabric/1.11.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/UCX/1.9.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libevent/2.1.12-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/hwloc/2.2.0-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libpciaccess/0.16-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/libxml2/2.9.10-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/XZ/5.2.5-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/numactl/2.0.13-GCCcore-10.2.0/lib/pkgconfig:/sw/eb/sw/zlib/1.2.11-GCCcore-10.2.0/lib/pkgconfig
    [annabelperry@grace1 pkgconfig]$ pkg-config --list-all
    libnl-xfrm-3.0            libnl-xfrm - Netlink Routing Family Library
    QtTest                    Qttest - Qt Unit Testing Library
    intel-gen4asm             intel-gen4asm - An assembler compiler for the Intel 965+ Chipset
    pthread-stubs             pthread stubs - Meta package for pthread symbols - defaults to heavyweight ones if the C runtime does not provide lightweight ones.
    libpcre                   libpcre - PCRE - Perl compatible regular expressions C library with 8 bit character support
    xfont                     Xfont - X font Library
    hdf5                      HDF5 - Hierarchical Data Format 5 (HDF5)
    xcb-glx                   XCB GLX - XCB GLX Extension
    QtXml                     Qtxml - Qtxml Library
    htslib                    htslib - C library for high-throughput sequencing data formats
    xpm                       Xpm - X Pixmap Library
    xcb-randr                 XCB RandR - XCB RandR Extension
    osmesa                    osmesa - Mesa Off-screen Rendering Library
    dmxproto                  DMXProto - DMX extension headers
    python-2.7                Python - Python library
    cairo-xlib-xcb            cairo-xlib-xcb - Xlib/XCB functions for cairo graphics library
    slurm                     slurm - Slurm API
    udev                      udev - udev
    fontconfig                Fontconfig - Font configuration and customization library
    xft                       Xft - X FreeType library
    libfabric                 libfabric - OFI-WG libfabric
    xcb-damage                XCB Damage - XCB Damage Extension
    libunwind-setjmp          libunwind-setjmp - libunwind setjmp library
    libunwind-generic         libunwind-generic - libunwind generic library
    xmu                       Xmu - Xmu Library
    xcb-dpms                  XCB DPMS - XCB DPMS Extension
    xmuu                      Xmuu - Mini Xmu Library
    gsl                       GSL - GNU Scientific Library
    libva-drm                 libva-drm - Userspace Video Acceleration (VA) drm interface
    libevent                  libevent - libevent is an asynchronous notification event loop library
    Variable 'MKLROOT' not defined in '/sw/eb/sw/imkl/2020.4.304-iimpi-2020b/mkl/bin/pkgconfig/mkl-static-ilp64-iomp.pc'

2. When I run "R CMD check" on the tar.gz file generated using "R CMD build Polly", I get this error:

[annabelperry@grace1 ~]$ R CMD check -l$SCRATCH/R/library-4.0.3 Polly_0.1.0.tar.gz 
* using R version 4.0.3 (2020-10-10)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
 OK
* checking extension type ... Package
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
 ERROR
Installation failed.
* DONE

Status: 1 ERROR
See

Since the dependency check passes successfully, I think command-line R can "find" HTSlib.
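A more direct check than inferring from the dependency step is to ask pkg-config about htslib by name. The sketch below is only an illustration: `check_pc_module` is a hypothetical helper, not part of Polly, and it assumes the same modules are loaded in the shell that will later run R. It uses only the standard `--exists`, `--modversion`, `--cflags` and `--libs` options.

```shell
# Hypothetical helper (not part of Polly): report whether pkg-config can
# resolve a module and, if so, which flags it would hand to a build.
check_pc_module() {
  mod="$1"
  if ! command -v pkg-config >/dev/null 2>&1; then
    echo "pkg-config is not on PATH"
    return 2
  fi
  if pkg-config --exists "$mod"; then
    echo "$mod: found (version $(pkg-config --modversion "$mod"))"
    echo "cflags: $(pkg-config --cflags "$mod")"
    echo "libs:   $(pkg-config --libs "$mod")"
  else
    echo "$mod: not found on PKG_CONFIG_PATH"
    return 1
  fi
}

# Module name as it appears in the --list-all output above.
check_pc_module htslib || true
```

If this prints compiler and linker flags, the library is discoverable in that shell, and any remaining failure is more likely to sit in the configure or install step.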

  3. When I run "R CMD INSTALL", I don't get any output at all. This makes me think that something is amiss with my permission to install to Grace.

    [annabelperry@grace1 ~]$ R CMD INSTALL -l$SCRATCH/R/library-4.0.3 Polly_0.1.0.tar.gz 
    [annabelperry@grace1 ~]$
  4. As an aside, when I untar the tar.gz file, I see all the expected files:

    [annabelperry@grace1 ~]$ tar -vxf Polly_0.1.0.tar.gz 
    Polly/DESCRIPTION
    Polly/NAMESPACE
    Polly/R/
    Polly/R/Polly-package.R
    Polly/R/RcppExports.R
    Polly/cleanup
    Polly/configure
    Polly/configure.ac
    Polly/man/
    Polly/man/MicroGenotyper.Rd
    Polly/man/Polly.Rd
    Polly/man/PollyMicros.Rd
    Polly/man/PollySI.Rd
    Polly/src/
    Polly/src/Makevars.in
    Polly/src/Polly.cpp
    Polly/src/Polly.h
    Polly/src/RcppExports.cpp
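The permission theory for the silent "R CMD INSTALL" can be tested directly before looking elsewhere. The sketch below is a rough illustration: `check_lib_dir` is a hypothetical helper, and the library path is the one from the commands above. It checks only that the target directory exists and is writable by the current user.

```shell
# Hypothetical helper: confirm the target library directory exists and is
# writable by the current user before blaming permissions.
check_lib_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 2
  fi
  echo "ok: $dir exists and is writable"
}

# Library path taken from the commands above; SCRATCH is set by the cluster.
check_lib_dir "${SCRATCH:-}/R/library-4.0.3" || true

# Then rerun the install and print its exit status explicitly:
#   R CMD INSTALL -l "$SCRATCH/R/library-4.0.3" Polly_0.1.0.tar.gz
#   echo "exit status: $?"
```

If the directory is writable and the install still prints nothing, the exit status at least shows whether the command reported success or failure.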

One possible explanation is that there is an issue with the R/4.0.3 module installed on Grace. I noticed an error associated with "mkl-static-ilp64-iomp.pc" in the output of "pkg-config --list-all". Since R/4.0.3 relies upon this version of impi, I went ahead and tried to fix it using the following command I found on StackOverflow:

[annabelperry@grace1 pkgconfig]$ export PKG_CONFIG_MKL_STATIC_ILP64_IOMP_MKLROOT=$MKLROOT 

However, this command did not fix the error, telling me that something is wrong with my ability to execute commands from the Grace command line.
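One thing worth ruling out is that `$MKLROOT` was empty in that shell, in which case the export assigns an empty value and cannot change anything. The sketch below is an assumption-laden illustration: `var_is_set` is a hypothetical helper, and whether pkg-config honors a `PKG_CONFIG_<PACKAGE>_<VARIABLE>` override at all may depend on the installed pkg-config version.

```shell
# Hypothetical helper: succeed only if the named variable is set and non-empty.
var_is_set() {
  eval "val=\${$1:-}"
  [ -n "$val" ]
}

if var_is_set MKLROOT; then
  # Same export as in the post, now known to carry a real value.
  export PKG_CONFIG_MKL_STATIC_ILP64_IOMP_MKLROOT="$MKLROOT"
  echo "MKLROOT=$MKLROOT"
else
  echo "MKLROOT is empty in this shell, so the export assigns an empty value"
fi
```

Either outcome narrows things down: an empty `MKLROOT` points at the loaded modules rather than at the ability to run commands.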

Issues with RStudio:

I only see an "htslib was not found" error if I check my package using "devtools::check()" from RStudio. Additionally, when I run "devtools::check()" from RStudio, the HTS_CFLAGS and HTS_LIBS are wiped from the "src/Makevars" file I generated using "./configure". This does not happen if I run "R CMD check". So, RStudio (for whatever reason) wipes the "src/Makevars" flags and that's why it can't find htslib.

I don't really care if the issue with RStudio is resolved - I just need to be able to run my Polly commands using command-line R.

eddelbuettel commented 3 years ago

I really like slurm. I used it ~ 15 years ago when we did not really have multi-core systems and needed slurm for parallel work.

So these days I presume you use Slurm, possibly even without MPI, "just" as a resource manager? Or also for explicit parallelism (which is one to two levels harder to code...).

AnnabelPerry commented 3 years ago

I can't say that I know what MPI is - I've seen it on the supercomputer help page in relation to SLURM, but they don't include information on what MPI is. I know we can use slurm on our supercomputers to submit parallel jobs, but I don't really know what that is either.

eddelbuettel commented 3 years ago

(Yup. MPI is one very Comp-Sciency concept for parallel computing. No need to worry now. Back in the day we used slurm as a front-end to MPI jobs, but you can of course use it "just" to launch jobs in batch and have it manage the resources on the big computer -- that is slurm's job.)

AnnabelPerry commented 3 years ago

I'm battling more supercomputer demons this morning - I'm getting segmentation faults when I run jobs to collect runtime info from the MicroGenotyper() function:


 *** caught segfault ***
address 0x202568ca527, cause 'memory not mapped'

Traceback:
 1: MicroGenotyper(bams, "/scratch/user/annabelperry/PollyRuntimes/InputFiles/Edited_Birch_Lookup_Table.csv",     scaffold_vector, output_names)
An irrecoverable exception occurred. R is aborting now ...
/sw/hprc/sw/R_tamu/bin/Rscript: line 75: 102073 Segmentation fault      (core dumped) ${EBROOTR}/bin/Rscript ${ARGS[@]}
rm: cannot remove 'aligned_SRR6511793.bam': No such file or directory

When I run the seff command on the slurm jobs, it shows the job has not used all the requested memory:

[annabelperry@grace1 5]$ seff 719533 
Job ID: 719533
Cluster: grace
User/Group: annabelperry/annabelperry
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 80
CPU Utilized: 01:55:19
CPU Efficiency: 1.20% of 6-16:21:20 core-walltime
Job Wall-clock time: 02:00:16
Memory Utilized: 1.86 TB
Memory Efficiency: 65.04% of 2.86 TB
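The 1.20% CPU efficiency in the seff output can be reproduced by hand: 80 cores at 02:00:16 of wall-clock time gives 6-16:21:20 of core-walltime, of which 01:55:19 was actually used. A small sketch of that arithmetic follows; `hms_to_seconds` and `cpu_efficiency` are hypothetical helper names, and the inputs are copied from the output above.

```shell
# Convert HH:MM:SS to seconds.
hms_to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Arguments: CPU time used, wall-clock time, number of cores.
# Prints CPU time as a percentage of (wall-clock time x cores).
cpu_efficiency() {
  cpu_s=$(hms_to_seconds "$1")
  wall_s=$(hms_to_seconds "$2")
  awk -v c="$cpu_s" -v w="$wall_s" -v n="$3" \
    'BEGIN { printf "%.2f\n", 100 * c / (w * n) }'
}

# Values from the seff output: 6919 s used out of 80 * 7216 s available.
cpu_efficiency 01:55:19 02:00:16 80
```

The used CPU time is close to the wall-clock time of a single core, which would be consistent with the job keeping roughly one of the 80 cores busy. That is an inference from the numbers, not something the thread confirms.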

Since I am just trying to collect the script's runtime, I delete the input and output directly after running (to save space). The input (aligned_SRR6511793.bam) is in the /scratch/user/annabelperry/PollyRuntimes/InputFiles/ directory, as shown in the R script, but when I called rm in the job script I forgot to include the full directory, so that's why you see a "no such file or directory" error. This is not the cause of the segmentation fault, though, because the segmentation fault occurs while the R script is running, and I use the correct directory in the R script itself (see below)

library("Polly")

setwd("/scratch/user/annabelperry/PollyRuntimes/MicroGenotyper")

scaffold_vector <- c("ScyDAA6_1508_HRSCAF_1794", "ScyDAA6_1196_HRSCAF_1406",
                     "ScyDAA6_5987_HRSCAF_6712", "ScyDAA6_8_HRSCAF_51",
                     "ScyDAA6_1107_HRSCAF_1306", "ScyDAA6_2393_HRSCAF_2888",
                     "ScyDAA6_1592_HRSCAF_1896", "ScyDAA6_1439_HRSCAF_1708",
                     "ScyDAA6_1854_HRSCAF_2213", "ScyDAA6_10_HRSCAF_60",
                     "ScyDAA6_11_HRSCAF_73", "ScyDAA6_695_HRSCAF_847",
                     "ScyDAA6_1934_HRSCAF_2318", "ScyDAA6_5078_HRSCAF_5686",
                     "ScyDAA6_5984_HRSCAF_6694", "ScyDAA6_2469_HRSCAF_2980",
                     "ScyDAA6_1473_HRSCAF_1750", "ScyDAA6_5983_HRSCAF_6649",
                     "ScyDAA6_1859_HRSCAF_2221", "ScyDAA6_2_HRSCAF_26",
                     "ScyDAA6_7_HRSCAF_50", "ScyDAA6_2113_HRSCAF_2539",
                     "ScyDAA6_2188_HRSCAF_2635", "ScyDAA6_932_HRSCAF_1100")

bams <- c("/scratch/user/annabelperry/PollyRuntimes/InputFiles/aligned_SRR6511793.bam")

output_names <- c("MGR-F4.csv")

ptm <- proc.time()

MicroGenotyper(bams,"/scratch/user/annabelperry/PollyRuntimes/InputFiles/Edited_Birch_Lookup_Table.csv",scaffold_vector,output_names)

MicroGenotyperRunTime <- proc.time() - ptm

cat("\nRuntime for MicroGenotyper on Bam File 4:\n")

print(MicroGenotyperRunTime)
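For the "No such file or directory" message from rm described above, the cleanup step in the job script can take the same absolute directory that the R script uses. This is a minimal sketch: the directory and file name come from the post, while `remove_input` is a hypothetical helper and the real job script may be laid out differently.

```shell
# Directory and file name are taken from the post; remove_input is a
# hypothetical helper for the job script.
INPUT_DIR="/scratch/user/annabelperry/PollyRuntimes/InputFiles"

remove_input() {
  target="$1/$2"
  if [ -e "$target" ]; then
    rm -- "$target" && echo "removed $target"
  else
    echo "nothing to remove at $target" >&2
    return 1
  fi
}

# In the job script, after the R script has finished:
#   remove_input "$INPUT_DIR" aligned_SRR6511793.bam
```

This only addresses the rm message. As noted above, the segmentation fault happens while the R script is still running and is a separate problem.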

The sysadmins are shutting Grace down for maintenance all day tomorrow. Hopefully this is one of the issues they're going to fix.

eddelbuettel commented 3 years ago

Sounds like you want to open a new issue for a new topic and delete this one here? That's exactly what you were thinking isn't it? ;-)

AnnabelPerry commented 3 years ago

Yes haha - I'll do that now