databio / bulker

Manager for multi-container computing environments
https://bulker.io
BSD 2-Clause "Simplified" License

Issues with bulker when submitting jobs on SGE #64

Closed gbloeb closed 4 years ago

gbloeb commented 4 years ago

I have installed pepatac and used the bulker container, which works well on my cluster when used interactively on a development node; however, when I submit a job on SGE, pepatac can no longer find the software from the container. This was not a problem when I was running pepatac with a singularity container in an older version.

Are there environment variables I need to pass to help bulker play well with SGE? Thanks for any advice!

nsheff commented 4 years ago

TLDR: yes, you'd need to use looper run ... --package bulker_sge to get it to activate the bulker crate for the particular instance. But you'd also need to have a bulker_sge compute package set up in divvy.

I actually hadn't configured divvy to use SGE yet -- have you? It would be great if you could contribute those templates back to the divcfg repository. But I can show you how it works for SLURM:

Using SLURM directly, we use this template: https://github.com/pepkit/divcfg/blob/master/templates/slurm_template.sub. We use this with looper by specifying --package slurm, or --package default if that's how divvy is configured.

Using SLURM with bulker, we use this template: https://github.com/pepkit/divcfg/blob/master/templates/slurm_bulker_template.sub. So we specify with --package bulker_slurm.

They're almost identical; the idea is that you need to prepend bulker run to the beginning of the pipeline call. Looper uses divvy to construct these submission scripts, so if you've already configured divvy with these templates/packages, all you have to do is say looper run pep.yaml --package bulker_slurm. Since the PEPATAC pipeline interface already knows which crate should be used, this is correctly populated into the SLURM submission script, and everything should just work.
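
Untested, but an SGE analogue of slurm_bulker_template.sub might look roughly like the sketch below; the {JOBNAME}, {LOGFILE}, {MEM}, {CORES}, {TIME}, {BULKER_CRATE}, and {CODE} placeholders are guessed by analogy with the SLURM templates, so double-check them against the variables divvy actually populates. You would then register it in your divvy config as a bulker_sge compute package with qsub as the submission command.

#!/bin/bash
#$ -S /bin/bash
#$ -N {JOBNAME}
#$ -o {LOGFILE}
#$ -j y
#$ -cwd
#$ -l mem_free={MEM}
#$ -l h_rt={TIME}
#$ -pe smp {CORES}

# Prepend bulker run so the whole pipeline call executes inside the crate,
# mirroring what slurm_bulker_template.sub does for SLURM.
# Note: SGE's mem_free expects a unit, so the {MEM} value may need adjusting.
bulker run {BULKER_CRATE} {CODE}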

Let me know if that answers your question or if you run into any issues configuring it.

nsheff commented 4 years ago

Wait, are you even using looper? If not, then you can still do this... you'd just need to change your SGE script to use bulker activate or bulker run for your pepatac job.

gbloeb commented 4 years ago

I’m not using looper. I do use bulker activate.

After more troubleshooting I realized that everything works well if I run PEPATAC on a single core (-p 1), but when I increase to -p 2, PEPATAC starts reporting that commands from the bulker crate are no longer available.

This is not dependent on how many cores are requested in the job script (e.g., if I request 4 cores for the job but only run PEPATAC on a single core all works smoothly).

Thanks, Gabe

nsheff commented 4 years ago

can you share your sge submission script?

also, can you tell exactly what commands are not available? Maybe if you could just paste the actual run log that would help. if you could do both a successful log (with -p 1) and an unsuccessful one that would be great.

gbloeb commented 4 years ago

FYI, for me this is no longer an issue: I went ahead and installed everything needed to run the pipeline myself, so I’m no longer dependent on bulker. When I was doing these tests, the only program I had not installed was pigz, so the pipeline runs regardless; but as you’ll see from the logs below, when I use more than one core, pigz is not available. I’m just copying the beginning of the logs here, which show whether pigz is callable.

My generic SGE script is:

#!/bin/env bash
#!/bin/bash            #-- what is the language of this shell
#                      #-- Any line that starts with #$ is an instruction to SGE
#$ -S /bin/bash        #-- the shell for the job
#$ -o ~/log            #-- output directory (fill in)
#$ -e ~/log            #-- error directory (fill in)
#$ -cwd                #-- tell the job that it should start in your working directory
#$ -r y                #-- tell the system that if a job crashes, it should be restarted
#$ -j y                #-- tell the system that the STDERR and STDOUT should be joined
#$ -l mem_free=10G
#$ -l scratch=100G
#$ -l h_rt=16:00:00
#$ -pe smp 8
#$ -m ea               #-- email when done
#$ -M gabriel.loeb@ucsf.edu   #-- email

fastqDir=$1
read1file=$2
read2file=$3
sample=$4
outputpath=$5

module load CBI samtools bowtie2 fastqc r
bulker activate databio/pepatac
pepatac/pipelines/pepatac.py \
  --single-or-paired paired \
  --prealignments mouse_chrM2x \
  --genome mm10 \
  --sample-name $sample \
  --input $fastqDir/$read1file \
  --input2 $fastqDir/$read2file \
  --genome-size mm \
  -O $TMPDIR \
  -P "$NSLOTS" \
  -M 60

mv $TMPDIR/$sample $outputpath

Beginning of log with one core:

Bulker config: /wynton/home/reiter/gloeb/bulker_crates/bulker_config.yaml
Activating bulker crate: databio/pepatac

Pipeline run code and environment:

Version log:

Arguments passed to pipeline:


/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mouse_chrM2x' the asset 'bowtie2_index.None:default' doesn't exist; tried: bowtie2index/default/mouse$
  warnings.warn(msg, RuntimeWarning)
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mm10' the asset 'bowtie2_index.bowtie2_index:default' doesn't exist; tried: bowtie2_index/default/mm10,$
  warnings.warn(msg, RuntimeWarning)
Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:
Optional assets not existing: blacklist.blacklist:default (--blacklist)
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R1_001.fastq.gz
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R2_001.fastq.gz

File_mb 1843 2 RES

Read_type paired PEPATAC RES

Genome mm10 PEPATAC RES

Merge/link and fastq conversion: (09-01 11:49:18) elapsed: 0.0 TIME

….

Beginning of log with multiple cores:

Bulker config: /wynton/home/reiter/gloeb/bulker_crates/bulker_config.yaml
Activating bulker crate: databio/pepatac

Pipeline run code and environment:

Version log:

Arguments passed to pipeline:


Command is not callable: pigz
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mouse_chrM2x' the asset 'bowtie2_index.None:default' doesn't exist; tried: bowtie2index/default/mouse$
  warnings.warn(msg, RuntimeWarning)
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mm10' the asset 'bowtie2_index.bowtie2_index:default' doesn't exist; tried: bowtie2_index/default/mm10,$
  warnings.warn(msg, RuntimeWarning)
Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:
Optional assets not existing: blacklist.blacklist:default (--blacklist)
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R1_001.fastq.gz
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R2_001.fastq.gz

File_mb 1843 2 RES

Read_type paired PEPATAC RES

Genome mm10 PEPATAC RES ...

Thanks for making this! Gabe

nsheff commented 4 years ago

Ok, so you figured out that when you increase the number of processors, it's requiring pigz, which wasn't installed -- so that explains the error.

But what I can't understand is why it wasn't/isn't using the bulker executables when you activate the bulker crate. It seems like it's not activating the crate, or it's not persisting somehow. I don't quite understand, maybe it's an SGE thing.

Can you try changing these 2 lines:

bulker activate databio/pepatac
pepatac/pipelines/pepatac.py 

to this:

bulker run databio/pepatac pepatac/pipelines/pepatac.py ... (include all flags as before)
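
Spelled out with the flags from the submission script you posted above, that would be roughly:

bulker run databio/pepatac pepatac/pipelines/pepatac.py \
  --single-or-paired paired \
  --prealignments mouse_chrM2x \
  --genome mm10 \
  --sample-name $sample \
  --input $fastqDir/$read1file \
  --input2 $fastqDir/$read2file \
  --genome-size mm \
  -O $TMPDIR \
  -P "$NSLOTS" \
  -M 60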

And a second thing, if you try the following interactively (without sge submission), what happens?

which pigz
pigz --version
bulker activate databio/pepatac
which pigz
pigz --version

Here is what I see:

$ which pigz
$ pigz --version

Command 'pigz' not found, but can be installed with:

sudo apt install pigz
$ bulker activate databio/pepatac
Bulker config: /home/nsheff/Dropbox/env/bulker_config/zither.yaml
Activating bulker crate: databio/pepatac
$ which pigz
/home/nsheff/bulker_crates/databio/pepatac/default/pigz
$ pigz --version
pigz 2.4

gbloeb commented 4 years ago

[gloeb@dev2 ~]$ which pigz
/usr/bin/which: no pigz in (/wynton/home/reiter/gloeb/cellranger/cellranger-atac-1.2.0:/wynton/home/reiter/gloeb/pepatac_tutorial/tools/pepatac/pipelines:/wynton/home/reiter/gloeb/miniconda3/bin:/wynton/home/reiter/gloeb/miniconda3/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/reiter/gloeb/.local/bin:/wynton/home/reiter/gloeb/bin)
[gloeb@dev2 ~]$ pigz --version
-bash: pigz: command not found
[gloeb@dev2 ~]$ bulker activate databio/pepatac
Bulker config: /wynton/home/reiter/gloeb/bulker_crates/bulker_config.yaml
Activating bulker crate: databio/pepatac
\[\033[01;93m\]databio/pepatac|\[\033[00m\]\[\033[01;34m\]\w\[\033[00m\]\$
databio/pepatac|~$ which pigz
~/bulker_crates/databio/pepatac/default/pigz
databio/pepatac|~$ pigz --version
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob c64513b74145 done  
Copying blob 01b8b12bad90 done  
Copying blob c5d85cf7a05f done  
Copying blob b6b268720157 done  
Copying blob e12192999ff1 done  
Copying blob 62d2a1087cc4 done  
Copying config 83cb13023b done  
Writing manifest to image destination
Storing signatures
2020/09/02 13:49:06  info unpack layer: sha256:c64513b741452f95d8a147b69c30f403f6289542dd7b2b51dd8ba0cb35d0e08b
2020/09/02 13:49:06  warn rootless{dev/full} creating empty file in place of device 1:7
2020/09/02 13:49:06  warn rootless{dev/null} creating empty file in place of device 1:3
2020/09/02 13:49:06  warn rootless{dev/ptmx} creating empty file in place of device 5:2
2020/09/02 13:49:06  warn rootless{dev/random} creating empty file in place of device 1:8
2020/09/02 13:49:06  warn rootless{dev/tty} creating empty file in place of device 5:0
2020/09/02 13:49:06  warn rootless{dev/urandom} creating empty file in place of device 1:9
2020/09/02 13:49:06  warn rootless{dev/zero} creating empty file in place of device 1:5
2020/09/02 13:49:06  warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/02 13:49:06  warn xattr{/tmp/rootfs-b7003481-ed5d-11ea-bd2f-1418773e516f/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
2020/09/02 13:49:07  info unpack layer: sha256:01b8b12bad90b51d9f15dd4b63103ea6221b339ac3b3e75807c963e678f28624
2020/09/02 13:49:07  info unpack layer: sha256:c5d85cf7a05fec99bb829db84dc5a21cc0aca569253f45d1ea10ca9e8a03fa9a
2020/09/02 13:49:07  info unpack layer: sha256:b6b268720157210d21bbe49f6112f815774e6d2a6144b14911749fadfdb034f0
2020/09/02 13:49:07  info unpack layer: sha256:e12192999ff18f01315563c63333d7c1059cd8e64dffe75fffe504b95eeb093c
2020/09/02 13:49:07  info unpack layer: sha256:62d2a1087cc44fe6d2ee3e034e04d170b428bef39d182bf97e296a3500ff6368
2020/09/02 13:49:07  warn xattr{var/cache/apt/archives/partial} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/02 13:49:07  warn xattr{/tmp/rootfs-b7003481-ed5d-11ea-bd2f-1418773e516f/var/cache/apt/archives/partial} destination filesystem does not support xattrs, further warnings will be suppressed
INFO:    Creating SIF file...
pigz 2.4

nsheff commented 4 years ago

that is working as it should. can you try submitting the same thing as an SGE job and see what it does?
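
A minimal job script for that test might look like the sketch below (directives borrowed from your script above); the point is just to see whether the pigz shim from the crate is still on the PATH after bulker activate runs in a non-interactive batch shell:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -j y

# Same diagnostic sequence as the interactive test above.
which pigz
pigz --version
bulker activate databio/pepatac
which pigz
pigz --version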

nsheff commented 4 years ago

@gbloeb, it looks like everything was working correctly when you ran it interactively, so the only thing I can think of is that it's something to do with SGE submission.

Were you able to try the bulker run approach, or to submit those jobs to SGE?

nsheff commented 4 years ago

Closing this issue; please re-open if you have time to follow up on this for me. Thanks!