Closed gbloeb closed 4 years ago
TLDR: yes, you'd need to use looper run ... --package bulker_sge to get it to activate the bulker crate for the particular instance. But you'd also need to have a bulker_sge compute package set up in divvy.
I actually haven't configured divvy to use SGE yet -- have you? It would be great if you could contribute those templates back to the divcfg repository. But I can show you how it works for SLURM:
Using SLURM directly, we use this template: https://github.com/pepkit/divcfg/blob/master/templates/slurm_template.sub. We use this with looper by specifying --package slurm or --package default, if that's how divvy is configured.
Using SLURM with bulker, we use this template: https://github.com/pepkit/divcfg/blob/master/templates/slurm_bulker_template.sub. We select it with --package bulker_slurm.
They're almost identical; the idea is that you need to prepend bulker run to the beginning of the pipeline call. Looper uses divvy to construct these submission scripts. So if you've already configured divvy with these templates/packages, all you have to do is say looper run pep.yaml --package bulker_slurm. Since the PEPATAC pipeline interface already knows what crate should be used to compute, this is correctly populated into the SLURM submission script, and everything should just work.
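The "prepend bulker run" idea can be sketched in a few lines of shell. This is only an illustration of the pattern, not the contents of the divcfg templates; wrap_with_bulker is a hypothetical helper name, and the pipeline arguments are placeholders.

```shell
# Hypothetical helper (not part of bulker, looper, or divvy):
# print a pipeline command prefixed with `bulker run <crate>`.
wrap_with_bulker() {
  crate="$1"   # crate name, e.g. databio/pepatac
  shift        # the remaining arguments are the pipeline call
  echo "bulker run $crate $*"
}

# Print the wrapped command rather than executing it.
wrap_with_bulker databio/pepatac pepatac/pipelines/pepatac.py --genome mm10
```

A submission template built this way differs from the plain one only in that prefix, which is why the two templates linked above are almost identical.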
Let me know if that answers your question or if you run into any issues configuring it.
Wait, are you even using looper? If not, then you can still do this... you'd just need to change your SGE script to use bulker activate or bulker run for your pepatac job.
I’m not using looper. I do use bulker activate.
After more troubleshooting I realized everything works well when running PEPATAC on a single core (-p 1), but when I increase to -p 2, PEPATAC starts reporting that commands from the bulker crate are no longer available.
This is not dependent on how many cores are requested in the job script (e.g., if I request 4 cores for the job but only run PEPATAC on a single core all works smoothly).
Thanks, Gabe
Can you share your SGE submission script?
Also, can you tell me exactly which commands are not available? Maybe if you could just paste the actual run log, that would help. If you could do both a successful log (with -p 1) and an unsuccessful one, that would be great.
FYI, for me this is no longer an issue, I went ahead and installed everything needed to run the pipeline myself so I’m no longer dependent on bulker. When I was doing these tests, the only program that I had not installed was pigz, so pipeline runs regardless, but as you’ll see from the logs below, when I use more than 1 core, pigz is not available. Just copying the beginning of the logs here which specify whether pigz is callable.
My generic SGE script is:
fastqDir=$1
read1file=$2
read2file=$3
sample=$4
outputpath=$5
module load CBI samtools bowtie2 fastqc r
bulker activate databio/pepatac
pepatac/pipelines/pepatac.py \
  --single-or-paired paired \
  --prealignments mouse_chrM2x \
  --genome mm10 \
  --sample-name $sample \
  --input $fastqDir/$read1file \
  --input2 $fastqDir/$read2file \
  --genome-size mm \
  -O $TMPDIR \
  -P "$NSLOTS" \
  -M 60
mv $TMPDIR/$sample $outputpath
Beginning of log with one core:
Bulker config: /wynton/home/reiter/gloeb/bulker_crates/bulker_config.yaml
Activating bulker crate: databio/pepatac
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/pypiper
/wynton/home/reiter/gloeb/pepatac/pipelines
TSS_name: None
aligner: bowtie2
anno_name: None
blacklist: None
config_file: pepatac.yaml
cores: 1
deduplicator: samblaster
dirty: False
extend: 250
force_follow: False
frip_ref_peaks: None
genome_assembly: mm10
genome_size: mm
input: ['/wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R1_001.fastq.gz']
input2: ['/wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R2_001.fastq.gz']
keep: False
lite: False
logdev: False
mem: 60
motif: False
new_start: False
no_fifo: False
no_scale: False
output_parent: /scratch/179846.1.long.q
paired_end: True
peak_caller: macs2
peak_type: fixed
prealignments: ['mouse_chrM2x']
prioritize: False
recover: False
sample_name: bulker_again
silent: False
single_or_paired: paired
skipqc: False
sob: False
testmode: False
trimmer: skewer
verbosity: None
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mouse_chrM2x' the asset 'bowtie2_index.None:default' doesn't exist; tried: bowtie2index/default/mouse$
warnings.warn(msg, RuntimeWarning)
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mm10' the asset 'bowtie2_index.bowtie2_index:default' doesn't exist; tried: bowtie2_index/default/mm10,$
warnings.warn(msg, RuntimeWarning)
Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:
Optional assets not existing: blacklist.blacklist:default (--blacklist)
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R1_001.fastq.gz
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R2_001.fastq.gz
File_mb 1843 2 RES
Read_type paired PEPATAC RES
Genome mm10 PEPATAC RES
….
Beginning of log with multiple cores:
Bulker config: /wynton/home/reiter/gloeb/bulker_crates/bulker_config.yaml
Activating bulker crate: databio/pepatac
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/pypiper
/wynton/home/reiter/gloeb/pepatac/pipelines
TSS_name: None
aligner: bowtie2
anno_name: None
blacklist: None
config_file: pepatac.yaml
cores: 3
deduplicator: samblaster
dirty: False
extend: 250
force_follow: False
frip_ref_peaks: None
genome_assembly: mm10
genome_size: mm
input: ['/wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R1_001.fastq.gz']
input2: ['/wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R2_001.fastq.gz']
keep: False
lite: False
logdev: False
mem: 60
motif: False
new_start: False
no_fifo: False
no_scale: False
output_parent: /scratch/179898.1.long.q
paired_end: True
peak_caller: macs2
peak_type: fixed
prealignments: ['mouse_chrM2x']
prioritize: False
recover: False
sample_name: bulker_again
silent: False
single_or_paired: paired
skipqc: False
sob: False
testmode: False
trimmer: skewer
verbosity: None
Command is not callable: pigz
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mouse_chrM2x' the asset 'bowtie2_index.None:default' doesn't exist; tried: bowtie2index/default/mouse$
warnings.warn(msg, RuntimeWarning)
/wynton/home/reiter/gloeb/.local/lib/python3.7/site-packages/refgenconf/refgenconf.py:362: RuntimeWarning: For genome 'mm10' the asset 'bowtie2_index.bowtie2_index:default' doesn't exist; tried: bowtie2_index/default/mm10,$
warnings.warn(msg, RuntimeWarning)
Some assets are not found. You can update your REFGENIE config file or point directly to the file using the noted command-line arguments:
Optional assets not existing: blacklist.blacklist:default (--blacklist)
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R1_001.fastq.gz
Local input file: /wynton/scratch/gabe/ATAC_run1_2/d4_dnase_S1_R2_001.fastq.gz
File_mb 1843 2 RES
Read_type paired PEPATAC RES
Genome mm10 PEPATAC RES ...
Thanks for making this! Gabe
Ok, so you figured out that when you increase the number of processors, it's requiring pigz, which wasn't installed -- so that explains the error.
But what I can't understand is why it wasn't/isn't using the bulker executables when you activate the bulker crate. It seems like it's not activating the crate, or it's not persisting somehow. I don't quite understand, maybe it's an SGE thing.
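To make the core-count dependence concrete, here is a small shell sketch of the kind of decision a pipeline might make. It is an assumption-laden illustration, not PEPATAC's actual code: choose_compressor is a hypothetical function, and the rule "look for pigz only when more than one core is requested" is inferred from the two logs above.

```shell
# Hypothetical illustration, NOT taken from PEPATAC:
# use pigz only when more than one core is requested and pigz resolves on PATH.
choose_compressor() {
  cores="$1"
  if [ "$cores" -gt 1 ] && command -v pigz >/dev/null 2>&1; then
    echo "pigz -p $cores"   # -p sets the number of compression processes
  else
    echo "gzip"             # single core, or pigz not callable
  fi
}

choose_compressor 1   # never looks for pigz
choose_compressor 3   # looks for pigz, so a missing pigz only shows up here
```

Under that assumption a single-core run never touches pigz, which would explain why the missing executable only surfaces in the multi-core log.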
Can you try changing these 2 lines:
bulker activate databio/pepatac
pepatac/pipelines/pepatac.py
to this:
bulker run databio/pepatac pepatac/pipelines/pepatac.py ... (include all flags as before)
And a second thing, if you try the following interactively (without sge submission), what happens?
which pigz
pigz --version
bulker activate databio/pepatac
which pigz
pigz --version
Here is what I see:
$ which pigz
$ pigz --version
Command 'pigz' not found, but can be installed with:
sudo apt install pigz
$ bulker activate databio/pepatac
Bulker config: /home/nsheff/Dropbox/env/bulker_config/zither.yaml
Activating bulker crate: databio/pepatac
$ which pigz
/home/nsheff/bulker_crates/databio/pepatac/default/pigz
$ pigz --version
pigz 2.4
[gloeb@dev2 ~]$ which pigz
/usr/bin/which: no pigz in (/wynton/home/reiter/gloeb/cellranger/cellranger-atac-1.2.0:/wynton/home/reiter/gloeb/pepatac_tutorial/tools/pepatac/pipelines:/wynton/home/reiter/gloeb/miniconda3/bin:/wynton/home/reiter/gloeb/miniconda3/condabin:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/wynton/home/reiter/gloeb/.local/bin:/wynton/home/reiter/gloeb/bin)
[gloeb@dev2 ~]$ pigz --version
-bash: pigz: command not found
[gloeb@dev2 ~]$ bulker activate databio/pepatac
Bulker config: /wynton/home/reiter/gloeb/bulker_crates/bulker_config.yaml
Activating bulker crate: databio/pepatac
\[\033[01;93m\]databio/pepatac|\[\033[00m\]\[\033[01;34m\]\w\[\033[00m\]\$
databio/pepatac|~$ which pigz
~/bulker_crates/databio/pepatac/default/pigz
databio/pepatac|~$ pigz --version
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob c64513b74145 done
Copying blob 01b8b12bad90 done
Copying blob c5d85cf7a05f done
Copying blob b6b268720157 done
Copying blob e12192999ff1 done
Copying blob 62d2a1087cc4 done
Copying config 83cb13023b done
Writing manifest to image destination
Storing signatures
2020/09/02 13:49:06 info unpack layer: sha256:c64513b741452f95d8a147b69c30f403f6289542dd7b2b51dd8ba0cb35d0e08b
2020/09/02 13:49:06 warn rootless{dev/full} creating empty file in place of device 1:7
2020/09/02 13:49:06 warn rootless{dev/null} creating empty file in place of device 1:3
2020/09/02 13:49:06 warn rootless{dev/ptmx} creating empty file in place of device 5:2
2020/09/02 13:49:06 warn rootless{dev/random} creating empty file in place of device 1:8
2020/09/02 13:49:06 warn rootless{dev/tty} creating empty file in place of device 5:0
2020/09/02 13:49:06 warn rootless{dev/urandom} creating empty file in place of device 1:9
2020/09/02 13:49:06 warn rootless{dev/zero} creating empty file in place of device 1:5
2020/09/02 13:49:06 warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/02 13:49:06 warn xattr{/tmp/rootfs-b7003481-ed5d-11ea-bd2f-1418773e516f/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
2020/09/02 13:49:07 info unpack layer: sha256:01b8b12bad90b51d9f15dd4b63103ea6221b339ac3b3e75807c963e678f28624
2020/09/02 13:49:07 info unpack layer: sha256:c5d85cf7a05fec99bb829db84dc5a21cc0aca569253f45d1ea10ca9e8a03fa9a
2020/09/02 13:49:07 info unpack layer: sha256:b6b268720157210d21bbe49f6112f815774e6d2a6144b14911749fadfdb034f0
2020/09/02 13:49:07 info unpack layer: sha256:e12192999ff18f01315563c63333d7c1059cd8e64dffe75fffe504b95eeb093c
2020/09/02 13:49:07 info unpack layer: sha256:62d2a1087cc44fe6d2ee3e034e04d170b428bef39d182bf97e296a3500ff6368
2020/09/02 13:49:07 warn xattr{var/cache/apt/archives/partial} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/02 13:49:07 warn xattr{/tmp/rootfs-b7003481-ed5d-11ea-bd2f-1418773e516f/var/cache/apt/archives/partial} destination filesystem does not support xattrs, further warnings will be suppressed
INFO: Creating SIF file...
pigz 2.4
That is working as it should. Can you try submitting the same thing as an SGE job and see what it does?
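One way to run the same check non-interactively is a short diagnostic that reports whether each tool resolves on the PATH the job actually sees. This is a minimal sketch; check_callable is a hypothetical helper, not a bulker or pypiper function.

```shell
# Hypothetical helper: report whether each named command resolves in this shell.
# Returns non-zero if any of them is missing.
check_callable() {
  status=0
  for tool in "$@"; do
    if path=$(command -v "$tool" 2>/dev/null); then
      echo "callable: $tool -> $path"
    else
      echo "NOT callable: $tool"
      status=1
    fi
  done
  return "$status"
}

# Show the search path as the job sees it, then test the tool in question.
echo "PATH=$PATH"
check_callable pigz || echo "pigz is missing from this environment"
```

Placing these lines in the job script right after the bulker activate databio/pepatac line, and comparing the output with an interactive session, should show whether the crate's executables are still on the PATH inside the SGE job.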
@gbloeb, it looks like everything is working correctly when you did it interactively, so the only thing I can think of is that there's something to do with sge submission.
Were you able to try the bulker run approach or submit those jobs to SGE?
Closing this issue, please re-open if you have time to follow up on this for me. thanks!
I have installed pepatac and used the bulker container, which works well on my cluster when used interactively on a development node; however, when I submit a job through SGE, pepatac can no longer find the software from the container. This was not a problem when I was running pepatac using a singularity container in an older version.
Are there environment variables I need to pass to help bulker play well with SGE? Thanks for any advice!