oliclement opened 3 years ago
Yes, `sbatch` is the submission command for SLURM. Currently that is the only supported job control system.
hey @afrendeiro is there a reason you didn't use divvy for this?
Not that I remember, although I wasn't sure whether divvy supports SLURM array jobs. I just needed something working fast. Want to submit a PR?
@oliclement There would be better ways to make the pipeline use various submission engines, but until then I've made it so that, in a local config, the user can specify a `submission_command` variable, which can be an arbitrary command used to submit the job.
One could, for example, select `sh` to have the job run locally and serially. That, together with the dry-run option `-d`, should make it easier to debug any further issues.
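To illustrate the mechanics (this is a sketch of the idea, not the pipeline's actual code): the configured command is simply put in front of the generated job script, so any program that accepts a script path works.

```shell
# Sketch only: assumes the pipeline effectively runs
# "<submission_command> <job_script>". On SLURM the command would be
# "sbatch"; locally, plain "sh" executes the script in serial.
job_script=$(mktemp)
echo 'echo "job ran locally"' > "$job_script"

submission_command="sh -e"    # arbitrary; e.g. "sbatch" or "qsub -q shortq"
$submission_command "$job_script"
rm -f "$job_script"
```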
I still wasn't able to run tests because I don't have a STAR-indexed genome and couldn't access your files, but in the meantime perhaps you can try the latest set of changes on your data?
Hi André,
I am also having the same issue:
Traceback (most recent call last):
File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/bin/scifi", line 11, in
@somnathtagore Are you using SLURM? If not, you could change `submission_command` to any other callable in the configuration.
Hi Andre, we are using Univa Grid Engine (SGE) for job submission.
Also, are you talking about the default.yaml file under config? If yes, I do not see a parameter there which can be used to change the submission_command ...
I've updated the README a little to improve documentation: https://github.com/epigen/scifiRNA-seq#configuration-and-logging
The user can either pass the `-c` option to specify a configuration for a run, or place this file under `~/.scifi.config.yaml` to be used. Make sure to use the latest version on the `main` branch.
In this file one can specify a `submission_command` to be called for job submission, just like here: https://github.com/epigen/scifiRNA-seq/blob/main/scifi/config/default.yaml#L34
...
submission_command: sh
...
Alternatively, since the job is written to a bash file prior to execution, one could run the pipeline in "dry run" mode (option `-d`) and run the script a posteriori, in a manner adapted to the environment at hand.
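That workflow can be sketched like this (the `dry_run` flag and the script contents here are hypothetical placeholders, not the pipeline's real internals):

```shell
# Hypothetical sketch of the dry-run behaviour: the job script is always
# written to disk; it is only executed when dry_run is off.
dry_run=true
job_script=$(mktemp)
printf 'echo "mapping sample"\n' > "$job_script"

if [ "$dry_run" = true ]; then
    echo "dry run: wrote $job_script, not executing it"
else
    sh -e "$job_script"    # a posteriori one could instead qsub/sbatch it
fi
```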
Hi Andre,
I changed `submission_command: "qsub"` and it still throws the `FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch'` error. So I am trying the dry run (-d option). The script generates a 'pipeline_output' directory. Within this directory, I see a 'scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh' file, but I don't see the CSV file with summary statistics per barcode, or the CSV file with expression values per cell, per gene. It creates per-well directories, but the mapped and gene-tagged BAM files are missing within those.
My question is whether this is an issue with some incorrect parameters, or with the input files that I am passing.
The fact that the run still produces `No such file or directory: 'sbatch'` means that either you're running an older version (did you update the code?) or somehow the custom configuration file is not being passed correctly (can you show me the exact command you run?).
The dry run option is designed not to actually run the commands, but to produce all files needed up until the point of running the job. So the fact that the pipeline output is not there makes sense. My suggestion was that by first running the pipeline in dry run mode, the job files would be produced. One could submit a job to whatever job manager a posteriori.
So, I re-installed the pipeline and ran this:
(base) <0|823>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L136 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L168 (main) [INFO] > Samples to submit:
When you say you re-installed, what do you mean exactly? Did you clone the latest version or do git pull on an existing repository?
Could you please show me the content of /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml and of ~/.scifi.log.txt? Feel free to email if the files are large.
By re-install I mean: I cloned the latest version.
The default.yaml is:
root_output_dir: # default directory for outputs
  /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)
expected_cell_number: # default expected number of cells
  200000
min_umi_output: # minimum number of UMIs a barcode must have to be reported
  3
annotation: # round2 CSV annotation. Superceeded by values in sample CSV annotation.
  "/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv"
variables: # variables in round1 CSV annotation to bring along, Superceeded by values in sample CSV annotation
  - "plate_well"
species_mixing: # whether experiment contains more than one species. Superceeded by value in sample CSV annotatiton
  1
array_size: # SLURM job array size
  24
chunks:
  1000
chunk_batch_size:
  25
grna_pbs_sequence:
  GTGGAAAGGACGAAACACCG
submission_command:
  "qsub"
resoures:
  map:
    cpus: 4
    mem: 60000
    queue: "shortq"
    time: "08:00:00"
  filter:
    cpus: 1
    mem: 8000
    queue: "shortq"
    time: "01:00:00"
  join:
    cpus: 1
    mem: 8000
    queue: "shortq"
    time: "00:30:00"
  report:
    cpus: 4
    mem: 80000
    queue: "longq"
    time: "3-00:00:00"
The ~/.scifi.log.txt is (I copied the top 50 lines):
8603 scifi.v0.1.dev54+g05fe15b:__init__:L76 (setup_logger) [DEBUG] 2020-12-01 12:56:46 > This is scifi (https://github.com/epigen/scifiRNA-seq), version: 0.1.dev54+g05fe15b
8604 scifi.v0.1.dev54+g05fe15b:__init__:L113 (setup_config) [DEBUG] 2020-12-01 12:56:46 > Reading default configuration file distributed with package from '/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/config/default.yaml'.
8605 scifi.v0.1.dev54+g05fe15b:__init__:L117 (setup_config) [DEBUG] 2020-12-01 12:56:46 > Default config: {'star_exe': '/home/arendeiro/workspace/STAR-2.7.0e/bin/Linux_x86_64_static/STAR', 'star_genome_dir': '/home/arendeiro/resources/genomes/hg38/indexed_STAR-2.7.0e/', 'gtf_file': '/home/arendeiro/resources/genomes/hg38/10X/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf', 'featurecounts_exe': 'subread-2.0.1-Linux-x86_64/bin/featureCounts', 'root_output_dir': '/scratch/lab_bock/shared/projects/sci-rna/data/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/scratch/lab_bock/shared/projects/sci-rna/metadata/sciRNA-seq.PD190_humanmouse.oligos_2019-09-05.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'sbatch', 'resources': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8606 scifi.v0.1.dev54+g05fe15b:__init__:L151 (setup_config) [DEBUG] 2020-12-01 12:56:46 > To use custom configurations including paths to static files, create a '/ifs/home/c2b2/ac_lab/st3179/.scifi.config.yaml' file.
8607 scifi.v0.1.dev54+g05fe15b:pipeline:L136 (main) [INFO] 2020-12-01 12:59:55 > scifi-RNA-seq pipeline
8608 scifi.v0.1.dev54+g05fe15b:pipeline:L30 (build_cli) [DEBUG] 2020-12-01 12:59:55 > Setting up CLI parser.
8609 scifi.v0.1.dev54+g05fe15b:pipeline:L143 (main) [DEBUG] 2020-12-01 12:59:56 > Namespace(array_size=None, arrayed=False, command='map', config_file='/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml', dry_run=False, input_bam_glob='/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam', root_output_dir='./pipeline_output', sample_annotation='annotation_L001_1.csv', sample_subset=[], toggle=False)
8610 scifi.v0.1.dev54+g05fe15b:__init__:L113 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Reading default configuration file distributed with package from '/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/config/default.yaml'.
8611 scifi.v0.1.dev54+g05fe15b:__init__:L117 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Default config: {'star_exe': '/home/arendeiro/workspace/STAR-2.7.0e/bin/Linux_x86_64_static/STAR', 'star_genome_dir': '/home/arendeiro/resources/genomes/hg38/indexed_STAR-2.7.0e/', 'gtf_file': '/home/arendeiro/resources/genomes/hg38/10X/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf', 'featurecounts_exe': 'subread-2.0.1-Linux-x86_64/bin/featureCounts', 'root_output_dir': '/scratch/lab_bock/shared/projects/sci-rna/data/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/scratch/lab_bock/shared/projects/sci-rna/metadata/sciRNA-seq.PD190_humanmouse.oligos_2019-09-05.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'sbatch', 'resources': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8612 scifi.v0.1.dev54+g05fe15b:__init__:L151 (setup_config) [DEBUG] 2020-12-01 12:59:56 > To use custom configurations including paths to static files, create a '/ifs/home/c2b2/ac_lab/st3179/.scifi.config.yaml' file.
8613 scifi.v0.1.dev54+g05fe15b:__init__:L158 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Custom passed config: {'root_output_dir': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'qsub', 'resoures': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8614 scifi.v0.1.dev54+g05fe15b:__init__:L162 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Updating configuration with custom file from '/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml'.
8615 scifi.v0.1.dev54+g05fe15b:__init__:L166 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Current config: {'root_output_dir': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'qsub', 'resoures': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8616 scifi.v0.1.dev54+g05fe15b:pipeline:L146 (main) [DEBUG] 2020-12-01 12:59:56 > {'star_exe': '/home/arendeiro/workspace/STAR-2.7.0e/bin/Linux_x86_64_static/STAR', 'star_genome_dir': '/home/arendeiro/resources/genomes/hg38/indexed_STAR-2.7.0e/', 'gtf_file': '/home/arendeiro/resources/genomes/hg38/10X/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf', 'featurecounts_exe': 'subread-2.0.1-Linux-x86_64/bin/featureCounts', 'root_output_dir': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'qsub', 'resources': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}, 'resoures': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8617 scifi.v0.1.dev54+g05fe15b:pipeline:L168 (main) [INFO] 2020-12-01 12:59:57 > Samples to submit:
8618 - scifi_L001
8619 - scifi_L002
8620 - scifi_L003
8621 - scifi_L004
8622 scifi.v0.1.dev54+g05fe15b:pipeline:L172 (main) [DEBUG] 2020-12-01 12:59:57 > Doing sample scifi_L001
8623 scifi.v0.1.dev54+g05fe15b:pipeline:L182 (main) [DEBUG] 2020-12-01 12:59:57 > Running map command with sample scifi_L001
8624 scifi.v0.1.dev54+g05fe15b:map:L33 (map_command) [DEBUG] 2020-12-01 12:59:57 > Running map command for sample 'scifi_L001'
8625 scifi.v0.1.dev54+g05fe15b:map:L39 (map_command) [DEBUG] 2020-12-01 12:59:57 > Getting input BAM files for each r1 barcode.
8626 scifi.v0.1.dev54+g05fe15b:map:L42 (map_command) [DEBUG] 2020-12-01 12:59:57 > Attributes to use in input BAM files glob: 'set()'
8627 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:57 > Getting input BAM files for 'scifi_L001_C1'
8628 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C1': './pipeline_output/scifi_L001/scifi_L001_C1/scifi_L001_C1.ALL'
8629 scifi.v0.1.dev54+g05fe15b:map:L54 (map_command) [DEBUG] 2020-12-01 12:59:58 > Formatting variables for sample 'scifi_L001_C1': '{}'
8630 scifi.v0.1.dev54+g05fe15b:map:L58 (map_command) [DEBUG] 2020-12-01 12:59:58 > Glob for BAM files for sample 'scifi_L001_C1': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8631 scifi.v0.1.dev54+g05fe15b:map:L61 (map_command) [DEBUG] 2020-12-01 12:59:58 > BAM files of sample 'scifi_L001_C1': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8632 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:58 > Getting input BAM files for 'scifi_L001_C2'
8633 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C2': './pipeline_output/scifi_L001/scifi_L001_C2/scifi_L001_C2.ALL'
8634 scifi.v0.1.dev54+g05fe15b:map:L54 (map_command) [DEBUG] 2020-12-01 12:59:58 > Formatting variables for sample 'scifi_L001_C2': '{}'
8635 scifi.v0.1.dev54+g05fe15b:map:L58 (map_command) [DEBUG] 2020-12-01 12:59:58 > Glob for BAM files for sample 'scifi_L001_C2': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8636 scifi.v0.1.dev54+g05fe15b:map:L61 (map_command) [DEBUG] 2020-12-01 12:59:58 > BAM files of sample 'scifi_L001_C2': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8637 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:58 > Getting input BAM files for 'scifi_L001_C3'
8638 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C3': './pipeline_output/scifi_L001/scifi_L001_C3/scifi_L001_C3.ALL'
8639 scifi.v0.1.dev54+g05fe15b:map:L54 (map_command) [DEBUG] 2020-12-01 12:59:58 > Formatting variables for sample 'scifi_L001_C3': '{}'
8640 scifi.v0.1.dev54+g05fe15b:map:L58 (map_command) [DEBUG] 2020-12-01 12:59:58 > Glob for BAM files for sample 'scifi_L001_C3': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8641 scifi.v0.1.dev54+g05fe15b:map:L61 (map_command) [DEBUG] 2020-12-01 12:59:58 > BAM files of sample 'scifi_L001_C3': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8642 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:58 > Getting input BAM files for 'scifi_L001_C4'
8643 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C4': './pipeline_output/scifi_L001/scifi_L001_C4/scifi_L001_C4.ALL'
Thanks. Sorry about that. I think I discovered the bug.
Could you please try out a new version from `main` with the latest commit?
Also, complementarily: if you submit the job to `qsub` manually, will it run? E.g.:
qsub \
-q <job_queue> \
-N scifi_pipeline.scifi_L002.map.scifi_L001_C96 \
-l h_vmem=60G \
-l h_rt=08:00:00 \
-pe smp 4 \
-o <path_to>/scifi_pipeline.scifi_L002.map.scifi_L001_C96.log \
<path_to>/scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh
You'd need to adapt the command to your environment.
OK, I did that and now I get this:
(base) <0|843>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L136 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L168 (main) [INFO] > Samples to submit:
Ahhh, I'm so sorry, I corrected it wrongly. I do apologize, but right now I can't test this. Please try again with the latest commit.
OK... so I re-did that and now I get this again:
(base) <0|853>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq$ pip install -e .
Obtaining file:///ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq
Requirement already satisfied: numpy>=1.14.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (1.15.4)
Requirement already satisfied: scipy>=1.0.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (1.1.0)
Requirement already satisfied: pandas>=0.22.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.24.1)
Requirement already satisfied: pysam>=0.13 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.15.1)
Requirement already satisfied: matplotlib>=2.1.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (3.0.2)
Requirement already satisfied: seaborn>=0.8.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.9.0)
Requirement already satisfied: pyyaml in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (5.3.1)
Requirement already satisfied: anndata in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.6.18)
Requirement already satisfied: joblib in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.13.1)
Requirement already satisfied: python-dateutil>=2.5.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from pandas>=0.22.0->scifi==0.1.dev57+gcbf082b) (2.7.5)
Requirement already satisfied: pytz>=2011k in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from pandas>=0.22.0->scifi==0.1.dev57+gcbf082b) (2018.7)
Requirement already satisfied: cycler>=0.10 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (1.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (2.3.0)
Requirement already satisfied: h5py in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from anndata->scifi==0.1.dev57+gcbf082b) (2.9.0)
Requirement already satisfied: natsort in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from anndata->scifi==0.1.dev57+gcbf082b) (6.0.0)
Requirement already satisfied: six>=1.5 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from python-dateutil>=2.5.0->pandas>=0.22.0->scifi==0.1.dev57+gcbf082b) (1.12.0)
Requirement already satisfied: setuptools in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (40.6.3)
Installing collected packages: scifi
Found existing installation: scifi 0.1.dev57+gcbf082b
Uninstalling scifi-0.1.dev57+gcbf082b:
Successfully uninstalled scifi-0.1.dev57+gcbf082b
Running setup.py develop for scifi
Successfully installed scifi
(base) <0|854>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq$ cd ../picard/picard
(base) <0|855>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ cp default.yaml ../../scifiRNA-seq/scifi/config/
(base) <0|856>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L136 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L168 (main) [INFO] > Samples to submit:
Hi Andre, I am facing the `FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch'` error.
I changed `submission_command: "qsub"` in the default.yaml file.
Sorry, I didn't have a lot of time. The current `main` branch now runs on a local machine with a custom config using `sh -e` as the submission command.
Hi Andre,
After running it again, i get this error:
Dec 03 14:18:56 ...... FATAL ERROR, exiting
Thu Dec 3 14:18:56 EST 2020
EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "readFilesType" in input "Command-Line-Initial" SOLUTION: use correct parameter name (check the manual)
Hmm not sure what that means but it is an error from STAR. Are you using version 2.7.0e?
I was trying with an already installed version of STAR: 2.5.3a. Do you think this pipeline requires 2.7.0e?
I think it may well work with newer versions, but if I remember correctly I needed a newly implemented feature, so an older version is unlikely to work. In your case that version would definitely not work, because the parameter "readFilesType" does not exist in it. Please note that STAR requires a genome index built with a matching version.
Hi Andre, so now I installed the latest version of STAR and also created the genome index... now I get this error:
~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
Dec 05 23:06:22 ...... FATAL ERROR, exiting
Sat Dec 5 23:06:22 EST 2020
Dec 05 23:05:58 ..... started STAR run
Dec 05 23:06:23 ..... loading genome
Dec 05 23:06:00 ..... started STAR run
Dec 05 23:06:23 ..... loading genome
EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 31717214595 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31717214595
The only place I see where memory could be assigned is the default.yaml file:
resoures:
  map:
    cpus: 4
    mem: 600000
    queue: "shortq"
    time: "08:00:00"
  filter:
    cpus: 1
    mem: 800000
    queue: "shortq"
    time: "01:00:00"
  join:
    cpus: 1
    mem: 800000
    queue: "shortq"
    time: "00:30:00"
  report:
    cpus: 4
    mem: 800000
    queue: "longq"
    time: "3-00:00:00"
As you can see above, I have already allotted a large amount of memory to the map command... can you let me know if there is any other way to assign more memory to the script?
Hi Andre,
I tried increasing the memory and I believe the script ran:
(base) <0|1003>st3179@ha4c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L141 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L172 (main) [INFO] > Samples to submit:
When I scan the folders within pipeline_output:
(base) <0|1006>st3179@login1:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ls -lrt pipeline_output/scifi_L001/scifi_L001_C43
ls: unparsable value for LS_COLORS environment variable
total 32
lrwxrwxrwx. 1 st3179 ac_lab 127 Dec 3 09:32 scifi_L001_C43.ALL.STAR.Aligned.out.exon.bam -> /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C43/scifi_L001_C43.ALL.STAR.Aligned.out.bam
-rw-r--r--. 1 st3179 ac_lab 0 Dec 5 22:48 scifi_L001_C43.ALL.STAR.Log.progress.out
-rw-r--r--. 1 st3179 ac_lab 0 Dec 5 22:48 scifi_L001_C43.ALL.STAR.Aligned.out.bam
drwx------. 2 st3179 ac_lab 0 Dec 7 11:16 scifi_L001_C43.ALL.STAR._STARtmp
-rw-r--r--. 1 st3179 ac_lab 3820 Dec 7 11:16 scifi_L001_C43.ALL.STAR.Log.out
I don't understand why the BAM file has size 0 and why the following files do not get created:
per-well, mapped and gene-tagged BAM file; CSV file with summary statistics per barcode; CSV file with expression values per cell, per gene; h5ad gene expression file.
Please advise what the issue might be.
Thanks!
For how long did the job run? Did you check the logs of the job?
It seems two files were created two days ago while two others only today. Maybe you had already some output files created and STAR did not want to overwrite them? Just to be sure, remove the whole output directory prior to re-running.
Also, I would recommend using STAR version 2.7.0e which is the tested version.
Hi Andre,
I am using STAR version 2.7.6a. I removed the output directory and re-ran the script.
(base) <0|1027>st3179@login1:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L141 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L172 (main) [INFO] > Samples to submit:
EXITING because of fatal ERROR: could not make temporary directory: /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C50/scifi_L001_C50.ALL.STAR._STARtmp/
SOLUTION: (i) please check the path and writing permissions (ii) if you specified --outTmpDir, and this directory exists - please remove it before running STAR
Dec 07 12:19:37 ...... FATAL ERROR, exiting
Mon Dec 7 12:19:37 EST 2020
Mon Dec 7 12:19:37 EST 2020
Mon Dec 7 12:19:38 EST 2020
EXITING because of fatal ERROR: could not make temporary directory: /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C54/scifi_L001_C54.ALL.STAR._STARtmp/
SOLUTION: (i) please check the path and writing permissions (ii) if you specified --outTmpDir, and this directory exists - please remove it before running STAR
Dec 07 12:19:39 ...... FATAL ERROR, exiting
Mon Dec 7 12:19:39 EST 2020
Mon Dec 7 12:19:39 EST 2020
Mon Dec 7 12:19:40 EST 2020
Dec 07 12:19:37 ..... started STAR run
Dec 07 12:19:28 ..... started STAR run
Dec 07 12:19:28 ..... started STAR run
Dec 07 12:20:53 ..... loading genome
Dec 07 12:20:53 ..... loading genome
Dec 07 12:20:53 ..... loading genome
Dec 07 12:20:12 ..... started STAR run
Dec 07 12:21:06 ..... loading genome
Mon Dec 7 12:21:10 EST 2020
Mon Dec 7 12:21:12 EST 2020
Dec 07 12:20:13 ..... started STAR run
Dec 07 12:21:15 ..... loading genome
Dec 07 12:20:17 ..... started STAR run
Dec 07 12:21:17 ..... loading genome
Mon Dec 7 12:21:17 EST 2020
Mon Dec 7 12:21:19 EST 2020
Dec 07 12:20:19 ..... started STAR run
Dec 07 12:21:20 ..... loading genome
EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 31717214595 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31717214595
Dec 07 12:21:22 ...... FATAL ERROR, exiting
What do you advise? Thanks!
So there are a couple of things there.
1) It seems the directory "/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C50/" could not be created. It seems strange to me that `pipeline_output` is under a `picard` directory, but if you really want that, you could try removing everything and creating just that directory prior to running:
rm -r /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/
mkdir -p /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C50/
2) Lack of memory: this sounds more plausible to me. Are you giving enough memory to your job? Since you're not using SLURM, you'd need to specify this to your job manager.
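As a generic illustration (not specific to this pipeline), the limit STAR complains about can be inspected from within a job or shell:

```shell
# Print the virtual-memory limit visible to the current shell, in KB
# (or "unlimited"). STAR's genome loading fails with std::bad_alloc
# when this limit is lower than the genome index size.
ulimit -v
```

If the printed value is below what the error message demands, raise it in the job script or request more memory from the job manager.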
OK. Now I gave 100G of memory to the job. It seems now that the log file is read-only! As far as I know, I have all permissions set up in this directory.
Job started Mon Dec 7 13:29:12 EST 2020
running on hb4c3n5.hpc
Traceback (most recent call last):
File "/ifs/home/c2b2/ac_lab/st3179/bin/scifi", line 11, in
OK, 100Gb is way too much, but it seems the current problem is yet again unrelated. The general logfile for the executable is in the home directory of the user. Are you user "st3179"? Why can't you write to that directory? I've made a commit that tries to get around this: if the home directory is not writable, it will try to use the current directory instead.
OK, understood. One question: since our cluster uses qsub, when I change `submission_command: "sbatch"` in default.yaml, should I set `submission_command: "qsub"` or `submission_command: "sh -e"`?
I am using qsub for submitting the job.
You can pass any arbitrary command with options. The pipeline will then run `cmd <job_file.sh>`.
E.g. `sh -e <job_file.sh>`, or `sbatch -p partitionname -c 4 <job_file.sh>` if cmd is "sbatch -p partitionname -c 4".
In your case you could, for example, pass "qsub -pe smp 4 -l h_vmem=60G" to run with 4 CPUs and 60GB of memory.
I've actually never used SGE, I just saw what some options are here: http://bioinformatics.mdc-berlin.de/intro2UnixandSGE/sun_grid_engine_for_beginners/how_to_submit_a_job_using_qsub.html
Again, you could bypass all this by just submitting the `sh` file as a job to SGE manually.
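For reference, a hypothetical ~/.scifi.config.yaml fragment for an SGE cluster could look like this (the qsub flags are assumptions; adapt them to your site's parallel environments, queues, and resource names):

```yaml
# Hypothetical example; "-pe smp 4" and "-l h_vmem=60G" must match
# the parallel environments and resource names of your SGE setup.
submission_command: "qsub -pe smp 4 -l h_vmem=60G"
```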
Hi Andre,
I have to use qsub, otherwise I cannot allot memory to the jobs. When I do that, the jobs go into error mode:
15289521 0.01690 scifi_pipe st3179 Eqw 12/08/2020 00:04:15 1
15289525 0.01599 scifi_pipe st3179 Eqw 12/08/2020 00:04:15 1
(base) <0|1022>st3179@hb1c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ qstat -j 15289521 | grep error
error reason 1: 12/08/2020 00:04:16 [3224:58494]: error: can't open stdout output file "/ifs/home/c2b2/ac_lab/st3179/scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh.o15289521": Read-only file system
(base) <0|1023>st3179@hb1c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ qstat -j 15289525 | grep error
error reason 1: 12/08/2020 00:04:16 [3224:24771]: error: can't open stdout output file "/ifs/home/c2b2/ac_lab/st3179/scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh.o15289525": Read-only file system
(base) <0|1024>st3179@hb1c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$
Is there any way to create these files (scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh.o15289521, etc.) in the current directory rather than the home directory, as you did for .scifi.log.txt?
I'm not sure I can help you, I don't know what those files are.
You can control where the output of the run goes using the `--output-dir` option. Run `scifi map --help` to see all options.
Hi Andre, thanks! I am still trying to figure out the issue. By any chance, do you have a toy example dataset that I can use to test whether the pipeline is running properly? That would make it easy to figure out whether there is some issue with my dataset formatting steps. Thanks!
Hi André, The pipeline is running now. Thanks!
Hi André,
following the change I mentioned in the previous issue (#3 in submit_job cmd = """{cmd} -J {job_name} \ KeyError: 'job_name'), I ran into another error:
So it looks like the sbatch file/command is missing. Or am I wrong? Thanks a lot!