epigen / scifiRNA-seq

GNU General Public License v3.0

FileNotFoundError: [Errno 2] No such file or directory: 'sbatch' #4

Open oliclement opened 3 years ago

oliclement commented 3 years ago

Hi André,

Following the change I mentioned in the previous issue (#3, the `KeyError: 'job_name'` raised in submit_job's cmd = """{cmd} -J {job_name} template), I ran into another error:

 File "/mnt/remoteserv/switch/userdata/usrdat03/userdata/oclement/Toolbin/Stiletto/miniconda3/envs/scifi/bin/scifiRNA-seq/build/lib/scifi/job_control.py", line 44, in submit_job
    subprocess.Popen(cmd.split(" "))
  File "/home/oclement/working_data_03/Toolbin/Stiletto/miniconda3/envs/scifi/lib/python3.9/subprocess.py", line 947, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/oclement/working_data_03/Toolbin/Stiletto/miniconda3/envs/scifi/lib/python3.9/subprocess.py", line 1819, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'

So it looks like the sbatch file/command is missing. Or am I wrong? Thanks a lot!

afrendeiro commented 3 years ago

Yes, sbatch is the submitting command for SLURM. Currently that is the only supported job control system.
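
For context, `subprocess.Popen` raises exactly this `FileNotFoundError` whenever the first token of the command is not an executable on `$PATH`. A minimal sketch of a pre-flight check (an illustration, not part of the pipeline's code):

```python
import shutil
import subprocess

# The pipeline submits jobs with subprocess.Popen(cmd.split(" ")); Popen
# raises FileNotFoundError when the first token ("sbatch" here) is not an
# executable found on $PATH. shutil.which gives a cheap way to check first.
submitter = "sbatch"
if shutil.which(submitter) is None:
    print(f"{submitter!r} not found on $PATH -- set submission_command "
          "to something your cluster actually has (e.g. qsub, sh)")
else:
    subprocess.Popen([submitter, "--version"])
```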

nsheff commented 3 years ago

hey @afrendeiro is there a reason you didn't use divvy for this?

afrendeiro commented 3 years ago

Not that I remember, although I wasn't sure whether divvy supports SLURM array jobs. I just needed something that worked for me fast. Want to submit a PR?

afrendeiro commented 3 years ago

@oliclement There would be better ways to make the pipeline use various submission engines, but until then, I've made it so that in a local config the user can specify a submission_command variable, which can be an arbitrary command used to submit the job. One could, for example, select sh there to have jobs run locally in serial. That and the dry_run option "-d" should make it easier to debug any further issues.

I still wasn't able to run tests because I don't have a STAR-indexed genome and couldn't access your files, but perhaps in the meanwhile you can try the latest set of changes on your data?

somnathtagore commented 3 years ago

Hi André, I am also having the same issue:

Traceback (most recent call last):
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/bin/scifi", line 11, in <module>
    load_entry_point('scifi', 'console_scripts', 'scifi')()
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/__init__.py", line 171, in <module>
    sys.exit(main())
  File "/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/pipeline.py", line 152, in main
    map_command(args, sample_name, sample_out_dir, r1_annotation, _CONFIG)
  File "/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/map.py", line 85, in map_command
    submit_job(job, params)
  File "/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/job_control.py", line 44, in submit_job
    subprocess.Popen(cmd.split(" "))
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch'

afrendeiro commented 3 years ago

@somnathtagore Are you using SLURM? If not, you could change submission_command to any other callable in the configuration.

somnathtagore commented 3 years ago

Hi Andre, we are using Univa Grid Engine (SGE) for job submission.

somnathtagore commented 3 years ago

Also, are you talking about the default.yaml file under config? If yes, I do not see a parameter that can be used to change the submission_command...

afrendeiro commented 3 years ago

I've updated the README a little to improve documentation: https://github.com/epigen/scifiRNA-seq#configuration-and-logging

The user can either pass a -c option to specify a configuration for a run, or place this file under ~/.scifi.config.yaml to be used. Make sure to use the latest version on the main branch.

In this file one can specify a submission_command to be called for job submission just like here: https://github.com/epigen/scifiRNA-seq/blob/main/scifi/config/default.yaml#L34

...
submission_command: sh
...

Alternatively, since the job is written to a bash file prior to execution, one could run the pipeline in "dry run" mode (option -d) and run the script a posteriori in a manner adapted to the environment at hand.

somnathtagore commented 3 years ago

Hi Andre,

I changed submission_command to "qsub" and it still throws the FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch' error. So I am trying the dry run (-d option). The script generates a 'pipeline_output' directory. Within this directory, I see a 'scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh' file, but I don't see the CSV file with summary statistics per barcode or the CSV file with expression values per cell, per gene. It creates per-well directories, but the mapped and gene-tagged BAM files are missing within those.

My question is whether this is an issue with some incorrect parameters, or with the input files that I am passing.

afrendeiro commented 3 years ago

The fact that the run still produces the No such file or directory: 'sbatch' error means that either you're running an older version (did you update the code?) or that somehow the custom configuration file is not being passed correctly (can you show me the exact command you ran?).

The dry run option is designed not to actually run the commands, but to produce all files needed up to the point of running the job. So the fact that the pipeline output is not there makes sense. My suggestion was that by first running the pipeline in dry run mode, the job files would be produced; one could then submit them a posteriori to whatever job manager is available.
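
A generic illustration of that workflow (the file name here is hypothetical, mirroring the ones in this thread): the dry run writes a job script, which you then run or submit yourself with whatever your environment has.

```shell
# Stand-in for a script the pipeline's dry run would generate:
cat > scifi_pipeline.demo.map.sh <<'EOF'
echo "mapping commands would run here"
EOF

sh scifi_pipeline.demo.map.sh               # local, serial
# or: qsub scifi_pipeline.demo.map.sh      # SGE/UGE
# or: sbatch scifi_pipeline.demo.map.sh    # SLURM
```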

somnathtagore commented 3 years ago

So, I re-installed the pipeline and ran this:

(base) <0|823>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L136 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L168 (main) [INFO] > Samples to submit:

afrendeiro commented 3 years ago

When you say you re-installed, what do you mean exactly? Did you clone the latest version, or do git pull on an existing repository?

Could you please show me the content of /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml and of ~/.scifi.log.txt? Feel free to email if files are large.

somnathtagore commented 3 years ago

By re-install I mean: I cloned the latest version.

The default.yaml is:

root_output_dir:  # default directory for outputs
  /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)
expected_cell_number:  # default expected number of cells
  200000
min_umi_output:  # minimum number of UMIs a barcode must have to be reported
  3
annotation:  # round2 CSV annotation. Superceeded by values in sample CSV annotation.
  "/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv"
variables:  # variables in round1 CSV annotation to bring along, Superceeded by values in sample CSV annotation - "plate_well"
species_mixing:  # whether experiment contains more than one species. Superceeded by value in sample CSV annotatiton
  1
array_size:  # SLURM job array size
  24
chunks:
  1000
chunk_batch_size:
  25
grna_pbs_sequence:
  GTGGAAAGGACGAAACACCG
submission_command:
  "qsub"

resoures:
  map:
    cpus: 4
    mem: 60000
    queue: "shortq"
    time: "08:00:00"
  filter:
    cpus: 1
    mem: 8000
    queue: "shortq"
    time: "01:00:00"
  join:
    cpus: 1
    mem: 8000
    queue: "shortq"
    time: "00:30:00"
  report:
    cpus: 4
    mem: 80000
    queue: "longq"
    time: "3-00:00:00"

The ~/.scifi.log.txt is (I copied the top 50 lines):

8603 scifi.v0.1.dev54+g05fe15b:__init__:L76 (setup_logger) [DEBUG] 2020-12-01 12:56:46 > This is scifi (https://github.com/epigen/scifiRNA-seq), version: 0.1.dev54+g05fe15b
8604 scifi.v0.1.dev54+g05fe15b:__init__:L113 (setup_config) [DEBUG] 2020-12-01 12:56:46 > Reading default configuration file distributed with package from '/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/config/default.yaml'.
8605 scifi.v0.1.dev54+g05fe15b:__init__:L117 (setup_config) [DEBUG] 2020-12-01 12:56:46 > Default config: {'star_exe': '/home/arendeiro/workspace/STAR-2.7.0e/bin/Linux_x86_64_static/STAR', 'star_genome_dir': '/home/arendeiro/resources/genomes/hg38/indexed_STAR-2.7.0e/', 'gtf_file': '/home/arendeiro/resources/genomes/hg38/10X/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf', 'featurecounts_exe': 'subread-2.0.1-Linux-x86_64/bin/featureCounts', 'root_output_dir': '/scratch/lab_bock/shared/projects/sci-rna/data/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/scratch/lab_bock/shared/projects/sci-rna/metadata/sciRNA-seq.PD190_humanmouse.oligos_2019-09-05.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'sbatch', 'resources': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8606 scifi.v0.1.dev54+g05fe15b:__init__:L151 (setup_config) [DEBUG] 2020-12-01 12:56:46 > To use custom configurations including paths to static files, create a '/ifs/home/c2b2/ac_lab/st3179/.scifi.config.yaml' file.
8607 scifi.v0.1.dev54+g05fe15b:pipeline:L136 (main) [INFO] 2020-12-01 12:59:55 > scifi-RNA-seq pipeline
8608 scifi.v0.1.dev54+g05fe15b:pipeline:L30 (build_cli) [DEBUG] 2020-12-01 12:59:55 > Setting up CLI parser.
8609 scifi.v0.1.dev54+g05fe15b:pipeline:L143 (main) [DEBUG] 2020-12-01 12:59:56 > Namespace(array_size=None, arrayed=False, command='map', config_file='/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml', dry_run=False, input_bam_glob='/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam', root_output_dir='./pipeline_output', sample_annotation='annotation_L001_1.csv', sample_subset=[], toggle=False)
8610 scifi.v0.1.dev54+g05fe15b:__init__:L113 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Reading default configuration file distributed with package from '/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/build/lib/scifi/config/default.yaml'.
8611 scifi.v0.1.dev54+g05fe15b:__init__:L117 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Default config: {'star_exe': '/home/arendeiro/workspace/STAR-2.7.0e/bin/Linux_x86_64_static/STAR', 'star_genome_dir': '/home/arendeiro/resources/genomes/hg38/indexed_STAR-2.7.0e/', 'gtf_file': '/home/arendeiro/resources/genomes/hg38/10X/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf', 'featurecounts_exe': 'subread-2.0.1-Linux-x86_64/bin/featureCounts', 'root_output_dir': '/scratch/lab_bock/shared/projects/sci-rna/data/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/scratch/lab_bock/shared/projects/sci-rna/metadata/sciRNA-seq.PD190_humanmouse.oligos_2019-09-05.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'sbatch', 'resources': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8612 scifi.v0.1.dev54+g05fe15b:__init__:L151 (setup_config) [DEBUG] 2020-12-01 12:59:56 > To use custom configurations including paths to static files, create a '/ifs/home/c2b2/ac_lab/st3179/.scifi.config.yaml' file.
8613 scifi.v0.1.dev54+g05fe15b:__init__:L158 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Custom passed config: {'root_output_dir': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'qsub', 'resoures': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8614 scifi.v0.1.dev54+g05fe15b:__init__:L162 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Updating configuration with custom file from '/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifiRNA-seq/scifi/config/default.yaml'.
8615 scifi.v0.1.dev54+g05fe15b:__init__:L166 (setup_config) [DEBUG] 2020-12-01 12:59:56 > Current config: {'root_output_dir': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'qsub', 'resoures': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8616 scifi.v0.1.dev54+g05fe15b:pipeline:L146 (main) [DEBUG] 2020-12-01 12:59:56 > {'star_exe': '/home/arendeiro/workspace/STAR-2.7.0e/bin/Linux_x86_64_static/STAR', 'star_genome_dir': '/home/arendeiro/resources/genomes/hg38/indexed_STAR-2.7.0e/', 'gtf_file': '/home/arendeiro/resources/genomes/hg38/10X/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf', 'featurecounts_exe': 'subread-2.0.1-Linux-x86_64/bin/featureCounts', 'root_output_dir': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/$(RUN_NAME)', 'expected_cell_number': 200000, 'min_umi_output': 3, 'annotation': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/annotation_L001.csv', 'variables': ['plate_well'], 'species_mixing': 1, 'array_size': 24, 'chunks': 1000, 'chunk_batch_size': 25, 'grna_pbs_sequence': 'GTGGAAAGGACGAAACACCG', 'submission_command': 'qsub', 'resources': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}, 'resoures': {'map': {'cpus': 4, 'mem': 60000, 'queue': 'shortq', 'time': '08:00:00'}, 'filter': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '01:00:00'}, 'join': {'cpus': 1, 'mem': 8000, 'queue': 'shortq', 'time': '00:30:00'}, 'report': {'cpus': 4, 'mem': 80000, 'queue': 'longq', 'time': '3-00:00:00'}}}
8617 scifi.v0.1.dev54+g05fe15b:pipeline:L168 (main) [INFO] 2020-12-01 12:59:57 > Samples to submit:
8618          - scifi_L001
8619          - scifi_L002
8620          - scifi_L003
8621          - scifi_L004
8622 scifi.v0.1.dev54+g05fe15b:pipeline:L172 (main) [DEBUG] 2020-12-01 12:59:57 > Doing sample scifi_L001
8623 scifi.v0.1.dev54+g05fe15b:pipeline:L182 (main) [DEBUG] 2020-12-01 12:59:57 > Running map command with sample scifi_L001
8624 scifi.v0.1.dev54+g05fe15b:map:L33 (map_command) [DEBUG] 2020-12-01 12:59:57 > Running map command for sample 'scifi_L001'
8625 scifi.v0.1.dev54+g05fe15b:map:L39 (map_command) [DEBUG] 2020-12-01 12:59:57 > Getting input BAM files for each r1 barcode.
8626 scifi.v0.1.dev54+g05fe15b:map:L42 (map_command) [DEBUG] 2020-12-01 12:59:57 > Attributes to use in input BAM files glob: 'set()'
8627 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:57 > Getting input BAM files for 'scifi_L001_C1'
8628 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C1': './pipeline_output/scifi_L001/scifi_L001_C1/scifi_L001_C1.ALL'
8629 scifi.v0.1.dev54+g05fe15b:map:L54 (map_command) [DEBUG] 2020-12-01 12:59:58 > Formatting variables for sample 'scifi_L001_C1': '{}'
8630 scifi.v0.1.dev54+g05fe15b:map:L58 (map_command) [DEBUG] 2020-12-01 12:59:58 > Glob for BAM files for sample 'scifi_L001_C1': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8631 scifi.v0.1.dev54+g05fe15b:map:L61 (map_command) [DEBUG] 2020-12-01 12:59:58 > BAM files of sample 'scifi_L001_C1': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8632 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:58 > Getting input BAM files for 'scifi_L001_C2'
8633 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C2': './pipeline_output/scifi_L001/scifi_L001_C2/scifi_L001_C2.ALL'
8634 scifi.v0.1.dev54+g05fe15b:map:L54 (map_command) [DEBUG] 2020-12-01 12:59:58 > Formatting variables for sample 'scifi_L001_C2': '{}'
8635 scifi.v0.1.dev54+g05fe15b:map:L58 (map_command) [DEBUG] 2020-12-01 12:59:58 > Glob for BAM files for sample 'scifi_L001_C2': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8636 scifi.v0.1.dev54+g05fe15b:map:L61 (map_command) [DEBUG] 2020-12-01 12:59:58 > BAM files of sample 'scifi_L001_C2': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8637 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:58 > Getting input BAM files for 'scifi_L001_C3'
8638 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C3': './pipeline_output/scifi_L001/scifi_L001_C3/scifi_L001_C3.ALL'
8639 scifi.v0.1.dev54+g05fe15b:map:L54 (map_command) [DEBUG] 2020-12-01 12:59:58 > Formatting variables for sample 'scifi_L001_C3': '{}'
8640 scifi.v0.1.dev54+g05fe15b:map:L58 (map_command) [DEBUG] 2020-12-01 12:59:58 > Glob for BAM files for sample 'scifi_L001_C3': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8641 scifi.v0.1.dev54+g05fe15b:map:L61 (map_command) [DEBUG] 2020-12-01 12:59:58 > BAM files of sample 'scifi_L001_C3': '/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam'
8642 scifi.v0.1.dev54+g05fe15b:map:L44 (map_command) [DEBUG] 2020-12-01 12:59:58 > Getting input BAM files for 'scifi_L001_C4'
8643 scifi.v0.1.dev54+g05fe15b:map:L49 (map_command) [DEBUG] 2020-12-01 12:59:58 > Prefix for sample 'scifi_L001_C4': './pipeline_output/scifi_L001/scifi_L001_C4/scifi_L001_C4.ALL'
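
Worth noting: the "Current config" log line above ends up containing both a 'resources' and a 'resoures' key. Under the assumption that the config merge is a shallow dict update (which is what that log line suggests), a misspelled top-level key in the custom file is silently added alongside the correct one instead of overriding it, as this sketch illustrates (an illustration, not the pipeline's actual code):

```python
# Default config vs. custom config with a misspelled key, merged shallowly.
default = {
    "submission_command": "sbatch",
    "resources": {"map": {"mem": 60000}},
}
custom = {
    "submission_command": "qsub",
    "resoures": {"map": {"mem": 600000}},  # note the typo: "resoures"
}
merged = {**default, **custom}

assert merged["submission_command"] == "qsub"      # overridden as intended
assert merged["resources"]["map"]["mem"] == 60000  # typo'd key did NOT override
assert "resoures" in merged                        # both keys coexist
```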
afrendeiro commented 3 years ago

Thanks. Sorry about that. I think I discovered the bug. Could you please try out a new version from main with the latest commit?

Also, complementarily: if you submit the job to qsub manually, will it run? For example:

qsub \
    -q <job_queue> \
    -N scifi_pipeline.scifi_L002.map.scifi_L001_C96 \
    -l h_vmem=60G \
    -l h_rt=08:00:00 \
    -pe smp 4 \
    -o <path_to>/scifi_pipeline.scifi_L002.map.scifi_L001_C96.log \
    <path_to>/scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh

You'd need to adapt the command to your environment.

somnathtagore commented 3 years ago

ok.. I did that and now I get this:

(base) <0|843>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L136 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L168 (main) [INFO] > Samples to submit:

afrendeiro commented 3 years ago

Ahhh I'm so sorry, I corrected it wrongly. I do apologize but right now I can't test this. Please try again now with the latest commit.

somnathtagore commented 3 years ago

ok... so I re-did that and now I get this again:

(base) <0|853>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq$ pip install -e .
Obtaining file:///ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq
Requirement already satisfied: numpy>=1.14.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (1.15.4)
Requirement already satisfied: scipy>=1.0.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (1.1.0)
Requirement already satisfied: pandas>=0.22.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.24.1)
Requirement already satisfied: pysam>=0.13 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.15.1)
Requirement already satisfied: matplotlib>=2.1.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (3.0.2)
Requirement already satisfied: seaborn>=0.8.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.9.0)
Requirement already satisfied: pyyaml in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (5.3.1)
Requirement already satisfied: anndata in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.6.18)
Requirement already satisfied: joblib in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from scifi==0.1.dev57+gcbf082b) (0.13.1)
Requirement already satisfied: python-dateutil>=2.5.0 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from pandas>=0.22.0->scifi==0.1.dev57+gcbf082b) (2.7.5)
Requirement already satisfied: pytz>=2011k in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from pandas>=0.22.0->scifi==0.1.dev57+gcbf082b) (2018.7)
Requirement already satisfied: cycler>=0.10 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (1.0.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (2.3.0)
Requirement already satisfied: h5py in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from anndata->scifi==0.1.dev57+gcbf082b) (2.9.0)
Requirement already satisfied: natsort in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from anndata->scifi==0.1.dev57+gcbf082b) (6.0.0)
Requirement already satisfied: six>=1.5 in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from python-dateutil>=2.5.0->pandas>=0.22.0->scifi==0.1.dev57+gcbf082b) (1.12.0)
Requirement already satisfied: setuptools in /ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.1.1->scifi==0.1.dev57+gcbf082b) (40.6.3)
Installing collected packages: scifi
  Found existing installation: scifi 0.1.dev57+gcbf082b
    Uninstalling scifi-0.1.dev57+gcbf082b:
      Successfully uninstalled scifi-0.1.dev57+gcbf082b
  Running setup.py develop for scifi
Successfully installed scifi
(base) <0|854>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq$ cd ../picard/picard
(base) <0|855>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ cp default.yaml ../../scifiRNA-seq/scifi/config/
(base) <0|856>st3179@login4:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L136 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L168 (main) [INFO] > Samples to submit:

somnathtagore commented 3 years ago

Hi Andre, I am facing the FileNotFoundError: [Errno 2] No such file or directory: 'sbatch': 'sbatch' error.

I changed submission_command to "qsub" in the default.yaml file.

afrendeiro commented 3 years ago

Sorry, didn't have a lot of time. The current main branch now runs in a local machine with a custom config using sh -e as the submission command.
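
For reference, a minimal local-run configuration along those lines might look like this (a sketch; the key name is per the default.yaml shown earlier in this thread):

```yaml
# ~/.scifi.config.yaml
submission_command: "sh -e"   # run each job script locally, in serial
```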

somnathtagore commented 3 years ago

Hi Andre,

After running it again, I get this error:

Dec 03 14:18:56 ...... FATAL ERROR, exiting
Thu Dec 3 14:18:56 EST 2020

EXITING: FATAL INPUT ERROR: unrecoginzed parameter name "readFilesType" in input "Command-Line-Initial" SOLUTION: use correct parameter name (check the manual)

afrendeiro commented 3 years ago

Hmm not sure what that means but it is an error from STAR. Are you using version 2.7.0e?

somnathtagore commented 3 years ago

I was trying with an already-installed version of STAR, 2.5.3a. Do you think this pipeline requires 2.7.0e?

afrendeiro commented 3 years ago

I think it may well work with newer versions, but if I remember correctly I needed something newly implemented, so an older version is unlikely to work. In your case that version really would not work, because the parameter "readFilesType" does not exist in it. Please note that STAR requires a genome index specific to a given version.

somnathtagore commented 3 years ago

Hi Andre, so now I installed the latest version of STAR and also created the genome index... now I get this error:

~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv

Dec 05 23:06:22 ...... FATAL ERROR, exiting
Sat Dec 5 23:06:22 EST 2020
Dec 05 23:05:58 ..... started STAR run
Dec 05 23:06:23 ..... loading genome
Dec 05 23:06:00 ..... started STAR run
Dec 05 23:06:23 ..... loading genome

EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 31717214595 bytes
Possible cause 2: not enough virtual memory allowed with ulimit. SOLUTION: run ulimit -v 31717214595

The only place I see where memory could be assigned is the default.yaml file:

resoures:
  map:
    cpus: 4
    mem: 600000
    queue: "shortq"
    time: "08:00:00"
  filter:
    cpus: 1
    mem: 800000
    queue: "shortq"
    time: "01:00:00"
  join:
    cpus: 1
    mem: 800000
    queue: "shortq"
    time: "00:30:00"
  report:
    cpus: 4
    mem: 800000
    queue: "longq"
    time: "3-00:00:00"

As you can see above, I have already allotted a large amount of memory to the map command... can you let me know if there is any other way to assign more memory to the script?

somnathtagore commented 3 years ago

Hi Andre,

I tried increasing the memory and I believe the script ran:

(base) <0|1003>st3179@ha4c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L141 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L172 (main) [INFO] > Samples to submit:

When I scan the folders within pipeline_output:

(base) <0|1006>st3179@login1:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ls -lrt pipeline_output/scifi_L001/scifi_L001_C43
ls: unparsable value for LS_COLORS environment variable
total 32
lrwxrwxrwx. 1 st3179 ac_lab  127 Dec 3 09:32 scifi_L001_C43.ALL.STAR.Aligned.out.exon.bam -> /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C43/scifi_L001_C43.ALL.STAR.Aligned.out.bam
-rw-r--r--. 1 st3179 ac_lab    0 Dec 5 22:48 scifi_L001_C43.ALL.STAR.Log.progress.out
-rw-r--r--. 1 st3179 ac_lab    0 Dec 5 22:48 scifi_L001_C43.ALL.STAR.Aligned.out.bam
drwx------. 2 st3179 ac_lab    0 Dec 7 11:16 scifi_L001_C43.ALL.STAR._STARtmp
-rw-r--r--. 1 st3179 ac_lab 3820 Dec 7 11:16 scifi_L001_C43.ALL.STAR.Log.out

I don't understand why the BAM file has size 0, and why the following files do not get created:

- per-well, mapped and gene-tagged BAM files;
- CSV file with summary statistics per barcode;
- CSV file with expression values per cell, per gene;
- h5ad gene expression file.

Please advise on what might be the issue.

Thanks!

afrendeiro commented 3 years ago

For how long did the job run? Did you check the logs of the job?

It seems two files were created two days ago, while two others only today. Maybe you already had some output files and STAR did not want to overwrite them? Just to be sure, remove the whole output directory prior to re-running.

Also, I would recommend using STAR version 2.7.0e which is the tested version.

somnathtagore commented 3 years ago

Hi Andre,

I am using STAR version 2.7.6a. I removed the output directory and re-ran the script.

(base) <0|1027>st3179@login1:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ ~/bin/scifi map -c /ifs/scratch/c2b2/ac_lab/st3179/scifi/config/default.yaml --input-bam-glob /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/unaligned_revertsam_S0_L001_R1_R2.bam annotation_L001_1.csv
scifi:pipeline:L141 (main) [INFO] > scifi-RNA-seq pipeline
scifi:pipeline:L172 (main) [INFO] > Samples to submit:

EXITING because of fatal ERROR: could not make temporary directory: /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C50/scifi_L001_C50.ALL.STAR._STARtmp/
SOLUTION: (i) please check the path and writing permissions (ii) if you specified --outTmpDir, and this directory exists - please remove it before running STAR

Dec 07 12:19:37 ...... FATAL ERROR, exiting
Mon Dec 7 12:19:37 EST 2020
Mon Dec 7 12:19:37 EST 2020
Mon Dec 7 12:19:38 EST 2020

EXITING because of fatal ERROR: could not make temporary directory: /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C54/scifi_L001_C54.ALL.STAR._STARtmp/
SOLUTION: (i) please check the path and writing permissions (ii) if you specified --outTmpDir, and this directory exists - please remove it before running STAR

Dec 07 12:19:39 ...... FATAL ERROR, exiting
Mon Dec 7 12:19:39 EST 2020
Mon Dec 7 12:19:39 EST 2020
Mon Dec 7 12:19:40 EST 2020
Dec 07 12:19:37 ..... started STAR run
Dec 07 12:19:28 ..... started STAR run
Dec 07 12:19:28 ..... started STAR run
Dec 07 12:20:53 ..... loading genome
Dec 07 12:20:53 ..... loading genome
Dec 07 12:20:53 ..... loading genome
Dec 07 12:20:12 ..... started STAR run
Dec 07 12:21:06 ..... loading genome
Mon Dec 7 12:21:10 EST 2020
Mon Dec 7 12:21:12 EST 2020
Dec 07 12:20:13 ..... started STAR run
Dec 07 12:21:15 ..... loading genome
Dec 07 12:20:17 ..... started STAR run
Dec 07 12:21:17 ..... loading genome
Mon Dec 7 12:21:17 EST 2020
Mon Dec 7 12:21:19 EST 2020
Dec 07 12:20:19 ..... started STAR run
Dec 07 12:21:20 ..... loading genome

EXITING: fatal error trying to allocate genome arrays, exception thrown: std::bad_alloc
Possible cause 1: not enough RAM. Check if you have enough RAM 31717214595 bytes
Possible cause 2: not enough virtual memory allowed with ulimit.
SOLUTION: run ulimit -v 31717214595

Dec 07 12:21:22 ...... FATAL ERROR, exiting

What do you advise? Thanks!

afrendeiro commented 3 years ago

So there are a couple of things there. 1) It seems the directory "/ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C50/" could not be created. It seems strange to me that pipeline_output is under a picard directory, but if you really want that, you could try removing everything and creating just that directory prior to running:

rm -r /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/
mkdir -p /ifs/scratch/c2b2/ac_lab/st3179/picard/picard/pipeline_output/scifi_L001/scifi_L001_C50/

2) Lack of memory: this sounds more plausible to me. Are you giving enough memory to your job? Since you're not using SLURM you'd need to specify this to your job manager.
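
As an aside, the virtual-memory cap that STAR's "not enough virtual memory allowed with ulimit" message refers to can be inspected from Python with the standard `resource` module. This is just a diagnostic sketch (not part of the pipeline); the 31717214595-byte figure is taken from the STAR error above:

```python
import resource

# Per-process virtual-memory limit (what `ulimit -v` controls).
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

def fmt(limit):
    # RLIM_INFINITY means no cap is set.
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

print("soft limit:", fmt(soft))
print("hard limit:", fmt(hard))

# STAR reported needing ~31.7 GB to load this genome index; the soft
# limit must be at least that (or unlimited) for the load to succeed.
needed = 31_717_214_595
enough = soft == resource.RLIM_INFINITY or soft >= needed
print("enough for the genome index:", enough)
```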

somnathtagore commented 3 years ago

OK. Now I gave 100G of memory to the job. It seems the log file is read-only now, even though as far as I know I have all permissions set up in that directory.

Job started Mon Dec 7 13:29:12 EST 2020 running on hb4c3n5.hpc
Traceback (most recent call last):
  File "/ifs/home/c2b2/ac_lab/st3179/bin/scifi", line 11, in <module>
    load_entry_point('scifi', 'console_scripts', 'scifi')()
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/build/lib/scifi/__init__.py", line 178, in <module>
    _LOGGER = setup_logger()
  File "/ifs/scratch/c2b2/ac_lab/st3179/scifiRNA-seq/build/lib/scifi/__init__.py", line 55, in setup_logger
    fh = logging.FileHandler(logfile)
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/logging/__init__.py", line 1092, in __init__
    StreamHandler.__init__(self, self._open())
  File "/ifs/home/c2b2/ac_lab/st3179/anaconda3/envs/my_pymc_env/lib/python3.7/logging/__init__.py", line 1121, in _open
    return open(self.baseFilename, self.mode, encoding=self.encoding)
OSError: [Errno 30] Read-only file system: '/ifs/home/c2b2/ac_lab/st3179/.scifi.log.txt'

afrendeiro commented 3 years ago

OK, 100Gb is way too much, but it seems the current problem is yet again unrelated. The general logfile for the executable is in the home directory of the user. Are you user "st3179"? Why can't you write to that directory? I've made a commit that tries to get around this: if the home directory is not writable, the pipeline will try to use the current directory instead.
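
A minimal sketch of that fallback logic (the helper name is hypothetical; the actual commit may differ):

```python
import os
from pathlib import Path

def choose_logfile(name=".scifi.log.txt"):
    """Prefer a logfile in the user's home directory, but fall back
    to the current working directory if home is not writable."""
    home = Path.home()
    target = home if os.access(home, os.W_OK) else Path.cwd()
    return target / name

print(choose_logfile())
```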

somnathtagore commented 3 years ago

OK, understood. One issue: our cluster supports qsub, so when I change submission_command: "sbatch" in default.yaml, should I set submission_command: "qsub" or submission_command: "sh -e"?

I am using qsub for submitting the job.

afrendeiro commented 3 years ago

You can pass any arbitrary command with options. The pipeline will then run cmd <job_file.sh>, e.g. sh -e <job_file.sh>, or sbatch -p partitionname -c 4 <job_file.sh> if cmd is "sbatch -p partitionname -c 4". In your case you could for example pass "qsub -pe smp 4 -l h_vmem=60G" to run with 4 CPUs and 60Gb of memory.
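
The idea behind such a configurable submitter can be sketched roughly like this (a simplified illustration, not the pipeline's actual job_control.py; here shlex.split is used rather than a plain str.split):

```python
import shlex
import subprocess

def submit(job_file, submission_command="sbatch"):
    """Launch a job script with an arbitrary, configurable submitter.

    `submission_command` is split into tokens so that flags such as
    "qsub -pe smp 4 -l h_vmem=60G" pass through to the scheduler,
    and the job script is appended as the last argument.
    """
    cmd = shlex.split(submission_command) + [str(job_file)]
    return subprocess.run(cmd, check=True)

# Examples (scheduler flags are cluster-specific):
#   submit("job.sh", "sh -e")                         # run locally, in serial
#   submit("job.sh", "sbatch -p partitionname -c 4")  # SLURM
#   submit("job.sh", "qsub -pe smp 4 -l h_vmem=60G")  # SGE, 4 CPUs / 60G
```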

I've actually never used SGE, I just saw what some options are here: http://bioinformatics.mdc-berlin.de/intro2UnixandSGE/sun_grid_engine_for_beginners/how_to_submit_a_job_using_qsub.html

afrendeiro commented 3 years ago

Again you could bypass all this by just submitting the sh file as a job to SGE manually.

somnathtagore commented 3 years ago

Hi Andre,

I have to use qsub, otherwise I cannot allocate memory to the jobs. When I do that, the jobs go into an error state:

15289521 0.01690 scifi_pipe st3179 Eqw 12/08/2020 00:04:15 1
15289525 0.01599 scifi_pipe st3179 Eqw 12/08/2020 00:04:15 1

(base) <0|1022>st3179@hb1c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ qstat -j 15289521 | grep error
error reason 1: 12/08/2020 00:04:16 [3224:58494]: error: can't open stdout output file "/ifs/home/c2b2/ac_lab/st3179/scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh.o15289521": Read-only file system
(base) <0|1023>st3179@hb1c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$ qstat -j 15289525 | grep error
error reason 1: 12/08/2020 00:04:16 [3224:24771]: error: can't open stdout output file "/ifs/home/c2b2/ac_lab/st3179/scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh.o15289525": Read-only file system
(base) <0|1024>st3179@hb1c6n8:/ifs/scratch/c2b2/ac_lab/st3179/picard/picard$

Is there any way to create these files (scifi_pipeline.scifi_L002.map.scifi_L001_C96.sh.o15289521 etc.) in the current directory, rather than the home directory, as you did for .scifi.log.txt ?

afrendeiro commented 3 years ago

I'm not sure I can help you; I don't know what those files are. You can control where the output of the run goes using the --output-dir option. Run scifi map --help to see all options.

somnathtagore commented 3 years ago

Hi Andre, thanks! I am still trying to figure out the issue. By any chance, do you have a toy example dataset that I can use to test whether the pipeline is running properly? That would make it easy to figure out whether there is some issue with the dataset formatting steps. Thanks!

somnathtagore commented 3 years ago

Hi André, The pipeline is running now. Thanks!