Hi Mitchell!
We've tried to run FIRE on targeted sequencing data, and the pipeline fails with "polars.exceptions.NoDataError: empty CSV" because fiber-locations-shuffled.bed.gz is created empty.
A BED file containing the complement of the targeted regions was used as the exclusion set in filtered_and_shuffled_fiber_locations_chromosome, so essentially everything outside the (relatively small) targeted regions is excluded.
What could be the issue in our usage of FIRE? Is it suitable for this kind of data?
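Roughly how such a complement BED can be produced (a minimal sketch with placeholder file names, using bedtools; not our exact commands):

# targets.bed = the targeted regions; hg38.fa.fai = samtools faidx index of the reference
cut -f1,2 hg38.fa.fai > hg38.genome                        # chrom<TAB>length genome file for bedtools
sort -k1,1 -k2,2n targets.bed > targets.sorted.bed         # complement needs coordinate-sorted input
bedtools complement -i targets.sorted.bed -g hg38.genome > targets.complement.bed  # everything outside the targets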
Config yaml:
ref: /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa
ref_name: hg38
n_chunks: 1 # split bam file across x chunks
max_t: 4 # use X threads per chunk
manifest: config/config_targeted_project.tbl # table with samples to process
keep_chromosomes: "chr4|chr7|chr20" # only keep chrs matching this regex.
## Force a read coverage instead of calculating it genome wide from the bam file.
## This can be useful if only a subset of the genome has reads.
#force_coverage: 50
## regions to exclude when identifying null regions that should not have REs; below are the defaults used for hg38.
excludes:
- workflow/annotations/hg38.fa.sorted.bed
#- workflow/annotations/hg38.gap.bed.gz
#- workflow/annotations/SDs.merged.hg38.bed.gz
## you can optionally specify a model that is not the default.
# model: models/my-custom-model.dat
##
## only used if training a new model
##
# train: True
# dhs: workflow/annotations/GM12878_DHS.bed.gz # regions of suspected regulatory elements
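A quick way to see how large the custom exclusion BED (the first entry under excludes) is and how much sequence it covers (a diagnostic sketch, output not shown):

# line count and total bases covered by the exclusion BED
wc -l workflow/annotations/hg38.fa.sorted.bed
awk '{covered += $3 - $2} END {print covered, "bp excluded"}' workflow/annotations/hg38.fa.sorted.bed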
Example of error log:
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=204800, mem_mib=195313, disk_mb=4096, disk_mib=3907, time=100440, gpus=0
Select jobs to execute...
[Thu Dec 28 16:04:53 2023]
rule fdr_table:
input: results/bc2031/fiber-calls/FIRE.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz, /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai
output: results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
jobid: 0
reason: Forced execution
wildcards: sm=bc2031
threads: 8
resources: mem_mb=204800, mem_mib=195313, disk_mb=4096, disk_mib=3907, tmpdir=/tmp, time=100440, gpus=0
python /home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py -v 1 results/bc2031/fiber-calls/FIRE.bed.gz results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai -s results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz -o results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
Activating conda environment: ../../../../../../../home/nshaikhutdinov/FIRE/env/72529d38651d38b3fc44b5aae6fe7a22_
[INFO][Time elapsed (ms) 1068]: Reading FIRE file: results/bc2031/fiber-calls/FIRE.bed.gz
/home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py:486: DeprecationWarning: `the argument comment_char` for `read_csv` is deprecated. It has been renamed to `comment_prefix`.
fire = pl.read_csv(
[INFO][Time elapsed (ms) 1082]: Reading genome file: /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai
[INFO][Time elapsed (ms) 1085]: Reading fiber locations file: results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz
[INFO][Time elapsed (ms) 1095]: Reading shuffled fiber locations file: results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz
Traceback (most recent call last):
File "/home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py", line 539, in <module>
defopt.run(main, show_types=True, version="0.0.1")
File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/defopt.py", line 356, in run
return call()
^^^^^^
File "/home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py", line 517, in main
shuffled_locations = pl.read_csv(
^^^^^^^^^^^^
File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/polars/utils/deprecation.py", line 100, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/polars/io/csv/functions.py", line 369, in read_csv
df = pl.DataFrame._read_csv(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/polars/dataframe/frame.py", line 784, in _read_csv
self._df = PyDataFrame.read_csv(
^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.NoDataError: empty CSV
[Thu Dec 28 16:04:54 2023]
Error in rule fdr_table:
jobid: 0
input: results/bc2031/fiber-calls/FIRE.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz, /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai
output: results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
conda-env: /home/nshaikhutdinov/FIRE/env/72529d38651d38b3fc44b5aae6fe7a22_
shell:
python /home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py -v 1 results/bc2031/fiber-calls/FIRE.bed.gz results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai -s results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz -o results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Index(['bc2029', 'bc2031', 'bc2025', 'bc2027', 'bc2026', 'bc2032', 'bc2030',
'bc2028'],
dtype='object', name='sample')
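The empty intermediate can be confirmed directly (paths as in the log above), e.g.:

# the shuffled fiber locations file comes out with zero records
zcat results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz | wc -l
# for comparison, the unshuffled fiber locations file that the script reads without error
zcat results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz | wc -l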