Closed neekonsu closed 4 years ago
Hi Neekon, thanks for reaching out.
This may be a file name/path issue. The relevant error from your output is:
Error: The requested file (./example_chr22/ABC_output/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted) could not be opened. Error message: (No such file or directory). Exiting!
Could you confirm your pipeline outputs a narrowPeak file with this exact name and path? In the README example, this file is located in the Peaks/ subdirectory, so perhaps that is the issue.
Hey @jnasser3, thanks for your speedy response! :)
I overlooked the missing /Peaks/
in that printout, and indeed there was a typo in one of my shell files that omitted the directory from the makeCandidateRegions.py
call.
I gave the program another go, and interestingly, that mistake doesn't seem to be what was holding up the pipeline. Take a look at the new printout, which largely overlaps with the last.
My thoughts:
Looking in ABC-Output/Peaks
, my pipeline seems to be generating files that begin with the shell variable symbol, eg: '$FILEBASENAME.bam.{...}'
, so there may be some ill syntax at play in my shell scripts, I am currently working on that.
After examining the files that makeCandidateRegions.py
's subprocesses failed to analyse, it seems that my shell syntax errors are causing the bedtools
and macs2
output to be written elsewhere than where makeCandidateRegions.py
expects it, so I imagine my errors are really just borne (no pun intended) from the shell problems. I'll keep you posted just in case this doesn't solve the issues, but I don't get the feeling that this is an ABC-inherent bug.
Running: awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' ./example_chr22/reference/chr22 <(samtools view -H ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam | grep SQ | cut -f 2 | cut -c 4- ) > ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order
Running: bedtools sort -faidx ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order -i ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted | bedtools coverage -g ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order -counts -sorted -a stdin -b ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam | awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' | bedtools sort -faidx ./example_chr22/reference/chr22 -i stdin > ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed; rm ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order
Fast count method failed to count: ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
Trying bamtobed method ...
Running: bedtools bamtobed -i ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam | cut -f 1-3 | bedtools intersect -wa -a stdin -b ./example_chr22/reference/chr22.bed | bedtools sort -i stdin -faidx ./example_chr22/reference/chr22 | bedtools coverage -g ./example_chr22/reference/chr22 -counts -sorted -a ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted -b stdin | awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' > ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed
No columns to parse from file
b''
Traceback (most recent call last):
File "./src/makeCandidateRegions.py", line 65, in <module>
main(args)
File "./src/makeCandidateRegions.py", line 61, in main
processCellType(args)
File "./src/makeCandidateRegions.py", line 48, in processCellType
outdir = args.outDir)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/peaks.py", line 27, in make_candidate_regions_from_summits
run_count_reads(accessibility_file, raw_counts_outfile, macs_peaks, genome_sizes, use_fast_count=True)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 302, in run_count_reads
count_bam(target, bed_file, output, genome_sizes=genome_sizes, use_fast_count=use_fast_count)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 368, in count_bam
if ("terminated" not in err) and ("Error" not in err) and ("ERROR" not in err) and any(data):
UnboundLocalError: local variable 'data' referenced before assignment
exit status 1
. . . Moving on to next stage, please wait . . .
Namespace(ATAC='', DHS='dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.chr22.bam,dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.chr22.bam', H3K27ac='dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Chromatin/ENCFF384ZZM.chr22.bam', candidate_enhancer_regions='./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.candidateRegions.bed', cellType='K562', chrom_sizes='./example_chr22/reference/chr22', default_accessibility_feature=None, enhancer_class_override=None, expression_table='dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt', gene_name_annotations='symbol', genes='./example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.chr22.bed', genes_for_class_assignment=None, outdir='./example_chr22/ABC_output/Neighborhoods/', primary_gene_identifier='symbol', qnorm=None, skip_gene_counts=False, skip_rpkm_quantile=False, supplementary_features=None, tss_slop_for_class_assignment=500, ubiquitously_expressed_genes='./src/../reference/UbiquitouslyExpressedGenesHG19.txt', use_secondary_counting_method=False)
Using gene expression from files: ['dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt']
[Errno 2] File dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt does not exist: 'dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt'
Traceback (most recent call last):
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 39, in load_genes
expr = pd.read_table(expression_table, names=[primary_id, name + '.Expression'])
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 673, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] File dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt does not exist: 'dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt'
Failed on dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Expression/K562.ENCFF934YBO.TPM.txt
Running command: bedtools sort -faidx ./example_chr22/reference/chr22 -i ./example_chr22/ABC_output/Neighborhoods/GeneList.TSS1kb.bed > ./example_chr22/ABC_output/Neighborhoods/GeneList.TSS1kb.bed.sorted; mv ./example_chr22/ABC_output/Neighborhoods/GeneList.TSS1kb.bed.sorted ./example_chr22/ABC_output/Neighborhoods/GeneList.TSS1kb.bed
Loading coverage from pre-calculated file for Genes.H3K27ac.ENCFF384ZZM.chr22.bam
Usage: samtools idxstats [options] <in.bam>
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-@, --threads INT
Number of additional threads to use [0]
Traceback (most recent call last):
File "./src/run.neighborhoods.py", line 97, in <module>
main(args)
File "./src/run.neighborhoods.py", line 93, in main
processCellType(args)
File "./src/run.neighborhoods.py", line 74, in processCellType
outdir = args.outdir)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 89, in annotate_genes_with_features
genes = count_features_for_bed(genes, bounds_bed, genome_sizes, features, outdir, "Genes", force=force, use_fast_count=use_fast_count)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 415, in count_features_for_bed
df = count_single_feature_for_bed(df, bed_file, genome_sizes, feature_bam, feature, directory, filebase, skip_rpkm_quantile, force, use_fast_count)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 438, in count_single_feature_for_bed
total_counts = count_total(feature_bam)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 528, in count_total
total_counts = count_bam_mapped(infile)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 503, in count_bam_mapped
data = check_output(command, shell=True)
File "/usr/bin/anaconda/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/usr/bin/anaconda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'samtools idxstats dirname ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/Chromatin/ENCFF384ZZM.chr22.bam' returned non-zero exit status 1.
exit status 1
. . . Moving on to next stage, please wait . . .
reading genes
reading enhancers
Making predictions for chromosome: chr22
Making putative predictions table...
Using: ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/HiC/raw/chr22/chr22.VCobserved.gz
Begin HiC
Loading HiC
Traceback (most recent call last):
File "./src/predict.py", line 148, in <module>
main()
File "./src/predict.py", line 107, in main
this_chr = make_predictions(chromosome, this_enh, this_genes, args)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/predictor.py", line 17, in make_predictions
pred = add_hic_to_enh_gene_table(enhancers, genes, pred, hic_file, hic_norm_file, hic_is_vc, chromosome, args)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/predictor.py", line 66, in add_hic_to_enh_gene_table
gamma = args.hic_gamma)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/hic.py", line 42, in load_hic
HiC_sparse_mat = hic_to_sparse(hic_file, hic_norm_file, hic_resolution)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/hic.py", line 150, in hic_to_sparse
header=None, engine='c', memory_map=True)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/bin/anaconda/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 613, in pandas._libs.parsers.TextReader._setup_parser_source
File "/usr/bin/anaconda/lib/python3.7/gzip.py", line 163, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
NotADirectoryError: [Errno 20] Not a directory: './example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam/HiC/raw/chr22/chr22.VCobserved.gz'
exit status 1
@jnasser3 Hi again!
So I have made some progress, and there were indeed some more sneaky formatting errors where I was passing variables between stages of the pipeline.
There now only remains one error that I cannot explain, and it is in the makeCandidateRegions.py
printout:
Verifying arguments for 'makeCandidateRegions.py'
——————————————————————
./src/makeCandidateRegions.py
src/makeCandidateRegions.py
——————————————————————
./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted
example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted
——————————————————————
./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
——————————————————————
./example_chr22/ABC_output/Peaks/
example_chr22/ABC_output/Peaks/
——————————————————————
./example_chr22/reference/chr22
example_chr22/reference/chr22
——————————————————————
./reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
reference/wgEncodeHg19ConsensusSignalArtifactRegions.bed
——————————————————————
./example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
example_chr22/reference/RefSeqCurated.170308.bed.CollapsedGeneBounds.TSS500bp.chr22.bed
——————————————————————
Running: awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' ./example_chr22/reference/chr22 <(samtools view -H ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam | grep SQ | cut -f 2 | cut -c 4- ) > ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order
Running: bedtools sort -faidx ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order -i ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted | bedtools coverage -g ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order -counts -sorted -a stdin -b ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam | awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' | bedtools sort -faidx ./example_chr22/reference/chr22 -i stdin > ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed; rm ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order
Fast count method failed to count: ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
Trying bamtobed method ...
Running: bedtools bamtobed -i ./example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam | cut -f 1-3 | bedtools intersect -wa -a stdin -b ./example_chr22/reference/chr22.bed | bedtools sort -i stdin -faidx ./example_chr22/reference/chr22 | bedtools coverage -g ./example_chr22/reference/chr22 -counts -sorted -a ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted -b stdin | awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' > ./example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed
No columns to parse from file
b''
Traceback (most recent call last):
File "./src/makeCandidateRegions.py", line 65, in <module>
main(args)
File "./src/makeCandidateRegions.py", line 61, in main
processCellType(args)
File "./src/makeCandidateRegions.py", line 48, in processCellType
outdir = args.outDir)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/peaks.py", line 27, in make_candidate_regions_from_summits
run_count_reads(accessibility_file, raw_counts_outfile, macs_peaks, genome_sizes, use_fast_count=True)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 302, in run_count_reads
count_bam(target, bed_file, output, genome_sizes=genome_sizes, use_fast_count=use_fast_count)
File "/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction/src/neighborhoods.py", line 368, in count_bam
if ("terminated" not in err) and ("Error" not in err) and ("ERROR" not in err) and any(data):
UnboundLocalError: local variable 'data' referenced before assignment
exit status 1
Notice how bedtools bamtobed ...
returns: No columns to parse from file
in reference to ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
.
Looking at the output of file
on wgEncodeUwDnaseK562AlnRep1.chr22.bam
, we see that this file is in a compressed format:
file ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam
ABC-Enhancer-Gene-Prediction/example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam: gzip compressed data, extra field, original size 0
The help page for bedtools bamtobed doesn't indicate that the program is equipped to decompress input files, so I wonder if this issue could be more widespread than my pipeline development. I will try decompressing and seeing what makeCandidateRegions.py
says, I appreciate your help.
Please let me know if the input files in the example data are meant to be compressed, or if I should decompress them before use.
Thank you,
Neekon Saadat
you shouldn't have to alter (decompress etc) the .bam files - it should just work out of the box.
What is the actual bedtools output/error if you just run the bedtools commands alone? Can you determine at what point in the bedtools commands the error is occuring?
@jnasser3 This is a little strange:
(base) root@d788ddf57c51:/go/src/github.com/neekonsu# cd ABC-Enhancer-Gene-Prediction/
(base) root@d788ddf57c51:/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction# awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' \
example_chr22/reference/chr22 <(samtools view -H example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
| grep SQ | cut -f 2 | cut -c 4- ) > \
example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order
(base) root@d788ddf57c51:/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction# bedtools sort -faidx example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order \
-i example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted \
| bedtools coverage -g example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order -counts -sorted -a stdin \
-b example_chr22/input_data/Chromatin/wgEncodeUwDnaseK562AlnRep1.chr22.bam \
| awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' | bedtools sort -faidx example_chr22/reference/chr22 -i stdin > \
example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed; \
rm example_chr22/ABC_output/Peaks/wgEncodeUwDnaseK562AlnRep1.chr22.macs2_peaks.narrowPeak.sorted.wgEncodeUwDnaseK562AlnRep1.chr22.bam.Counts.bed.temp_sort_order
(base) root@d788ddf57c51:/go/src/github.com/neekonsu/ABC-Enhancer-Gene-Prediction#
It seems as though the commands that makeCandidateRegions.py
deploys function properly when I execute them manually.
My intuition would tell me to look for a discrepancy between what I plugged into the shell just now and what makeCandidateRegions.py
printed out in the pipeline, however I actually copied these commands from makeCandidateRegions.py
's printout in the first place, so they should be the exact same. Maybe some terminal-voodoo magic has gotten grip on the pipeline? :)
By the way, since the time I opened this issue, I have rooted out all of the incorrect filepaths that I was previously passing to the pipeline. Now that the paths are no longer the issue, I wonder what could be causing the bedtools errors. My hunch is that there is some subtle technicality with how file permissions or command execution works in python that is causing the same command to fail in the script yet work when executed manually.
The lack of error output and weirdness around bash might point to more memory limitation problems. I had a lot of trouble with this recently, as well.
You could edit the python to add a print statement for the exit code to help figure that out. I believe if it's killed by the out-of-memory it will have an exit code of 9.
One way to test this might be to aggressively downsample the .bam before running the python code.
On Mon, Jul 27, 2020 at 1:15 PM Neekon Saadat notifications@github.com wrote:
By the way, since the time I opened this issue, I have rooted out all of the incorrect filepaths that I was previously passing to the pipeline. Now that the paths are no longer the issue, I wonder what could be causing the bedtools errors. My hunch is that there is some subtle technicality with how file permissions or command execution works in python that is causing the same command to fail in the script yet work when executed manually.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/issues/25#issuecomment-664525607, or unsubscribe https://github.com/notifications/unsubscribe-auth/AADTPUY7GRAOYICCTXTJMQLR5WY2JANCNFSM4PD3T2QQ .
@thouis the errors here are only cropping up on the example bam which is chr22 only - so its already very small
@neekonsu ok, I'm glad it works on the command line. Not quite sure what the issue is since I'm not familiar with how this pipelining code works. You should also check to make sure the pipeline 'sees' the .bai index file, which is not an input to the function but needs to be there (perhaps not for this bedtools command, but for subsequent parts of the pipeline)
@thouis @jnasser3 I figured it out by poking around through the sourcecode that the count_bam
function in neighborhoods.py
had referenced an undefined variable data
in 368
, which I assumed by context to mean stderrdata
instead. After changing the code, this segment of the pipeline began functioning properly, however a new function in part 2
of the pipeline (referring to how you broke it up in the readme) ran into a new, but much more debuggable problem. I detail this problem in a new issue.
Thank you for your patience and help, much appreciated.
I'm guessing this would be fixed by merging: https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/pull/24
Hi there!
I am working on a pipeline wrapper for this repository at https://github.com/neekonsu/alphabet in order to facilitate the use of this analytical tool. I am currently running into some errors trying to run the prescribed commands from the readme, and I would be hugely appreciative of any help sorting them out. The following is the printout when running the pipeline on the example data herein. I have omitted the MACS2 output since it seems to be functioning properly. If you would like to try this out for yourselves, I have dockerized my testing environment, and it is located at https://hub.docker.com/repository/docker/neekonsu/alphabet/general if you would like to try out the pipeline.
In order to replicate my results, open the container interactively and type
./alphabet/run.sh
into the terminal. Just type "default" or click enter for each of the 12 prompts until MACS2 begins, and good luck!Thanks,
Neekon Saadat