Open Mahmoudbassuoni opened 2 years ago
Received a similar comment from another user. The issue in that case was some of the paths weren't correctly accessible inside the container. Please check that all paths are accessible inside the container: including BAM file(s), reference fasta, and Neural Network model paths.
Beside this, suggest running on a single chromsome to make sure that case works.
Hi @anands-repo , I decided to go out of the container and ran it again with single chromosomes but still the same thing. I ran it before with different dataset and it was fine not sure what could be wrong.
I pushed a new repo under branch repo
. It is under development, but I have tested it for Illumina reads for individual chromosomes and it works. It goes with the docker image oddjobs/hello_deps
. This branch and docker image have the following features which may help with debugging issues of the type you are facing:
Algorithms and models are the same as before
PS: workdir will need cleanup before each rerun
Thanks @anands-repo , will give it a try and see what happens....
Hello,
I'm running into a similar problem of having VCF results without any calls in them. The program looks to be proceeding normally when I run it inside the docker container:
2022-08-26 19:41:50,599 INFO:Launching hotspot detection for chromosome chr16
Hotspots, chromosome chr16: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 501/501 [03:50<00:00, 2.17it/s]
2022-08-26 19:45:41,115 INFO:Combining all hotspots and sharding
2022-08-26 19:45:42,965 INFO:Completed sharding, creating caller commands for running NN
2022-08-26 19:45:42,966 INFO:Created call commands for chromosome chr16
2022-08-26 19:45:42,966 INFO:Finished hotspot generation for all chromosomes. Launching all caller commands.
Caller progress: 0it [00:00, ?it/s]
2022-08-26 19:45:42,967 INFO:Completed runs, checking log files
2022-08-26 19:45:42,967 INFO:Completed running all caller commands correctly, preparing vcf
VCF prep: 0it [00:00, ?it/s]
sort -k1,1d -k2,2n
2022-08-26 19:45:43,010 INFO:Completed runs. Results in local/results.output.vcf
But the VCF file does not contain any calls. I've checked that the BAM is readable inside the container, so I don't think that's the problem. I've also tried to install the software outside of the docker container to be sure, but I'm running into many install issues - would it be possible to make the Dockerfile available so I could see the necessary requirements for installation?
Hi,
Please find the information below
Dockerfile:
FROM ubuntu:bionic
# Initial installations
RUN apt-get -y update && apt-get -y install wget build-essential vim tmux vcftools bedtools
# Path where everything happens
WORKDIR /opt
# Install miniconda3
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
RUN bash Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -p /opt/miniconda3
ENV PATH=/opt/miniconda3/bin:$PATH
# Install boost with boost::python
COPY user-config.jam /root/boost_1_69_0/tools/build/src/user-config.jam
RUN wget https://boostorg.jfrog.io/artifactory/main/release/1.69.0/source/boost_1_69_0.tar.gz && \
tar xvfz boost_1_69_0.tar.gz && cd boost_1_69_0 && ./bootstrap.sh --prefix=/opt/boost && \
./b2 install && rm -rf boost_1_69_0*
# Install conda packages
RUN conda install pytorch==1.2.0 torchvision==0.4.0 -c pytorch && \
conda install -y -c bioconda pysam pybedtools && \
pip install pyvcf scikit-bio && \
conda install -y biopython cmake h5py intervaltree more-itertools scikit-learn matplotlib tqdm
# Additional environment settings (weird that it doesn't work by itself)
RUN ldconfig
ENV LD_LIBRARY_PATH=/opt/boost/lib:$LD_LIBRARY_PATH
# Cleanup
RUN rm -rf /tmp/*
The Dockerfile uses the following user-config.jam file:
using python
: 3.7
: /opt/miniconda3/bin/python
: /opt/miniconda3/include/python3.7m
;
Regarding the issue - from the logs, it looks like the initial heuristic that filters reads, and shortlists putative loci to run the Neural Network on, doesn't return anything. The following script can be run standalone to see what's happening:
python \
python/HotspotDetectorDVFiltered.py \
--bam BAMFILE \
--ref REFERENCE \
--region chr16 \
--mapq_threshold 5 \
--output /tmp/hotspots.json > /tmp/hotspots_logs.log 2>&1
Thanks for your response! I'm still trying to get a local install working for further testing based on your Dockerfile, but I'm running into some issues with conda permissions based on the install location. In the meantime, I tried out the hotspot command you shared from within the available docker container. Even if I run it using the --debug
flag, all I get in the log file is this:
2022-08-29 22:12:32,644 Started script
2022-08-29 22:16:40,987 Completed running the script
Should I be seeing additional messages?
Yes if --debug
flag is used there will be a lot of text displayed (or in this case written to the log files). Did you notice any output in the /tmp/hotspots.json
file, or is it still empty?
If it is empty, my best guess is that the reads are being filtered off at the quality control stage and not being used in any analysis. These are the read quality control filters:
usable = True;
usable = usable and not alignment.is_unmapped; # Remove unmapped reads
usable = usable and not (alignment.is_secondary or alignment.is_supplementary); # Remove secondary or supplementary alignments
usable = usable and not alignment.is_duplicate; # Remove duplicate reads
usable = usable and (not alignment.is_paired or alignment.is_proper_pair); # Remove improper pairs in the case of paired alignments
usable = usable and (alignment.mapping_quality > 0); # Remove 0-mapping quality reads
Hi, @anands-repo I am trying to run a variant calling using the illumina model but I am finally getting files with not even one call, I don't know what's wrong. I am using the same code I used before but for another bam file, every time it ends up with this message.
Computers / CPU cores / Max jobs to run 1:local / 16 / 1 ETA: 0s Left: 0 AVG: 0.00s 0 INFO:root:Completed runs, checking log files INFO:root:All commands completed correctly, preparing vcf Search string /tmp/vcftemp/*expert0.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.ngs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*expert1.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.tgs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*expert2.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.ngs_tgs.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*best.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.best.tmp.vcf'] sort -k1,1d -k2,2n Search string /tmp/vcftemp/*mean.vcf Found 0 files Running command ['vcf-sort', '/home/bassyouni/Documents/VC/hello/results.mean.tmp.vcf'] sort -k1,1d -k2,2n Choice histogram = {0: 0, 1: 0, 2: 0}