Closed shiying-sxu closed 2 years ago
RU doesn't require CreateTensorFullAlignment
and we are unable to pinpoint the problem per your description. We suggest you check the output of each step to ensure they are intact.
The result of my running is as follows, can you help me to see what is wrong?
This is WhatsHap 1.2.1 running under Python 3.6.10 Working on 1 samples from 1 family Leaving chromosome 'chr1' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr2' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr3' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr4' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr5' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr6' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr7' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr8' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr9' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr10' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr11' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr12' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr13' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr14' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr15' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr16' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr17' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr18' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr19' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr20' unchanged (present in VCF but not requested by option --chromosome) Leaving chromosome 'chr21' unchanged (present in VCF but not requested by option --chromosome) ======== Working on chromosome 'chr22' ---- Processing individual HG001 Using maximum coverage per sample of 15X Number of variants skipped due to missing genotypes: 0 Number of remaining heterozygous variants: 24447 Reading alignments and detecting alleles ... Found 84382 reads covering 24442 variants Kept 66806 reads that cover at least two variants each Reducing coverage to at most 15X by selecting most informative reads ... Selected 18985 reads covering 24437 variants Variants covered by at least one phase-informative read in at least one individual after read selection: 24437 Phasing 1 sample by solving the MEC problem ... MEC cost: 358080 No. of phased blocks: 310 Largest block contains 9714 variants (39.8% of accessible variants) between position 30263536 and 42499006 ======== Writing VCF Done writing VCF Changed 120 genotypes while writing VCF Leaving chromosome 'chrX' unchanged (present in VCF but not requested by option --chromosome)
== SUMMARY == Maximum memory usage: 0.417 GB Time spent reading BAM/CRAM: 82.1 s Time spent parsing VCF: 57.8 s Time spent selecting reads: 8.1 s Time spent phasing: 125.4 s Time spent writing VCF: 48.8 s Time spent finding components: 4.3 s Time spent on rest: 32.2 s Total elapsed time: 358.6 s Found 1 sample(s) in input VCF Found 1 sample(s) in BAM file Reading alignments and detecting alleles ... Found 84015 reads covering 24143 variants
== SUMMARY == Total alignments processed: 431114 Alignments that could be tagged: 83055 Alignments spanning multiple phase sets: 0 Finished in 211.6 s [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [faidx] Truncated sequence: chr22:46903593-50973515 [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files [mpileup] 1 samples in 1 input files cat: '/work/Clair3-main/data/outputref-HG001_GRCh38/vcfoutput/vcf*': 没有那个文件或目录 [WARNING] No vcf file found, please check the setting [WARNING] No variant found, please check the setting
The command I execute is as follows
CLAIR3="/work/Clair3-main/clair3.py" # clair3.py PYPY="/home/user/anaconda3/envs/clair3/bin/pypy3.6" # e.g. pypy3 WHATSHAP="/home/user/anaconda3/envs/clair3/bin/whatshap" # e.g. whatshap PARALLEL="/home/user/anaconda3/envs/clair3/bin/parallel" # e.g. parallel TABIX="/home/user/anaconda3/envs/clair3/bin/tabix" # e.g. tabix SAMTOOLS="/home/user/anaconda3/envs/clair3/bin/samtools" # e.g. samtools
PLATFORM="ont" # e.g. {ont, hifi, ilmn} VCF_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.vcf.gz" # [YOUR_VCF_FILE_PATH]e.g. hg003.vcf.gz BAM_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.bam" # e.g. hg003.bam alignment.bam REFERENCE_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.fa" # e.g. hg003.fasta BED_FILE_PATH="/work/Clair3-main/data/datatest/HG001/GRCh38/GRCh38.bed" # e.g. hg003.bed OUTPUT_DIR="/work/Clair3-main/data/outputref-HG001_GRCh38" # e.g. output
CHR_PREFIX="chr"
CHR=(22)
THREADS=24
chunk_num=15
CHUNK_LIST=seq 1 ${chunk_num}
MIN_AF=0.08
SPLIT_BED_PATH="${OUTPUT_DIR}/split_beds" VCF_OUTPUT_PATH="${OUTPUT_DIR}/vcf_output" VAR_OUTPUT_PATH="${OUTPUT_DIR}/var" PHASE_VCF_PATH="${OUTPUT_DIR}/phased_vcf" PHASE_BAM_PATH="${OUTPUT_DIR}/phased_bam"
mkdir -p ${SPLIT_BED_PATH} mkdir -p ${VCF_OUTPUT_PATH} mkdir -p ${VAR_OUTPUT_PATH} mkdir -p ${PHASE_VCF_PATH} mkdir -p ${PHASE_BAM_PATH}
cd ${OUTPUT_DIR}
${WHATSHAP} unphase ${VCF_FILE_PATH} > ${OUTPUT_DIR}/INPUT.vcf.gz
${PARALLEL} --joblog ${PHASE_VCF_PATH}/phase.log -j${THREADS} \ "${WHATSHAP} phase \ --output ${PHASE_VCFPATH}/phased{1}.vcf.gz \ --reference ${REFERENCE_FILE_PATH} \ --chromosome ${CHR_PREFIX}{1} \ --ignore-read-groups \ --distrust-genotypes \ ${OUTPUT_DIR}/INPUT.vcf.gz \ ${BAM_FILE_PATH}" ::: ${CHR[@]}
${PARALLEL} -j ${THREADS} tabix -p vcf ${PHASE_VCFPATH}/phased{1}.vcf.gz ::: ${CHR[@]}
${PARALLEL} --joblog ${PHASE_BAM_PATH}/haplotag.log -j${THREADS} \ "${WHATSHAP} haplotag \ --output ${PHASE_BAM_PATH}/{1}.bam \ --reference ${REFERENCE_FILE_PATH} \ --regions ${CHR_PREFIX}{1} \ --ignore-read-groups \ ${PHASE_VCFPATH}/phased{1}.vcf.gz \ ${BAM_FILE_PATH}" ::: ${CHR[@]}
${PARALLEL} --joblog ${PHASE_BAM_PATH}/index.log -j ${THREADS} ${SAMTOOLS} index -@12 ${PHASE_BAM_PATH}/{1}.bam ::: ${CHR[@]}
${PARALLEL} --joblog ${SPLIT_BED_PATH}/split_extend_bed.log -j${THREADS} \ "${PYPY} ${CLAIR3} SplitExtendBed \ --bed_fn ${BED_FILE_PATH} \ --output_fn ${SPLIT_BED_PATH}/{1} \ --ctgName ${CHR_PREFIX}{1}" ::: ${CHR[@]}
${PARALLEL} --joblog ${VAR_OUTPUT_PATH}/get_truth.log -j${THREADS} \ "${PYPY} ${CLAIR3} GetTruth \ --vcf_fn ${PHASE_VCFPATH}/phased{1}.vcf.gz \ --ctgName ${CHR_PREFIX}{1} \ --var_fn ${VAR_OUTPUTPATH}/var{1}" ::: ${CHR[@]}
${PARALLEL} --joblog ${OUTPUT_DIR}/unify_repre.log -j${THREADS} \ "${PYPY} ${CLAIR3} UnifyRepresentation \ --bam_fn ${PHASE_BAM_PATH}/{1}.bam \ --var_fn ${VAR_OUTPUTPATH}/var{1} \ --ref_fn ${REFERENCE_FILE_PATH} \ --bed_fn ${BED_FILE_PATH} \ --extend_bed ${SPLIT_BED_PATH}/{1} \ --output_vcf_fn ${VCF_OUTPUTPATH}/vcf{1}_{2} \ --min_af ${MIN_AF} \ --chunk_id {2} \ --chunk_num ${chunk_num} \ --platform ${PLATFORM} \ --ctgName ${CHR_PREFIX}{1}" ::: ${CHR[@]} ::: ${CHUNK_LIST[@]} > ${OUTPUT_DIR}/RU.log
cat ${VCF_OUTPUTPATH}/vcf* | ${PYPY} ${CLAIR3} SortVcf --output_fn ${OUTPUT_DIR}/unified.vcf bgzip -f ${OUTPUT_DIR}/unified.vcf tabix -f -p vcf ${OUTPUT_DIR}/unified.vcf.gz
Still no hint for the empty output. Could you check each file's validity in step 5? We speculated that some files are empty or missing in your step 5.
Still no hint for the empty output. Could you check each file's validity in step 5? We speculated that some files are empty or missing in your step 5.
After I reinstalled the environment, it can run correctly, I think it should be an environment problem. In addition, I have a problem that when I try to run quickdemo, I get an error no module named libclair3, and I can't install it correctly with the command conda install libclair3
After I reinstalled the environment, it can run correctly, I think it should be an environment problem. In addition, I have a problem that when I try to run quickdemo, I get an error no module named libclair3, and I can't install it correctly with the command conda install libclair3
libclair3 is built via this command(source activate clair3 && make PREFIX=${CONDA_PREFIX}
) in installation option 4.
Hi, I met a problem in the step of representation unification. After I execute the command according to representation_unification.md, my vcf output folder is empty. I carefully check the steps and find that the CreateTensorFullAlignment command is missing in step 4 in the representation_unification.md file provided now, but after I add this command Found that the files in my CANDIDATE_DETAILS_PATH are all empty. Can you help me? Sincerely