broadinstitute / DirectHRD


Convenient scripts #1

Open fo40225 opened 3 months ago

fo40225 commented 3 months ago

Can you provide convenient commands for installation and execution, such as the following?

git clone https://github.com/broadinstitute/DirectHRD
cd DirectHRD
pip install -r requirements.txt
python download_deps.py
python DirectHRD.py --genome hg38 -o output.txt input.vcf

Additionally, can I use a GATK Mutect2 tumor-only VCF as input?

Thank you for your research.

ruolin commented 3 months ago

Hi @fo40225, thanks for the suggestion. I will work on that. As for your question, a Mutect2 tumor-only VCF may not be the best input for DirectHRD because germline mutations could slip through. The better the indel calling, the more accurate DirectHRD is. May I ask what your use case is?

fo40225 commented 3 months ago

Mutect2's post-filtering can remove germline variants and other artifacts.

The samples are blood samples, used primarily to study rare germline mutations. The FASTQ files are also analyzed for somatic mutations, to investigate mosaic variants that may be present at lower fractions. During this process, tumor-related statistics such as MSI, TMB, and HRD are calculated and considered together.

scarHRD requires a control sample and is implemented in R. HRDetect is also implemented in R. Managing package versions in R is a disaster. It's preferable that your requirements.txt specifies fixed versions (e.g., ==) for reproducibility at any future time.
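As a minimal sketch of the pinning idea, the following Python helper reports requirement lines that are not fixed with `==`. The function name `unpinned_requirements` and the package names and versions in the example are illustrative assumptions, not taken from the DirectHRD repository, and lines with environment markers would need extra handling.

```python
import re

# Matches a requirement pinned to one exact version, e.g. "name==1.2.3".
# Wildcards such as "name==1.2.*" are deliberately treated as not pinned.
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+\s*==\s*[^=\s*]+$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with '=='."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            loose.append(line)
    return loose


# Hypothetical requirements, for illustration only.
example = ["numpy==1.24.4", "pandas>=1.5", "scipy", "# a comment", ""]
print(unpinned_requirements(example))
```

To check a real file, pass `open("requirements.txt").read().splitlines()` to the function; an empty list means every entry is pinned.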

ruolin commented 3 months ago

@fo40225 I just made it into a Python package that can be installed with pip. Let me know if it works for you.

After git clone, please run cd DirectHRD && pip install . on Linux to install it. After installation, you can run hrd-classifier indel_vcfs_input_folder.

In fact, you can use any indel caller; the only requirement is that the input is in VCF format. However, I do recommend filtering every VCF with the low-complexity filter (described on the front page). Also, which reference genome version are you using, GRCh37 or GRCh38?

fo40225 commented 3 months ago

Result

sample       HRDscore  n_informative_del  pos_prob            neg_prob            mhdels  5del_m2  considered  del_2bp+  totl_del  frac_signal
SAMPLE_NAME  105.78    153                0.8193289019140544  0.1462368693482134  170     131      181         191       263       0.8453038674033149

Some Notes:

Due to SigProfilerExtractor depending on torch, the default installation of torch includes support for NVIDIA GPUs and depends on CUDA, which is a large package.

It is recommended that users pre-install torch according to their needs (CPU, ROCm for AMD GPUs, CUDA for NVIDIA GPUs). This can save network bandwidth and installation space.
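A small check along these lines can confirm which torch build ended up in the environment before or after running pip install. This is only a sketch: the helper name `describe_torch_build` is an assumption, and it relies on the `torch.version.cuda` and `torch.version.hip` attributes that PyTorch exposes.

```python
import importlib.util


def describe_torch_build():
    """Report whether torch is installed and which backend it was built for."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also works without torch

    cuda = torch.version.cuda  # None on CPU-only and ROCm builds
    hip = getattr(torch.version, "hip", None)  # set on ROCm builds
    if cuda:
        backend = f"CUDA {cuda}"
    elif hip:
        backend = f"ROCm/HIP {hip}"
    else:
        backend = "CPU only"
    return f"torch {torch.__version__} ({backend})"


print(describe_torch_build())
```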

For reasons I could not determine, resolving your dependencies ends up requiring pandas version 1.5.5.

When I used Anaconda3-2024.06-1-Linux-x86_64.sh to install dependencies, it failed.

Therefore, I used Anaconda3-2023.03-1-Linux-x86_64.sh as the base environment.

sh Anaconda3-2023.03-1-Linux-x86_64.sh
eval "$(/home/user/anaconda3-DirectHRD/bin/conda shell.bash hook)"
git clone https://github.com/broadinstitute/DirectHRD
cd DirectHRD
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install .

Because your package versions are not pinned, I ran into the following error:

Traceback (most recent call last):
  File "/home/user/anaconda3-DirectHRD/bin/hrd-classifier", line 5, in <module>
    from HRD_classifier.main import main
  File "/home/user/anaconda3-DirectHRD/lib/python3.10/site-packages/HRD_classifier/__init__.py", line 1, in <module>
    from .main import main
  File "/home/user/anaconda3-DirectHRD/lib/python3.10/site-packages/HRD_classifier/main.py", line 10, in <module>
    from SigProfilerExtractor import SigProfilerPlottingMatrix as sppm
ImportError: cannot import name 'SigProfilerPlottingMatrix' from 'SigProfilerExtractor' (/home/user/anaconda3-DirectHRD/lib/python3.10/site-packages/SigProfilerExtractor/__init__.py)

I had to temporarily modify HRD_classifier/main.py, line 10.

#from SigProfilerExtractor import SigProfilerPlottingMatrix as sppm
from SigProfilerAssignment.DecompositionPlots import SigProfilerPlottingMatrix as sppm
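Instead of editing the import by hand, one option is a fallback that tries each known location in turn. The sketch below is an assumption about how that could be done, not the approach DirectHRD takes; `import_first_available` is a hypothetical helper, and the two candidate locations are the ones named in the traceback and workaround above.

```python
import importlib


def import_first_available(candidates):
    """Return the first importable name from a list of (package, name) pairs."""
    errors = []
    for package, name in candidates:
        try:
            module = importlib.import_module(package)
            if hasattr(module, name):
                return getattr(module, name)
            # Mirror "from package import name" when name is a submodule.
            return importlib.import_module(f"{package}.{name}")
        except ImportError as exc:
            errors.append(f"{package}.{name}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# The two locations mentioned in this thread, original layout first.
SPPM_CANDIDATES = [
    ("SigProfilerExtractor", "SigProfilerPlottingMatrix"),
    ("SigProfilerAssignment.DecompositionPlots", "SigProfilerPlottingMatrix"),
]
# Usage (requires the SigProfiler packages to be installed):
# sppm = import_first_available(SPPM_CANDIDATES)
```

Pinning the dependency versions remains the more robust fix; a fallback like this only papers over layout changes between releases.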

Additionally, the following installations are required: https://github.com/AlexandrovLab/SigProfilerMatrixGenerator#using-python-interface

For offline installations behind a heavily restrictive firewall:

wget --no-passive-ftp ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/GRCh37.tar.gz
python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh37', offline_files_path='.')

Download the low-complexity region BED file:

wget https://ftp.ncbi.nlm.nih.gov/giab/ftp/release/genome-stratifications/v3.0/GRCh37/LowComplexity/GRCh37_notinAllTandemRepeatsandHomopolymers_slop5.bed.gz

I typically use GRCh38, but it seems that the COSMIC ID database does not have a GRCh38 version. So, I regenerated my sample's VCF using b37 (GATK).

Below are the steps:

fastp --compression 2 \
 --thread 16 \
 --in1 SAMPLE_NAME_R1_001.fastq.gz \
 --in2 SAMPLE_NAME_R2_001.fastq.gz \
 --out1 SAMPLE_NAME_R1.clean.fastq.gz \
 --unpaired1 SAMPLE_NAME_R1.unpaired.fastq.gz \
 --out2 SAMPLE_NAME_R2.clean.fastq.gz \
 --unpaired2 SAMPLE_NAME_R2.unpaired.fastq.gz \
 --json SAMPLE_NAME.fastp.json \
 --html SAMPLE_NAME.fastp.html \
 --detect_adapter_for_pe \
 --dont_eval_duplication \
 --poly_g_min_len 4 \
 --disable_quality_filtering \
 --disable_length_filtering

export LD_LIBRARY_PATH=zlib-cloudflare/build-avx512
numactl --interleave=all bwa-mem2 mem -K 10000000 -Y -M \
 -R '@RG\tID:SAMPLE_NAME\tSM:SAMPLE_NAME\tPL:Illumina' \
 -t 95 /raid/bundle/b37/human_g1k_v37_decoy.fasta \
 SAMPLE_NAME_R1.clean.fastq.gz SAMPLE_NAME_R2.clean.fastq.gz | \
 numactl --interleave=all samtools view -@ 96 \
 --output-fmt-option level=2 --output-fmt bam \
 -o SAMPLE_NAME.bam

export JAVA_TOOL_OPTIONS='-XX:+UnlockDiagnosticVMOptions -XX:GCLockerRetryAllocationCount=96 -XX:+UseStringDeduplication -XX:+UseNUMA -XX:+UseG1GC'
export LD_LIBRARY_PATH=hadoop-3.3.1/lib/native
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk MarkDuplicatesSpark \
 --input SAMPLE_NAME.bam \
 --output SAMPLE_NAME.sorted.dedup.bam \
 --treat-unsorted-as-querygroup-ordered \
 --optical-duplicate-pixel-distance 2500 \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 -- \
 --spark-runner LOCAL --spark-master local[96] --conf spark.local.dir=./tmp --conf spark.port.maxRetries=61495

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA -XX:+UseG1GC'
export LD_LIBRARY_PATH=hadoop-3.3.1/lib/native
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk BaseRecalibratorSpark \
 --input SAMPLE_NAME.sorted.dedup.bam \
 --known-sites /raid/bundle/b37/dbsnp_156_siteOnly.b37.vcf.gz \
 --known-sites /raid/bundle/b37/1000G_phase1.indels.b37.vcf.gz \
 --known-sites /raid/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz \
 --output SAMPLE_NAME.recal_data.grp \
 --reference /raid/bundle/b37/human_g1k_v37_decoy.fasta.gz \
 --read-index SAMPLE_NAME.sorted.dedup.bam.sbi \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 -- \
 --spark-runner LOCAL --spark-master local[96] --conf spark.local.dir=./tmp --conf spark.port.maxRetries=61495

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA -XX:+UseParallelGC'
export LD_LIBRARY_PATH=hadoop-3.3.1/lib/native
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk ApplyBQSRSpark \
 --bqsr-recal-file SAMPLE_NAME.recal_data.grp \
 --input SAMPLE_NAME.sorted.dedup.bam \
 --output SAMPLE_NAME.sorted.dedup.recal.bam \
 --reference /raid/bundle/b37/human_g1k_v37_decoy.fasta.gz \
 --read-index SAMPLE_NAME.sorted.dedup.bam.sbi \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 -- \
 --spark-runner LOCAL --spark-master local[96] --conf spark.local.dir=./tmp --conf spark.port.maxRetries=61495

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA -XX:+UseParallelGC'
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk Mutect2 \
 -R /raid/bundle/b37/human_g1k_v37_decoy.fasta.gz \
 -I SAMPLE_NAME.sorted.dedup.recal.bam \
 -L Roche_KAPA_HyperExome_capture_targets.b37.bed \
 -O SAMPLE_NAME.vcf.gz \
 --read-index SAMPLE_NAME.sorted.dedup.recal.bam.bai \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 --germline-resource /raid/bundle/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz \
 --panel-of-normals /raid/bundle/Mutect2/Mutect2-WGS-panel-b37.vcf.gz \
 --f1r2-tar-gz SAMPLE_NAME.f1r2.tar.gz \
 --interval-padding 100 \
 --smith-waterman AVX_ENABLED \
 --pair-hmm-implementation AVX_LOGLESS_CACHING_OMP \
 --native-pair-hmm-threads 4

gatk VariantAnnotator \
 -D /raid/bundle/b37/dbsnp_156_siteOnly.b37.vcf.gz \
 -V SAMPLE_NAME.vcf.gz \
 -O SAMPLE_NAME.vcf
mv SAMPLE_NAME.vcf.gz.stats SAMPLE_NAME.vcf.stats

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA'
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk GetPileupSummaries \
 -R /raid/bundle/b37/human_g1k_v37_decoy.fasta.gz \
 -I SAMPLE_NAME.sorted.dedup.recal.bam \
 --interval-set-rule INTERSECTION \
 -L Roche_KAPA_HyperExome_capture_targets.b37.bed \
 -O SAMPLE_NAME.pileups.table \
 --read-index SAMPLE_NAME.sorted.dedup.recal.bam.bai \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 --intervals /raid/bundle/Mutect2/GetPileupSummaries/small_exac_common_3_b37.vcf.gz \
 --variant /raid/bundle/Mutect2/GetPileupSummaries/small_exac_common_3_b37.vcf.gz

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA'
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk CalculateContamination \
 -I SAMPLE_NAME.pileups.table \
 -O SAMPLE_NAME.contamination.table \
 --tumor-segmentation SAMPLE_NAME.segments.table \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir .

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA -XX:+UseParallelGC'
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk LearnReadOrientationModel \
 -I SAMPLE_NAME.f1r2.tar.gz \
 -O SAMPLE_NAME.artifact-priors.tar.gz \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir .

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA -XX:+UseParallelGC'
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
gatk FilterMutectCalls \
 -R /raid/bundle/b37/human_g1k_v37_decoy.fasta.gz \
 -O SAMPLE_NAME.filtered.vcf \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 --variant SAMPLE_NAME.vcf \
 --stats SAMPLE_NAME.vcf.stats \
 --contamination-table SAMPLE_NAME.contamination.table \
 --tumor-segmentation SAMPLE_NAME.segments.table \
 --orientation-bias-artifact-priors SAMPLE_NAME.artifact-priors.tar.gz \
 --filtering-stats SAMPLE_NAME.filtering.stats

export JAVA_TOOL_OPTIONS='-XX:+UseNUMA -XX:+UseG1GC'
export PATH=/usr/lib/jvm/zulu17/bin:$PATH
export LD_LIBRARY_PATH=zlib-cloudflare/build-avx512
gatk FilterAlignmentArtifacts \
 -R /raid/bundle/b37/human_g1k_v37_decoy.fasta.gz \
 -O output.vcf \
 --java-options "$JAVA_TOOL_OPTIONS" --tmp-dir . \
 -V SAMPLE_NAME.filtered.vcf \
 -I SAMPLE_NAME.sorted.dedup.recal.bam \
 --bwa-mem-index-image /raid/bundle/b37/human_g1k_v37_decoy.fasta.img
bgzip -l 2 SAMPLE_NAME.filtered.vcf
tabix SAMPLE_NAME.filtered.vcf.gz
bgzip -l 2 output.vcf
tabix output.vcf.gz
bcftools annotate -a output.vcf.gz -c FILTER SAMPLE_NAME.filtered.vcf.gz > SAMPLE_NAME.FilterAlignmentArtifacts.vcf

mkdir indel_vcfs_folder
bcftools filter -i 'FILTER="PASS" && TYPE="indel"' \
 --targets-file GRCh37_notinAllTandemRepeatsandHomopolymers_slop5.bed.gz \
 -o indel_vcfs_folder/SAMPLE_NAME.indels.vcf \
 SAMPLE_NAME.FilterAlignmentArtifacts.vcf
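Before running the classifier, it can help to confirm that the filtered VCF still contains indels. The following is a rough sketch, assuming a plain-text or gzip-compressed VCF; `count_pass_indels` is a hypothetical helper that approximates the bcftools expression above by comparing REF and ALT lengths, and it is not part of DirectHRD or bcftools.

```python
import gzip


def count_pass_indels(vcf_path):
    """Count PASS records in which some ALT allele differs in length from REF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    total = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # malformed record
            ref, alts, flt = fields[3], fields[4].split(","), fields[6]
            if flt != "PASS":
                continue
            # Skip missing, spanning-deletion, and symbolic alleles.
            real_alts = [a for a in alts if a not in (".", "*") and not a.startswith("<")]
            if any(len(a) != len(ref) for a in real_alts):
                total += 1
    return total


# Usage, with the file name from the command above:
# print(count_pass_indels("indel_vcfs_folder/SAMPLE_NAME.indels.vcf"))
```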

hrd-classifier indel_vcfs_folder -p project_name -o output.tsv

ruolin commented 2 months ago

@fo40225 Sorry about the issues, and thank you so much for the feedback and advice! I have pinned the versions of the libraries. I also added an option to use GRCh38 so that you won't need liftover in the future. Regarding PyTorch, I haven't found an easy way to avoid the full installation. I will keep an eye on it.

fo40225 commented 2 months ago

GRCh38 support seems to work. Thank you.

fo40225 commented 2 months ago

Although specifying GRCh38 does not produce an error during execution, the output values are all 0. Please help resolve the issue that the COSMIC ID database does not have an hg38 version.

ruolin commented 1 month ago

Have you tried installing GRCh38 for COSMIC? Let me know whether the following works.

$ python
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh38', rsync=False, bash=True)

fo40225 commented 1 month ago

Yes, I have executed the following commands, but there is still no COSMIC ID database for GRCh38.

$ wget -c --no-passive-ftp ftp://alexandrovlab-ftp.ucsd.edu/pub/tools/SigProfilerMatrixGenerator/GRCh38.tar.gz
$ python
Python 3.10.9 (main, Mar  1 2023, 18:23:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from SigProfilerMatrixGenerator import install as genInstall
>>> genInstall.install('GRCh38', offline_files_path='./')
Beginning installation using locally provided files.
The transcriptional reference data for GRCh38 has been saved.
All reference files have been created.
Verifying and benchmarking installation now...
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 6.88 seconds.
Matrices generated for 1 samples with 0 errors. Total of 9631 SNVs, 0 DINUCs, and 0 INDELs were successfully analyzed.
Installation was succesful.
SigProfilerMatrixGenerator took 7.786317825317383 seconds to complete.
To proceed with matrix_generation, please provide the path to your vcf files and an appropriate output path.
Installation complete.
>>> exit()
$ find -name '*ID*'
./lib/python3.10/site-packages/HRD_classifier/data/ID83_model.hrdpos.pickle
./lib/python3.10/site-packages/HRD_classifier/data/ID83_model.hrdneg.pickle
./lib/python3.10/site-packages/SigProfilerAssignment/src/FormatFiles/Sample_Files.ID28.all
./lib/python3.10/site-packages/SigProfilerAssignment/src/FormatFiles/Sample_Files.ID83.all
./lib/python3.10/site-packages/SigProfilerAssignment/src/FormatFiles/Sample_Files.ID415.all
./lib/python3.10/site-packages/SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v3.2_ID_GRCh37.txt
./lib/python3.10/site-packages/SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v3_ID_GRCh37.txt
./lib/python3.10/site-packages/SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v3.1_ID_GRCh37.txt
./lib/python3.10/site-packages/SigProfilerAssignment/data/sigProfiler_ID_signatures.csv
./lib/python3.10/site-packages/babel/locale-data/ms_ID.dat
./lib/python3.10/site-packages/babel/locale-data/su_Latn_ID.dat
./lib/python3.10/site-packages/babel/locale-data/id_ID.dat
./lib/python3.10/site-packages/babel/locale-data/jv_ID.dat
./lib/python3.10/site-packages/SigProfilerExtractor/PlotDecomposition_ID83.py
./lib/python3.10/site-packages/SigProfilerExtractor/data/sigProfiler_ID_signatures.csv
./lib/python3.10/site-packages/SigProfilerExtractor/data/Reference_Signatures/GRCh37/COSMIC_v3_ID_GRCh37.txt
./lib/python3.10/site-packages/SigProfilerExtractor/data/Reference_Signatures/GRCh37/COSMIC_v3.1_ID_GRCh37.txt
./lib/python3.10/site-packages/SigProfilerExtractor/data/Reference_Signatures/GRCh37/COSMIC_v3.2_ID_GRCh37.txt
./lib/python3.10/site-packages/SigProfilerExtractor/__pycache__/PlotDecomposition_ID83.cpython-310.pyc
./lib/python3.10/site-packages/SigProfilerExtractor/src/FormatFiles/Sample_Files.ID83.all
./lib/python3.10/site-packages/SigProfilerExtractor/src/FormatFiles/Sample_Files.ID28.all
./lib/python3.10/site-packages/SigProfilerExtractor/src/FormatFiles/Sample_Files.ID415.all
./include/qt/QtInputSupport/QIntegrityHIDManager
./include/qt/QtCore/Q_PID
./include/LIEF/PE/signature/OIDToString.hpp
./include/LIEF/MachO/UUIDCommand.hpp
./pkgs/liblief-0.12.3-h6a678d5_0/include/LIEF/PE/signature/OIDToString.hpp
./pkgs/liblief-0.12.3-h6a678d5_0/include/LIEF/MachO/UUIDCommand.hpp
./pkgs/qt-main-5.15.2-h327a75a_7/include/qt/QtInputSupport/QIntegrityHIDManager
./pkgs/qt-main-5.15.2-h327a75a_7/include/qt/QtCore/Q_PID
./pkgs/qt-main-5.15.2-h327a75a_7/info/recipe/patches/qt/0001-shobjidl-Fix-compile-guard-around-SHARDAPPIDINFOLINK.patch
./pkgs/babel-2.11.0-py310h06a4308_0/lib/python3.10/site-packages/babel/locale-data/id_ID.dat
./pkgs/babel-2.11.0-py310h06a4308_0/lib/python3.10/site-packages/babel/locale-data/ms_ID.dat
./pkgs/babel-2.11.0-py310h06a4308_0/lib/python3.10/site-packages/babel/locale-data/jv_ID.dat
./pkgs/babel-2.11.0-py310h06a4308_0/lib/python3.10/site-packages/babel/locale-data/su_Latn_ID.dat
./pkgs/daal4py-2023.0.2-py310h3c18c91_0/info/test/examples/notebooks/NYCTaxi-E2E-Demo/NYCTaxi-E2E-RAPIDS.ipynb

ruolin commented 1 month ago

Hi @fo40225, it appears that my test run on GRCh38 was successful.

$ hrd-classifier hg38_test/ -r GRCh38
HRD Classifier is running.
DirectHRD version: 0.1.2
Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 2.28 seconds.
Starting matrix generation for INDELs...Completed! Elapsed time: 1.99 seconds.
Matrices generated for 1 samples with 0 errors. Total of 1070 SNVs, 9 DINUCs, and 140 INDELs were successfully analyzed.
Result was written to directhrd.results.txt.

Could the small difference in installation have caused the problem?

You used genInstall.install('GRCh38', offline_files_path='./'), whereas I used genInstall.install('GRCh38', rsync=False, bash=True).

fo40225 commented 1 month ago

I have reinstalled using genInstall.install('GRCh38', rsync=False, bash=True), but there is still no ID database.

lib/python3.10/site-packages$ find -name '*ID*'
./HRD_classifier/data/ID83_model.hrdpos.pickle
./HRD_classifier/data/ID83_model.hrdneg.pickle
./SigProfilerAssignment/src/FormatFiles/Sample_Files.ID28.all
./SigProfilerAssignment/src/FormatFiles/Sample_Files.ID83.all
./SigProfilerAssignment/src/FormatFiles/Sample_Files.ID415.all
./SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v3.2_ID_GRCh37.txt
./SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v3_ID_GRCh37.txt
./SigProfilerAssignment/data/Reference_Signatures/GRCh37/COSMIC_v3.1_ID_GRCh37.txt
./SigProfilerAssignment/data/sigProfiler_ID_signatures.csv
./babel/locale-data/ms_ID.dat
./babel/locale-data/su_Latn_ID.dat
./babel/locale-data/id_ID.dat
./babel/locale-data/jv_ID.dat
./SigProfilerExtractor/PlotDecomposition_ID83.py
./SigProfilerExtractor/data/sigProfiler_ID_signatures.csv
./SigProfilerExtractor/data/Reference_Signatures/GRCh37/COSMIC_v3_ID_GRCh37.txt
./SigProfilerExtractor/data/Reference_Signatures/GRCh37/COSMIC_v3.1_ID_GRCh37.txt
./SigProfilerExtractor/data/Reference_Signatures/GRCh37/COSMIC_v3.2_ID_GRCh37.txt
./SigProfilerExtractor/__pycache__/PlotDecomposition_ID83.cpython-310.pyc
./SigProfilerExtractor/src/FormatFiles/Sample_Files.ID83.all
./SigProfilerExtractor/src/FormatFiles/Sample_Files.ID28.all
./SigProfilerExtractor/src/FormatFiles/Sample_Files.ID415.all

My issue is that hrd-classifier -r GRCh38 does not produce an error, but the HRD value in .results.txt is 0.
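The same check as the find listings can be scripted so that it reports, per genome build, which COSMIC ID signature files are present. This is a sketch under the assumption that the directory layout matches the paths shown in the listings (`.../data/Reference_Signatures/<build>/COSMIC_*_ID_*.txt`); `id_signature_files` is a hypothetical helper.

```python
from pathlib import Path


def id_signature_files(reference_signatures_dir):
    """Map each genome-build sub-directory to its COSMIC ID signature files."""
    root = Path(reference_signatures_dir)
    found = {}
    for build_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        found[build_dir.name] = sorted(
            f.name for f in build_dir.glob("COSMIC_*_ID_*.txt")
        )
    return found


# Usage, assuming SigProfilerAssignment is installed in the active environment:
# import importlib.util
# spec = importlib.util.find_spec("SigProfilerAssignment")
# data_dir = Path(spec.origin).parent / "data" / "Reference_Signatures"
# print(id_signature_files(data_dir))
```

A build that maps to an empty list has no ID signature files of its own, which matches what the listings show for GRCh38.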

ruolin commented 1 month ago

Hi @fo40225, I also found that there is no ID signature file under my installation directory:

ls .conda/envs/cds/lib/python3.8/site-packages/SigProfilerAssignment/data/Reference_Signatures/GRCh38/
COSMIC_v1_SBS_GRCh38.txt  COSMIC_v3.1_DBS_GRCh38.txt  COSMIC_v3.2_DBS_GRCh38.txt  COSMIC_v3_DBS_GRCh38.txt
COSMIC_v2_SBS_GRCh38.txt  COSMIC_v3.1_SBS_GRCh38.txt  COSMIC_v3.2_SBS_GRCh38.txt  COSMIC_v3_SBS_GRCh38.txt

My script uses the ID signatures from the GRCh37 folder. But I think the ID signatures themselves should be reference-independent: a signature is defined by the number of deleted bases and the local base context, and is mostly unrelated to genomic coordinates. Also, my run using the GRCh38 reference produced reasonable results: I got a non-zero HRD score, and the extraction did not produce any errors:
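To illustrate why such features do not depend on coordinates, here is a deliberately simplified sketch that derives a deletion's length and a 3'-side microhomology length from the local sequence alone. It is an assumption made for illustration and not the ID83 classification that SigProfilerMatrixGenerator performs; `deletion_features` and the sequences are hypothetical.

```python
def deletion_features(deleted, right_flank):
    """Return (deletion length, microhomology length) from local sequence only.

    Microhomology here is the number of leading bases of the deleted
    sequence that match the bases immediately 3' of the deletion. If it
    equals the deletion length, the event is a repeat-unit deletion rather
    than a microhomology-mediated one; this sketch does not separate them.
    """
    deleted, right_flank = deleted.upper(), right_flank.upper()
    homology = 0
    for a, b in zip(deleted, right_flank):
        if a != b:
            break
        homology += 1
    return len(deleted), homology


# Hypothetical 5 bp deletion followed by a flank sharing its first 3 bases.
print(deletion_features("ACGTT", "ACGAAT"))  # (5, 3)
```

No chromosome or position is passed in, so the same deletion yields the same features whichever reference build the coordinates refer to.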

Starting matrix generation for SNVs and DINUCs...Completed! Elapsed time: 34.2 seconds.
Starting matrix generation for INDELs...Completed! Elapsed time: 2.14 seconds.
Matrices generated for 1 samples with 0 errors. Total of 1070 SNVs, 9 DINUCs, and 140 INDELs were successfully analyzed.

I cannot say for sure it worked because my GRCh38 test case is HRD-negative. However, I wonder if you can share your VCF and I can have a closer look?

fo40225 commented 1 month ago

I have checked more samples: most have a score of 0, while a small number have scores greater than 0. The scores for HRD-positive samples are greater than 60. So there appear to be no issues with GRCh38.

ruolin commented 1 month ago

I set the precision to 2 decimal places, so if the probability of HRD-positive is really low, it might show up as 0. I plan to set up a visualization function to plot the ID signature observed in the sample. It might help as a diagnostic tool. The code is already there; I just need to make a command-line interface.
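The rounding effect is easy to reproduce. The sketch below uses a hypothetical probability value and two hypothetical helpers; it is not DirectHRD's output code, only an illustration of how fixed two-decimal formatting hides small non-zero values and how scientific notation could preserve them.

```python
def format_fixed(value, places=2):
    """Format with a fixed number of decimal places."""
    return f"{value:.{places}f}"


def format_adaptive(value, places=2):
    """Use fixed notation unless rounding would hide a non-zero value."""
    if value != 0 and round(value, places) == 0:
        return f"{value:.{places}e}"
    return f"{value:.{places}f}"


small = 0.0004  # hypothetical low probability, not from a real run
print(format_fixed(small))     # 0.00
print(format_adaptive(small))  # 4.00e-04
print(format_adaptive(0.82))   # 0.82
```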