PacificBiosciences / HiFiCNV

Copy number variant caller and depth visualization utility for PacBio HiFi reads
BSD 3-Clause Clear License

ERROR: "Diploid chromosome regex '.' does not match any sample chromosome names..." #24

Closed: seetarajpara closed this issue 6 months ago

seetarajpara commented 6 months ago

I set up my test run like this:

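# Use $HOME inside double quotes: a tilde inside single quotes is not expanded by the shell.
REF="$HOME/pacbio/refs/human_GRCh38_no_alt_analysis_set.tar.2023-12-04.gz"
EXCLUDE="$HOME/pacbio/refs/cnv.excluded_regions.common_50.hg38.bed.gz"
EXPECTED_CN="$HOME/pacbio/refs/expected_cn.hg38.XY.bed"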

hificnv \
    --bam m84039_231121_213913_s2.hifi_reads.bam \
    --ref ${REF} \
    --exclude ${EXCLUDE} \
    --expected-cn ${EXPECTED_CN} \
    --threads 16 \
    --output-prefix hificnv \
    --cov-regex "."

and I keep getting this error:

[2024-03-04][14:33:15][hificnv][INFO] Starting hificnv
[2024-03-04][14:33:15][hificnv][INFO] cmdline: hificnv --bam m84039_231121_213913_s2.hifi_reads.bam --ref /scratch1/seetaraj/pacbio/refs/human_GRCh38_no_alt_analysis_set.tar.2023-12-04.gz --exclude /scratch1/seetaraj/pacbio/refs/cnv.excluded_regions.common_50.hg38.bed.gz --expected-cn /scratch1/seetaraj/pacbio/refs/expected_cn.hg38.XY.bed --threads 8 --output-prefix hificnv --cov-regex .
[2024-03-04][14:33:15][hificnv][INFO] Running on 8 threads
thread 'main' panicked at 'Diploid chromosome regex '.' does not match any sample chromosome names, use '--cov-regex "."' to match all available chromosomes.', src/cli.rs:234:5

Please let me know what I need to do here. The BAM file I received is sorted by coordinate, and it looks like it went through the ccs, pbtrim, and jasmine commands before I obtained it. I have a working index in both .bai and .pbi form. I'm not sure what the issue could be; any advice would be greatly appreciated!

ctsa commented 6 months ago

Can you list the reference contigs from the input BAM file? For instance, the output from the following:

samtools idxstats m84039_231121_213913_s2.hifi_reads.bam

seetarajpara commented 6 months ago

Ah, so when I ran that, I got this:

$ samtools idxstats m84039_231121_213913_s2.hifi_reads.bam
*   0   0   5079255

What should I expect to see in these columns?

EDIT to add: I looked through the headers of the BAMs I received. They said the files were sorted by coordinate and had already been through ccs, pbtrim, and jasmine, so I assumed the BAM files were aligned. Does this mean I need to run alignment on them first? I apologize; I'm not used to long-read data and am mostly familiar with post-run processing of short-read output. I appreciate any advice on this!

ctsa commented 6 months ago

Hi @seetarajpara, I think you are right: the issue appears to be that these BAM files are unmapped. The recommended read mapper for HiFiCNV is pbmm2. We can add an improved error message to HiFiCNV to clarify this case.
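
For reference, samtools idxstats prints one row per reference contig with four columns: contig name, contig length, number of mapped reads, and number of unmapped reads. A final row named "*" collects unmapped reads that have no placement. An output consisting of only the "*" row, as in this thread, means the BAM lists no reference contigs, so there is nothing for the chromosome regex to match. The sketch below shows one way this could be checked before running HiFiCNV. The function names are hypothetical helpers and are not part of samtools or HiFiCNV.

```shell
# Count the reference contigs in `samtools idxstats` output read from stdin.
# Columns: contig name, contig length, mapped reads, unmapped reads.
# The "*" row holds unplaced, unmapped reads and is not a contig.
count_reference_contigs() {
    awk '$1 != "*" { n++ } END { print n + 0 }'
}

# Exit status 0 only if at least one contig row reports mapped reads.
has_mapped_reads() {
    awk '$1 != "*" { mapped += $3 } END { exit (mapped > 0 ? 0 : 1) }'
}

# Example usage (requires samtools; reads.bam is a placeholder name):
#   samtools idxstats reads.bam | count_reference_contigs
#   samtools idxstats reads.bam | has_mapped_reads || echo "BAM looks unmapped"
```

A contig count of zero suggests the file needs to be aligned before it is passed to --bam.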

seetarajpara commented 6 months ago

Thank you @ctsa for your prompt responses! I'm now running the alignment and sorting; once that finishes, I should be ready to run HiFiCNV and other downstream analyses. I appreciate your help!
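
The alignment step mentioned above might look roughly like the following sketch. It assumes pbmm2 is installed and that an uncompressed reference FASTA is available. The wrapper function, its DRY_RUN switch, and all file names are hypothetical and only illustrate the shape of the command; the thread does not show the exact command that was run.

```shell
# Build (and optionally run) a pbmm2 alignment for an unmapped HiFi BAM.
# Arguments: reference FASTA, unmapped BAM, output BAM, thread count (default 8).
# Set DRY_RUN=1 to print the command instead of executing it.
align_hifi() {
    ref_fasta=$1
    in_bam=$2
    out_bam=$3
    threads=${4:-8}
    # --preset HIFI selects HiFi alignment settings; --sort writes a
    # coordinate-sorted BAM; -j sets the number of alignment threads.
    set -- pbmm2 align "$ref_fasta" "$in_bam" "$out_bam" \
        --preset HIFI --sort -j "$threads"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example usage (placeholder file names):
#   align_hifi ref.fasta sample.hifi_reads.bam sample.aligned.bam 16
```

If no index file accompanies the aligned BAM, samtools index can create one. The aligned BAM is then the file to pass to hificnv with --bam.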