PacificBiosciences / paraphase

HiFi-based caller for highly similar paralogous genes
BSD 3-Clause Clear License
23 stars 4 forks source link

Error about bam #12

Closed crazysummerW closed 6 months ago

crazysummerW commented 7 months ago

hi, @xiao-chen-xc I used PacBio Revio's HiFi data for testing. The input file is m84011_220902_175841_s1.hifi_reads.bam (HG002), downloaded from the official PacBio link. However, I encountered the following error. Could you please help me identify the issue?

INFO:root:Processing sample m84011_220902_175841_s1 at 2023-12-05 03:45:42.720319...
INFO:root:Getting genome depth for sample m84011_220902_175841_s1 at 2023-12-05 03:45:42.720813...
ERROR:root:Error running sample m84011_220902_175841_s1...See error message below
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/paraphase/paraphase.py", line 229, in process_sample
    depth = GenomeDepth(
  File "/usr/local/lib/python3.8/dist-packages/paraphase/genome_depth.py", line 13, in __init__
    self._bamh = pysam.AlignmentFile(bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 748, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 997, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

Looking forward to your reply. Thank you.

xiao-chen-xc commented 7 months ago

Hi @crazysummerW could you share the link you used to download your input bam? Based on the error message it doesn't look like an aligned bam file.

xiao-chen-xc commented 7 months ago

Did you use the data downloaded from here? https://downloads.pacbcloud.com/public/revio/2022Q4/HG002-rep1/ There is a hg38 aligned bam file in the analysis folder.

Did you take the reads and align to hs37d5 by yourself? What is the average coverage of your bam? Perhaps there was an error in the alignment process? You could load the bam into IGV and check coverage for a few random regions. For example, could you check PMS2 in IGV and take a screenshot?

Could you also share the command you used to run Paraphase?

crazysummerW commented 7 months ago

hi @xiao-chen-xc

Did you use the data downloaded from here?

Yes.

There is a hg38 aligned bam file in the analysis folder.

We prefer to use our own analyzed data. We are more focused on hs37d5.

Did you take the reads and align to hs37d5 by yourself?

Yes.

What is the average coverage of your bam?

32.5

Perhaps there was an error in the alignment process?

No error message was received during alignment(pbmm2). The aligned BAM file was used for other analyses such as PBSV without any issues.

You could load the bam into IGV and check coverage for a few random regions. For example, could you check PMS2 in IGV and take a screenshot?

image

Could you also share the command you used to run Paraphase?

paraphase -b hifi_reads_aligned.bam -o ./ -r hs37d5.fasta --genome 37 -t 32

And, I found that I did not obtain any analytical results. I don't know where the problem is. m84011_220902_175841_s1.json

xiao-chen-xc commented 7 months ago

It'd be great if I could take a look at your bam file. Could you extract a bamlet and share with me? You can use this coordinate: 6:31947777-32014577

xiao-chen-xc commented 7 months ago

Paraphase expects chromosome names in 1, 2, 3, etc. for GRCh37, and chr1, chr2, chr3 for hg19. If you try --genome 19 instead, it should solve the problem.

crazysummerW commented 6 months ago

@xiao-chen-xc It worked. Thank you for your reply.