PacificBiosciences / paraphase

HiFi-based caller for highly similar paralogous genes
BSD 3-Clause Clear License

Error index f8 #8

Open bioinfogit opened 1 year ago

bioinfogit commented 1 year ago

Hi, I am getting the following error:

pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=[faidx] Could not build fai index ../path/tmp/f8_ref.fa.fai\n'

Removing f8 from the gene list works. I am using the latest version (paraphase --version reports 2.2.3).

xiao-chen-xc commented 1 year ago

Hi @bioinfogit I'm not able to reproduce this error. Could you delete that tmp folder and try again?

themkdemiiir commented 10 months ago

Hello, I am using version 2.2.3 from the Docker image quay.io/pacbio/paraphase:2.2.3_build2.

I received the error below when I ran the command in the Docker environment. However, after checking the output directory, I could not locate the tmp file.

root@a9f9e6fb2c28:/# paraphase --threads 8 --bam /longread/NA17282.HomoSapiens.aligned.haplotagged.bam -o /longread/ --reference /genomes/Homo_sapiens.GRCh38.dna.primary_assembly.fa              
ERROR:root:Error running the program...See error message below
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/paraphase/paraphase.py", line 472, in run
    configs = self.update_config(gene_list, tmpdir, args.reference)
  File "/usr/local/lib/python3.8/dist-packages/paraphase/paraphase.py", line 325, in update_config
    self.make_ref_fasta(ref_file, realign_region, genome)
  File "/usr/local/lib/python3.8/dist-packages/paraphase/paraphase.py", line 352, in make_ref_fasta
    pysam.faidx(ref_file)
  File "/usr/local/lib/python3.8/dist-packages/pysam/utils.py", line 83, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=[faidx] Could not build fai index /longread/tmp_2023-11-22-12-15-40-154747/smn1_ref.fa.fai\n'
INFO:root:Completed Paraphase analysis at 2023-11-22 12:15:40.277212...

themkdemiiir commented 10 months ago

I now understand the issue. My Ensembl reference file does not use the "chr" prefix in its chromosome names, so the region Paraphase requests is not found. Could you fix the problem here?

Error (region given as chr5)

kaan@biyoinfo1:~$ samtools faidx levopt/hg38/genomes/ensembl_p13_primary/Homo_sapiens.GRCh38.dna.primary_assembly.fa chr5:70890000-71100000 | sed -e "s/-/_/" | sed -e "s/:/_/" > kaan.txt
[W::fai_get_val] Reference chr5:70890000-71100000 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in chr5:70890000-71100000
kaan@biyoinfo1:~$ samtools faidx kaan.txt
[faidx] Could not build fai index kaan.txt.fai

No error (region given as 5)

kaan@biyoinfo1:~$ samtools faidx levopt/hg38/genomes/ensembl_p13_primary/Homo_sapiens.GRCh38.dna.primary_assembly.fa 5:70890000-71100000 | sed -e "s/-/_/" | sed -e "s/:/_/" > kaan.txt
kaan@biyoinfo1:~$ samtools faidx kaan.txt

xiao-chen-xc commented 10 months ago

Hi @themkdemiiir, Paraphase assumes that GRCh38 chromosome names carry the "chr" prefix. Could you realign to the UCSC/NCBI version and rerun Paraphase? For best performance with HiFi data, please remove ALT contigs from the reference genome before alignment. We do have a recommended version of the reference genome (with download links) documented here.
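
For anyone hitting the same faidx error, a quick pre-flight check on the reference may save a run. The following is only a minimal sketch, not part of Paraphase: it reads the contig names from a samtools .fai index (the first tab-separated column of each line) and reports whether they look "chr"-prefixed. The function names `read_fai_contigs` and `naming_style` are hypothetical, and the index contents in the demo are toy values rather than a real GRCh38 index.

```python
import os
import tempfile


def read_fai_contigs(fai_path):
    """Return contig names from a samtools .fai index.

    Each non-empty line of a .fai file is tab-separated and starts
    with the contig name.
    """
    with open(fai_path) as handle:
        return [line.split("\t", 1)[0] for line in handle if line.strip()]


def naming_style(contigs):
    """Classify contig naming as 'chr-prefixed', 'no-prefix', or 'unknown'.

    Hypothetical helper: it only looks for chromosome 1 under either name.
    """
    names = set(contigs)
    if "chr1" in names:
        return "chr-prefixed"
    if "1" in names:
        return "no-prefix"
    return "unknown"


# Demo with a toy index (made-up lengths and offsets, Ensembl-style names).
toy_index = "1\t1000\t3\t60\t61\n5\t1000\t1023\t60\t61\n"
with tempfile.NamedTemporaryFile("w", suffix=".fai", delete=False) as tmp:
    tmp.write(toy_index)
    toy_path = tmp.name

style = naming_style(read_fai_contigs(toy_path))
os.remove(toy_path)

if style != "chr-prefixed":
    print("Reference does not look chr-prefixed; style detected:", style)
```

To try it on a real reference, point `read_fai_contigs` at the .fai file that `samtools faidx` writes next to the FASTA. A result other than "chr-prefixed" suggests the region lookups shown earlier in this thread would return empty sequences.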