DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

Error: Have found no compatible reference specifications #61

Open santhoshnh opened 3 years ago

santhoshnh commented 3 years ago

Hi, I installed HLA-LA through conda, then downloaded and indexed the graph. When I run HLA-LA.pl I get the error below:

Have found no compatible reference specifications in /home/admin/anaconda3/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences - create a file for this BAM file and try again. at /home/admin/anaconda3/bin/HLA-LA.pl line 309

Below I am attaching the samtools idxstats results for my BAM file. Kindly help me resolve the issue.

samtools_idsstats.txt

TonyLupara commented 3 years ago

Hi, I've corrected your file so that HLA-LA can use it as an extraction specification. It tells HLA-LA which reads to extract from your BAM file for genotyping. I marked the chr6 MHC region and all unmapped reads for extraction.

Add this file to the graphs/PRG_MHC_GRCh38_withIMGT/knownReferences folder.

reference_extraction.txt

Let me know if it works.

santhoshnh commented 3 years ago

Hi, I am getting the following error after adding reference_extraction.txt to the graphs/PRG_MHC_GRCh38_withIMGT/knownReferences folder:

Incorrect header for /home/admin/anaconda3/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences/reference_extraction.txt at /home/admin/anaconda3/bin/HLA-LA.pl line 255, line 1

TonyLupara commented 3 years ago

Use this one, it should work

reference_extraction_UNIX.txt

santhoshnh commented 3 years ago

Thank you, it is working. What is the difference between this file and the previous one? I would like to be able to make one myself if I use a different genome.

TonyLupara commented 3 years ago

The difference is the line break type. A text file edited on Windows is saved with CR LF (Windows) line breaks, which is what caused the header error. I converted the file to LF (Unix) line breaks with Notepad++.
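If Notepad++ is not at hand, the same conversion can be done with a few lines of Python. This is only a minimal sketch: the function names `has_crlf` and `to_unix_line_breaks` are made up for illustration and are not part of HLA-LA.

```python
from pathlib import Path


def has_crlf(path):
    """Return True if the file contains at least one CR LF (Windows) line break."""
    return b"\r\n" in Path(path).read_bytes()


def to_unix_line_breaks(src, dst):
    """Copy src to dst, replacing every CR LF with a single LF (Unix) line break."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.replace(b"\r\n", b"\n"))
```

Checking a file with `has_crlf` before copying it into knownReferences should catch this problem early. The files are read and written as bytes so that Python does not translate the line breaks itself.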

santhoshnh commented 3 years ago

Thank you. Can I use a BAM file obtained by mapping FASTA reads to a reference genome?

TonyLupara commented 3 years ago

You can. Run samtools idxstats on your BAM file, take the contig ID and length columns, build the extraction file from them (with Excel, for example), and save it with Unix line breaks.
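The same steps can be scripted instead of done in Excel. The sketch below rests on some assumptions that should be checked against an existing file in the knownReferences folder: the five tab-separated column names in `HEADER` are written from memory of the HLA-LA format, and the choice to mark `*` (unmapped reads) for complete extraction simply follows what was done for the file earlier in this thread. The function name `build_extraction_spec` and its parameters are hypothetical.

```python
# Assumed column names; compare with an existing knownReferences file before use.
HEADER = [
    "contigID",
    "contigLength",
    "ExtractCompleteContig",
    "PartialExtraction_Start",
    "PartialExtraction_Stop",
]


def build_extraction_spec(idxstats_path, out_path, mhc_contig, mhc_start, mhc_stop):
    """Turn `samtools idxstats` output into an extraction specification.

    Only the first two idxstats columns (contig name, contig length) are used.
    The MHC contig gets a partial extraction interval, '*' (unmapped reads) is
    marked for complete extraction, and every other contig is left unextracted.
    """
    # newline="\n" forces LF (Unix) line breaks on every platform.
    with open(idxstats_path) as src, open(out_path, "w", newline="\n") as dst:
        dst.write("\t".join(HEADER) + "\n")
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            contig, length = fields[0], fields[1]
            if contig == mhc_contig:
                row = [contig, length, "0", str(mhc_start), str(mhc_stop)]
            elif contig == "*":
                row = [contig, length, "1", "", ""]
            else:
                row = [contig, length, "0", "", ""]
            dst.write("\t".join(row) + "\n")
```

The input would be produced with `samtools idxstats sample.bam > idxstats.txt`. The chr6 interval for the MHC region depends on the reference build, so it is passed in rather than hard-coded; an existing knownReferences file for the same build is the place to look it up.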

chenxf611 commented 3 years ago

Hi, I mapped my PacBio HLA reads to hg38 and then ran HLA-LA.pl with the --longReads pacbio option. I got the same error:

Have found no compatible reference specifications in ./graphs/PRG_MHC_GRCh38_withIMGT/knownReferences - create a file for this BAM file and try again.

My question is: which reference should I use to map my reads in the first place, regular hg38 or a specific HLA reference?

Thanks

Jack

TonyLupara commented 3 years ago

Use whatever reference you think is appropriate for your situation. For example, I use GRCh38_full_analysis_set_plus_decoy_hla.fa for my processing, as I believe it improves the mapping quality (mapQ) of reads aligned to alternative contigs. Note that reads should be mapped to a reference with alt contigs using an alt-aware mapper (for example, bwa-mem), ideally with postprocessing. Otherwise, reads that map to multiple locations will have a mapQ of zero. More details here: https://github.com/lh3/bwa/blob/master/README-alt.md and https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use

If you don't know how to use an alt-aware mapper, it's better to work with hg38_analysis_set without alt contigs. For example: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz

AndresCongenica commented 1 year ago

Hi, I am trying to evaluate different tools and have chosen to use NF-Core's reference file for cutting down to relevant regions. I am facing the same issue as the rest in the thread: "Have found no compatible reference specifications in..."

Kindly see my idxstats file attached and please let me know of anything else you may require.

idx_stats.txt

AlexanderDilthey commented 1 year ago

Hi all,

If you have freedom to choose the reference you map against, I would recommend using the standard 1000 Genomes reference: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

@AndresCongenica, it looks like all of the contigs in your file contain a "HLA" substring, which may indicate that all reads mapping to any of these should be extracted. However, I am not sure where any such BAM would come from - would it be based on extracting alignment records from a BAM that is based on mapping against a whole-genome reference (which would probably be fine), or based on mapping a set of whole-genome reads against a reference containing only the HLA reference contigs (which may be a process prone to attracting false-positive alignments)?

Best wishes

Alex