DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

A problem about reference? #27

Closed maolingfengZJU closed 4 years ago

maolingfengZJU commented 4 years ago

I downloaded the GRCh38 reference from Ensembl and built the index with bowtie2, but I ran into the error below. Can anyone give me some advice? Have found no compatible reference specifications in /public/home/fanlj/mlf/HLA/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences - create a file for this BAM file and try again. at ../../HLA-LA/src/HLA-LA.pl line 315

miko-798 commented 4 years ago

I have the same problem. I specified my reference genome (also GRCh38) using --samtools_T, but I still got the same error.

AlexanderDilthey commented 4 years ago

Could you post the output of samtools idxstats for your BAM?
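For anyone following along: `samtools idxstats` prints one tab-separated line per contig (name, length, mapped reads, unmapped reads) and ends with a `*` line for reads without a reference. A minimal Python sketch for turning that output into a name-to-length table is shown here. The function name and the sample input are illustrative only and are not part of HLA-LA.

```python
def parse_idxstats(text):
    """Return {contig_name: contig_length} from `samtools idxstats` output."""
    contigs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length, _mapped, _unmapped = line.split("\t")[:4]
        if name == "*":  # summary line for reads without a reference
            continue
        contigs[name] = int(length)
    return contigs


# Illustrative input only; a real BAM lists every contig of its reference.
example = "chr1\t248956422\t100\t2\nchr6\t170805979\t50\t1\n*\t0\t0\t7\n"
print(parse_idxstats(example))
```

The contig names and lengths are what need to line up with a reference extraction file, so this table is a convenient thing to compare against.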

miko-798 commented 4 years ago

Yes, I attached the idxstats for my BAM here. I created a graph myself (and placed it under the knownReferences directory) but I got an error like this when running HLA*LA: Graph directory ../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences does not seem to be complete - does this directory specify a valid graph for HLA-LA? at src/HLA-LA.pl line 203.

As background, I included HLA sequences (alternative contigs) in my reference genome, so the BAM has those too.

I would really appreciate it if you could help me create a graph, and point out what is wrong with the one I created. Thanks a lot!

4084B_idxstats.txt ERCC_HLA_graph.txt

AlexanderDilthey commented 4 years ago

Hi @miko-798, if you say you created a graph yourself, what exactly do you mean by that? I assume you mean you have a custom reference file for your BAMs? The reference file you attached looks OK. What value do you use for the --graph parameter?

miko-798 commented 4 years ago

Hi @AlexanderDilthey, thanks for your reply. The file "ERCC_HLA_graph.txt" I attached earlier is the graph I created. I did have a custom reference file in FASTA format when I generated the BAMs (GRCh38.p12.genome.plus.ERCC.HLA.fa, which includes extra ERCC contigs, as well as HLA sequences as alternative loci). I am pretty sure the graph I created contains all the contigs in the BAM.

For --graph, I used the same directory as before: --graph PRG_MHC_GRCh38_withIMGT. I tried putting the file "ERCC_HLA_graph.txt" in that directory, and also in the knownReferences directory, but I got the error I described earlier.

Should I index the graph (I downloaded another copy of the data package, copied my graph there and tried indexing, but got another error, screenshot below)? Or what is a good way to solve this?

Thanks a lot for your help.

[Screenshot attachment: Screen Shot 2019-11-13 at 11 15 57 PM]

AlexanderDilthey commented 4 years ago

Hi @miko-798, OK - I think I understand what's going on! What you need is not a new graph, but merely a new reference extraction file. Here is what should work:

  1. Restore graphs/PRG_MHC_GRCh38_withIMGT to its original state, e.g. by re-downloading the data package and indexing the graph.
  2. Put the file you created into src/additionalReferences/PRG_MHC_GRCh38_withIMGT.
  3. Hopefully done :-)
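Step 2 above could be scripted roughly as follows. This is only a sketch: `install_reference_file` is a hypothetical helper, and the only thing taken from the thread is the target location `src/additionalReferences/PRG_MHC_GRCh38_withIMGT` inside the HLA-LA checkout. Restoring and indexing the graph (step 1) is not covered.

```python
import shutil
from pathlib import Path


def install_reference_file(spec_file, hla_la_root, graph="PRG_MHC_GRCh38_withIMGT"):
    """Copy a reference extraction file into src/additionalReferences/<graph>."""
    target_dir = Path(hla_la_root) / "src" / "additionalReferences" / graph
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # shutil.copy returns the path of the copied file inside target_dir
    return Path(shutil.copy(spec_file, target_dir))
```

For example, `install_reference_file("ERCC_HLA_graph.txt", "/path/to/HLA-LA")` would place the file from this thread in the expected folder, where `/path/to/HLA-LA` stands in for your own checkout.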

miko-798 commented 4 years ago

Hi @AlexanderDilthey,
Thanks a lot. It worked!

miko-798 commented 4 years ago

Actually, I still have a question about the graph. How do I know which graph the tool is using? Can I get that information from the log file? After I put the file I created under the directory you specified (src/additionalReferences/PRG_MHC_GRCh38_withIMGT) and ran HLA*LA, I got this in the logs: Graph serialization existing and newer than graph file; read from /home/mikoliu798/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH.

How do I make sure the tool is actually using the graph I created? I also attached the complete log here. 4084B_modified_graph.log

Thanks so much for your help!

AlexanderDilthey commented 4 years ago

There is currently only one graph, PRG_MHC_GRCh38_withIMGT. I think you are referring to the reference extraction file, right? The tool will complain if it finds no suitable file or more than one, so if it produces output, you can be certain it used your file.
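To see ahead of time which reference extraction files would match a given BAM, something like the Python sketch below might help. It rests on several assumptions that are not confirmed in this thread: that each file is tab-separated with a header row, that the contig name and length are in the first two columns, that the files end in `.txt`, and that "compatible" means identical contig names and lengths. Check these against the files shipped in knownReferences before relying on it. The helper names are hypothetical.

```python
import csv
from pathlib import Path


def read_spec(path):
    """Read {contig: length} from a reference extraction file (assumed layout)."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # assumption: the first row is a header
        return {row[0]: int(row[1]) for row in rows if len(row) >= 2}


def compatible_specs(bam_contigs, directories):
    """List spec files whose contigs and lengths equal those of the BAM."""
    hits = []
    for directory in directories:
        # assumption: reference extraction files use a .txt extension
        for path in sorted(Path(directory).glob("*.txt")):
            if read_spec(path) == bam_contigs:
                hits.append(path)
    return hits
```

Here `bam_contigs` is a `{name: length}` table built from the `samtools idxstats` output, and `directories` would be `graphs/PRG_MHC_GRCh38_withIMGT/knownReferences` and `src/additionalReferences/PRG_MHC_GRCh38_withIMGT`. Exactly one hit is the situation described above; zero or several hits are the cases where the tool stops with an error.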

lidd77 commented 1 month ago

Hello, how can I use a specific IMGT version to build MHC_GRCh38_withIMGT?