DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

Why is my NA12878 test result not the same as NA12878_example_output_G.txt? #32

Closed PavitaKae closed 3 years ago

PavitaKae commented 4 years ago

This is my command: ~/HLA-LA/src/HLA-LA.pl --BAM NA12878.mini.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878 --maxThreads 40

This is my test result: R1_bestguess_G.txt

AlexanderDilthey commented 4 years ago

Hi @PavitaKae, very difficult to tell - looking at the output file you provided, coverage on the class I genes (HLA-A, -B, -C) is very low. This would indicate that either the test file is corrupted, or that something with the read extraction process has gone wrong. Did you modify the reference extraction files in any way? Could you send an md5 of NA12878.mini.cram? And could you capture all of STDOUT and STDERR and post it here?
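For readers who want to produce the requested checksum without relying on a particular command-line tool, here is a minimal Python sketch. The function name md5_of_file and the chunk size are illustrative choices, not part of HLA-LA.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large CRAM/BAM files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage (script name is hypothetical): python md5_check.py NA12878.mini.cram
    for path in sys.argv[1:]:
        print(md5_of_file(path), path)
```

The printed digest can then be compared by eye with the one the maintainer has for the same test file.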

PavitaKae commented 4 years ago

This is the MD5 sum of my NA12878 file: 45d1769ffed71418571c9a2414465a12. I didn't modify your reference graph; I just downloaded it and built the graph by following the manual. I have attached the .out and .err files.

41436.out.txt 41436.err.txt

AlexanderDilthey commented 4 years ago

There is some issue with read extraction. In your output log, it says "processBAM::extractSeeds(): getReadIDs 833136 reads, collected 402762 read IDs.", whereas for this test file it should say "processBAM::extractSeeds(): getReadIDs 13649900 reads, collected 1373415 read IDs."
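The comparison of log lines described here can be automated. The sketch below assumes only the log line format quoted in this thread; the helper names (seed_counts, extraction_looks_complete) are hypothetical and not part of HLA-LA, and the expected counts apply to the NA12878 test file only.

```python
import re

# Pattern for the extractSeeds line as quoted in this thread.
SEED_LINE = re.compile(
    r"processBAM::extractSeeds\(\): getReadIDs (\d+) reads, "
    r"collected (\d+) read IDs"
)

# Counts the maintainer reports for the NA12878 test file.
EXPECTED = (13649900, 1373415)


def seed_counts(log_text):
    """Return (reads, read_ids) from the first extractSeeds line, or None."""
    match = SEED_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def extraction_looks_complete(log_text, expected=EXPECTED):
    """True only if the logged counts equal the expected pair exactly."""
    return seed_counts(log_text) == expected
```

Calling seed_counts on the contents of the captured .out file gives the two numbers to compare; None means the line was not found at all.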

In your error log, there is a message from Picard: To execute picard run: java -jar $EBROOTPICARD/picard.jar (also, there are some warning messages about the locale that come from Perl, but I don't think these matter too much).

If you go into the working directory for the sample (e.g. HLA-LA/working/NA12878_mini), R_1.fastq and R_2.fastq should both be about 500 MB in size (I would expect them to be smaller on your system), and extraction.bam should be a little bit larger than 310 MB (I would expect this to be the case on your system).
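A quick way to check these sizes is sketched below. The target sizes are the rough figures from this comment; the 20% tolerance, the decimal megabyte definition, and the function name check_sizes are assumptions made for illustration.

```python
import os

MB = 1000 * 1000  # assumption: decimal megabytes

# Approximate sizes quoted in this thread for the NA12878 test sample.
EXPECTED_MB = {"R_1.fastq": 500, "R_2.fastq": 500, "extraction.bam": 310}


def check_sizes(working_dir, expected_mb=EXPECTED_MB, tolerance=0.2):
    """Map each file name to (size in MB or None if missing, large-enough flag).

    A file counts as large enough if it reaches at least
    (1 - tolerance) times its target size; the tolerance is an assumption.
    """
    report = {}
    for name, target in expected_mb.items():
        path = os.path.join(working_dir, name)
        if not os.path.isfile(path):
            report[name] = (None, False)
            continue
        size_mb = os.path.getsize(path) / MB
        report[name] = (size_mb, size_mb >= target * (1 - tolerance))
    return report
```

For example, check_sizes("HLA-LA/working/NA12878_mini") would flag FASTQ files that are much smaller than expected, which is the symptom suspected here.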

I think that there is some issue with Picard. If you execute the extraction command manually, i.e. /tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/noon/bin/picard SamToFastq VALIDATION_STRINGENCY=LENIENT I=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/extraction.bam F=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/R_1.fastq F2=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/R_2.fastq FU=/tarafs/biobank/data/home/pkaewpro/proj0015/HLA-LA/working/NA_test/R_U.fastq 2>&1, do you get an error message?
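To capture the exit code together with everything Picard prints when the command above is run by hand, something like the following sketch could be used. The SamToFastq arguments are copied from the command above; the helper names and the idea of wrapping the call in Python are illustrative assumptions, and the Picard path and working directory must be replaced with the values for your own system.

```python
import subprocess


def sam_to_fastq_command(picard, working_dir):
    """Build the SamToFastq argument list with the options quoted above."""
    return [
        picard,
        "SamToFastq",
        "VALIDATION_STRINGENCY=LENIENT",
        f"I={working_dir}/extraction.bam",
        f"F={working_dir}/R_1.fastq",
        f"F2={working_dir}/R_2.fastq",
        f"FU={working_dir}/R_U.fastq",
    ]


def run_and_capture(command):
    """Run `command`, merging STDERR into STDOUT as `2>&1` does in a shell."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout
```

A non-zero return code, or output that contains only a usage hint instead of progress messages, would suggest that the Picard wrapper itself is not working.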

PavitaKae commented 3 years ago

Hi AlexanderDilthey, I went back and ran it again, and it looks good now, because I installed a new copy of Picard. Thank you for your response. :D