DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0

Question about perfectG value and running on HPC cluster #26

jbreynier closed this issue 5 years ago

jbreynier commented 5 years ago

Hello,

Thank you very much for providing this tool. I was able to run the program successfully on my samples, but I am confused about the perfectG value in the output. The README says: "perfectG: Perfect translation from HLA*PRG:LA call to G group resolution? (1 = Yes)". This would imply that perfectG = 1 indicates no error, but the README then also says: "If perfectG != 0, you might want to check ../working/$mySampleID/hla/R1_bestguess.txt, which contains the un-translated output from the internal model of the algorithm." Is this maybe a typo? And how does the R1_bestguess.txt file help in the case where the translation is not perfect?

In addition, I just wanted to let you know that this tool isn't fully compatible with a cluster environment. HLA-LA requires indexing of the graph, but since most users on a cluster can't modify any program files, this can cause some issues. I was able to circumvent this problem by having the HPC cluster administrator grant me write permission to the graphs directory, but it would be great if the graph directory wasn't hardcoded in the script. Thank you in advance for your help!

AlexanderDilthey commented 5 years ago

Hello,

You are absolutely right - it should read "If perfectG != 1, you might want..." - thank you for spotting this, I have now fixed this!

In some instances, HLA-LA uses internal allele groupings that are slightly different from the G group scheme - in this case, the R1_bestguess.txt can tell you the exact set of alleles that was inferred. Even then, however, the algorithm will try to carry out a sensible translation into the G group scheme.
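For anyone who wants to find the calls worth cross-checking against R1_bestguess.txt, here is a minimal sketch that lists loci whose perfectG value is not 1. It assumes the translated results are a tab-delimited table with a header row containing `Locus`, `Allele` and `perfectG` columns; that layout, the function name `loci_needing_review` and the example rows are assumptions for illustration, not taken from this thread.

```python
import csv
import io


def loci_needing_review(table_text):
    """Return (locus, allele) pairs whose perfectG flag is not 1.

    Assumes a tab-delimited table with 'Locus', 'Allele' and 'perfectG'
    header columns (an assumption about the output layout).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = []
    for row in reader:
        # perfectG == 1 means the translation to G group resolution was perfect.
        if row["perfectG"].strip() != "1":
            flagged.append((row["Locus"], row["Allele"]))
    return flagged


# Made-up rows for illustration only, not real HLA-LA output.
example = (
    "Locus\tAllele\tperfectG\n"
    "A\tA*01:01:01G\t1\n"
    "B\tB*07:02:01G\t0\n"
)

print(loci_needing_review(example))
```

With real data you would read the file contents into a string first and pass that in; the flagged loci are the ones to look up in R1_bestguess.txt.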

Thank you for your feedback on this! The advantage of the current design is that the cluster admin can index the graph once, and then the algorithm is fully ready for use by all cluster users - but I understand that e.g. in your case this might not be optimal! We'll keep this in mind for future versions of the algorithm!
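On a shared cluster it can help to check up front whether the current user is able to write to the graph directory, since indexing will otherwise fail. The sketch below uses only the Python standard library; the function name `can_index_graph` and the path in the usage example are hypothetical placeholders, not part of HLA-LA.

```python
import os


def can_index_graph(graph_dir):
    """Return True if graph_dir exists and the current user may create files in it."""
    # Creating files in a directory needs both write and search (execute) permission.
    return os.path.isdir(graph_dir) and os.access(graph_dir, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Hypothetical path; replace with the graph directory of your installation.
    graph_dir = "/path/to/HLA-LA/graphs"
    if can_index_graph(graph_dir):
        print("Graph directory is writable; indexing can be run by this user.")
    else:
        print("Graph directory is missing or not writable; ask the cluster admin to index the graph.")
```

If the check fails, the design described above applies: the admin indexes the graph once, after which regular users should only need read access.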