RabadanLab / arcasHLA

Fast and accurate in silico inference of HLA genotypes from RNA-seq
GNU General Public License v3.0

Which reference sequence do you recommend? #62

Open slowkow opened 3 years ago

slowkow commented 3 years ago

Could I please ask if you might be willing to discuss a few questions?

  1. Do you have any advice for users regarding which reference genome sequence we should use?
  2. Should we exclude the non-chromosome sequences prior to read-mapping? (Considering that your code extracts reads from one region of chr6 in each BAM file, I think the answer is "yes.")

I tried two different options.

Run 1: GENCODE

I downloaded this sequence:

ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_37/GRCh38.p13.genome.fa.gz

Then, I mapped reads with STAR and ran arcasHLA on the BAMs.

Run 2: Filtered UCSC

I downloaded this sequence:

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

First, I discarded all sequences from the FASTA file whose names are not in the list chr1-22,X,Y.
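The filtering step described above can be sketched in Python. This is only an illustration under stated assumptions, not the commands that were actually run: the function name `filter_fasta` and the `KEEP` set are hypothetical, and the set assumes UCSC-style names chr1 through chr22 plus chrX and chrY (the human primary assembly has 22 autosomes). Add chrM to the set if the mitochondrial sequence should be kept.

```python
import io

# Assumed names of the sequences to keep: chr1..chr22, chrX, chrY.
KEEP = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}

def filter_fasta(src, dst, keep=KEEP):
    """Copy only the records whose name (first word of the header) is in keep."""
    writing = False
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            writing = bool(fields) and fields[0] in keep
        if writing:
            dst.write(line)

# Tiny in-memory demo; a real genome would be streamed from a file handle.
demo = io.StringIO(
    ">chr6 6\nACGT\n"
    ">GL000250.2 HSCHR6_MHC_APD_CTG1\nTTTT\n"
    ">chrX X\nGG\n"
)
out = io.StringIO()
filter_fasta(demo, out)
print(out.getvalue(), end="")
```

Because the function reads and writes line by line, it should work on a multi-gigabyte FASTA without loading it into memory.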

Then, I mapped reads with STAR and ran arcasHLA on the BAMs.

Results

arcasHLA called a greater number of genotypes in the UCSC run than the GENCODE run. I don't know what the true genotypes might be. Many of the genotypes do not match between the two runs.

I think the main reason for the difference between the two runs is whether or not we include the "alternate" or "patch" or "scaffold" sequences. For the GENCODE run, the alternate sequences were included in the read-mapping step. For the UCSC run, the alternate sequences were not included.

I haven't tested whether the chromosome sequences (chr1-22,X,Y) are identical between GENCODE and UCSC, but I would guess that they are very similar or identical.

IoanFilip2 commented 3 years ago

Thanks for your interest in our tool and for your constructive feedback today. Regarding your specific questions here:

  1. arcasHLA works well with all reference genome builds and annotations, and in our experience there is no preferred or suggested choice. In our manuscript, we make a point of specifying that ALT sequences and HLA decoys are handled satisfactorily if they are included in the genome annotation: the ALT sequences are checked by default, and reads that map to them are not discarded at the extraction step for the purpose of HLA genotyping.
  2. It is likely better not to exclude any non-chromosome sequences, or to alter the genomic reference in any way, prior to read mapping. The mapping output from STAR (which is the input BAM for arcasHLA) is handled accordingly at the extraction step. We also recommend using the --unmapped flag to increase the number of reads considered at the pseudo-alignment and genotyping steps.

How different are the genotypes called with arcasHLA between your runs? Is there a big discrepancy in every sample or in some samples only? Is there a big discrepancy in the first two fields or only in the third (or fourth) field? Are you genotyping class I, class II or both?

slowkow commented 3 years ago

Now that I've read some of the code in scripts/extract.py, I can see why I am getting more genotypes called when I discard the alt sequences.

Here are all of the sequence names from the GENCODE file:

$ grep '^>' GRCh38.p13.genome.fa
>chr1 1
>chr2 2
>chr3 3
>chr4 4
>chr5 5
>chr6 6
>chr7 7
>chr8 8
>chr9 9
>chr10 10
>chr11 11
>chr12 12
>chr13 13
>chr14 14
>chr15 15
>chr16 16
>chr17 17
>chr18 18
>chr19 19
>chr20 20
>chr21 21
>chr22 22
>chrX X
>chrY Y
>chrM MT
>GL000008.2 GL000008.2
>GL000009.2 GL000009.2
>GL000194.1 GL000194.1
>GL000195.1 GL000195.1
>GL000205.2 GL000205.2
>GL000208.1 GL000208.1
>GL000213.1 GL000213.1
>GL000214.1 GL000214.1
>GL000216.2 GL000216.2
>GL000218.1 GL000218.1
>GL000219.1 GL000219.1
>GL000220.1 GL000220.1
>GL000221.1 GL000221.1
>GL000224.1 GL000224.1
>GL000225.1 GL000225.1
>GL000226.1 GL000226.1
>KQ759759.1 HG107_PATCH
>ML143376.1 HG109_PATCH
>KN538364.1 HG126_PATCH
>ML143355.1 HG1277_PATCH
>ML143348.1 HG1296_PATCH
>ML143347.1 HG1298_PATCH
>ML143346.1 HG1299_PATCH
>ML143352.1 HG1309_PATCH
>KQ759762.1 HG1311_PATCH
>ML143375.1 HG1320_PATCH
>KQ031383.1 HG1342_HG2282_PATCH
>KN538369.1 HG1362_PATCH
>ML143342.1 HG1384_PATCH
>ML143350.1 HG1395_PATCH
>ML143362.1 HG1398_PATCH
>JH159136.1 HG142_HG150_NOVEL_TEST
>ML143357.1 HG1445_PATCH
>ML143385.1 HG1466_PATCH
>ML143378.1 HG1485_PATCH
>ML143382.1 HG1506_PATCH
>ML143383.1 HG1507_PATCH
>ML143384.1 HG1509_PATCH
>JH159137.1 HG151_NOVEL_TEST
>ML143356.1 HG1521_PATCH
>ML143364.1 HG1523_PATCH
>ML143365.1 HG1524_PATCH
>KZ208923.1 HG1531_PATCH
>KZ208924.1 HG1535_PATCH
>KQ031387.1 HG1651_PATCH
>KV766195.1 HG1708_PATCH
>KZ208916.1 HG1815_PATCH
>ML143363.1 HG1817_1_PATCH
>KN538360.1 HG1832_PATCH
>KZ208920.1 HG1_PATCH
>KZ208906.1 HG2002_PATCH
>KN196484.1 HG2021_PATCH
>KN196476.1 HG2022_PATCH
>KQ983257.1 HG2023_PATCH
>KN196479.1 HG2030_PATCH
>KV575245.1 HG2046_PATCH
>KZ208917.1 HG2047_PATCH
>KZ208911.1 HG2057_PATCH
>KN196473.1 HG2058_PATCH
>KZ559108.1 HG2060_PATCH
>KN196487.1 HG2062_PATCH
>KQ759760.1 HG2063_PATCH
>KN196475.1 HG2066_PATCH
>KV880766.1 HG2067_PATCH
>KV880767.1 HG2068_PATCH
>KQ090016.1 HG2072_PATCH
>ML143374.1 HG2087_PATCH
>KV880764.1 HG2088_PATCH
>KN538361.1 HG2095_PATCH
>KN196474.1 HG2104_PATCH
>ML143360.1 HG2111_PATCH
>KZ559109.1 HG2114_PATCH
>ML143359.1 HG2115_PATCH
>KQ090022.1 HG2116_PATCH
>KV766194.1 HG2121_PATCH
>KN196478.1 HG2128_PATCH
>KZ559104.1 HG2133_PATCH
>KN196480.1 HG2191_PATCH
>ML143370.1 HG2198_PATCH
>KQ090028.1 HG2213_PATCH
>KN196483.1 HG2216_PATCH
>KN196481.1 HG2217_PATCH
>KN538363.1 HG2232_PATCH
>KN538362.1 HG2233_PATCH
>KQ031385.1 HG2235_PATCH
>KV766192.1 HG2236_PATCH
>KQ031386.1 HG2237_PATCH
>KQ031388.1 HG2239_PATCH
>KN538365.1 HG2241_PATCH
>KN538366.1 HG2242_HG2243_PATCH
>KN538367.1 HG2244_HG2245_PATCH
>ML143361.1 HG2246_HG2248_HG2276_PATCH
>KN538370.1 HG2247_PATCH
>KN538373.1 HG2249_PATCH
>KZ559113.1 HG2263_PATCH
>KV880765.1 HG2266_PATCH
>KV766196.1 HG2285_HG106_HG2252_PATCH
>KN538371.1 HG2288_HG2289_PATCH
>KQ031384.1 HG2290_PATCH
>KN538372.1 HG2291_PATCH
>KQ090021.1 HG2334_PATCH
>ML143371.1 HG2365_PATCH
>KN196482.1 HG23_PATCH
>KZ559115.1 HG2412_PATCH
>KZ208914.1 HG2419_PATCH
>KZ208922.1 HG2442_PATCH
>ML143373.1 HG2471_PATCH
>ML143369.1 HG2499_PATCH
>ML143366.1 HG2509_PATCH
>ML143367.1 HG2510_PATCH
>ML143372.1 HG2511_PATCH
>ML143380.1 HG2512_PATCH
>ML143377.1 HG2513_PATCH
>ML143345.1 HG2525_PATCH
>KQ458386.1 HG26_PATCH
>ML143358.1 HG28_PATCH
>KV575244.1 HG30_PATCH
>ML143381.1 HG439_PATCH
>KZ559100.1 HG460_PATCH
>ML143379.1 HG494_PATCH
>ML143354.1 HG545_PATCH
>ML143351.1 HG563_PATCH
>ML143353.1 HG613_PATCH
>ML143344.1 HG699_PATCH
>ML143349.1 HG705_PATCH
>KZ208912.1 HG708_PATCH
>ML143341.1 HG721_PATCH
>KZ208915.1 HG76_PATCH
>KV880768.1 HG926_PATCH
>KN196472.1 HG986_PATCH
>GL383545.1 HSCHR10_1_CTG1
>GL383546.1 HSCHR10_1_CTG2
>KI270824.1 HSCHR10_1_CTG3
>KI270825.1 HSCHR10_1_CTG4
>KQ090020.1 HSCHR10_1_CTG6
>GL383547.1 HSCHR11_1_CTG1_1
>KN538368.1 HSCHR11_1_CTG1_2
>KI270826.1 HSCHR11_1_CTG2
>KI270827.1 HSCHR11_1_CTG3
>KZ559111.1 HSCHR11_1_CTG3_1
>KI270829.1 HSCHR11_1_CTG5
>KI270830.1 HSCHR11_1_CTG6
>KI270831.1 HSCHR11_1_CTG7
>KI270832.1 HSCHR11_1_CTG8
>KI270902.1 HSCHR11_2_CTG1
>KI270903.1 HSCHR11_2_CTG1_1
>KZ559110.1 HSCHR11_2_CTG8
>KI270927.1 HSCHR11_3_CTG1
>GL877875.1 HSCHR12_1_CTG1
>GL383549.1 HSCHR12_1_CTG2
>GL383550.2 HSCHR12_1_CTG2_1
>KQ090023.1 HSCHR12_2_CTG1
>GL877876.1 HSCHR12_2_CTG2
>GL383552.1 HSCHR12_2_CTG2_1
>KI270904.1 HSCHR12_3_CTG2
>GL383553.2 HSCHR12_3_CTG2_1
>KI270835.1 HSCHR12_4_CTG2
>GL383551.1 HSCHR12_4_CTG2_1
>KI270837.1 HSCHR12_5_CTG2
>KI270833.1 HSCHR12_5_CTG2_1
>KI270834.1 HSCHR12_6_CTG2_1
>KI270836.1 HSCHR12_7_CTG2_1
>KZ208918.1 HSCHR12_8_CTG2_1
>KZ559112.1 HSCHR12_9_CTG2_1
>KI270838.1 HSCHR13_1_CTG1
>KI270839.1 HSCHR13_1_CTG2
>KI270840.1 HSCHR13_1_CTG3
>KI270841.1 HSCHR13_1_CTG4
>KI270842.1 HSCHR13_1_CTG5
>KI270843.1 HSCHR13_1_CTG6
>KQ090024.1 HSCHR13_1_CTG7
>KQ090025.1 HSCHR13_1_CTG8
>KI270844.1 HSCHR14_1_CTG1
>KI270845.1 HSCHR14_2_CTG1
>KI270846.1 HSCHR14_3_CTG1
>KI270847.1 HSCHR14_7_CTG1
>KZ208919.1 HSCHR14_8_CTG1
>ML143368.1 HSCHR14_9_CTG1
>KI270852.1 HSCHR15_1_CTG1
>KI270848.1 HSCHR15_1_CTG3
>GL383554.1 HSCHR15_1_CTG8
>KI270906.1 HSCHR15_2_CTG3
>GL383555.2 HSCHR15_2_CTG8
>KI270851.1 HSCHR15_3_CTG3
>KI270849.1 HSCHR15_3_CTG8
>KI270905.1 HSCHR15_4_CTG8
>KI270850.1 HSCHR15_5_CTG8
>KQ031389.1 HSCHR15_6_CTG8
>KI270853.1 HSCHR16_1_CTG1
>GL383556.1 HSCHR16_1_CTG3_1
>GL383557.1 HSCHR16_2_CTG3_1
>KI270855.1 HSCHR16_3_CTG1
>KQ031390.1 HSCHR16_3_CTG3_1
>KI270856.1 HSCHR16_4_CTG1
>KQ090027.1 HSCHR16_4_CTG3_1
>KQ090026.1 HSCHR16_5_CTG1
>KZ208921.1 HSCHR16_5_CTG3_1
>KI270854.1 HSCHR16_CTG2
>KI270909.1 HSCHR17_10_CTG4
>KV766197.1 HSCHR17_11_CTG4
>KZ559114.1 HSCHR17_12_CTG4
>GL383563.3 HSCHR17_1_CTG1
>KI270861.1 HSCHR17_1_CTG2
>GL383564.2 HSCHR17_1_CTG4
>GL000258.2 HSCHR17_1_CTG5
>KI270860.1 HSCHR17_1_CTG9
>KI270907.1 HSCHR17_2_CTG1
>KI270862.1 HSCHR17_2_CTG2
>GL383565.1 HSCHR17_2_CTG4
>KI270908.1 HSCHR17_2_CTG5
>KV766198.1 HSCHR17_3_CTG1
>KI270910.1 HSCHR17_3_CTG2
>GL383566.1 HSCHR17_3_CTG4
>JH159146.1 HSCHR17_4_CTG4
>JH159147.1 HSCHR17_5_CTG4
>JH159148.1 HSCHR17_6_CTG4
>KI270857.1 HSCHR17_7_CTG4
>KI270858.1 HSCHR17_8_CTG4
>KI270859.1 HSCHR17_9_CTG4
>KZ559116.1 HSCHR18_1_CTG1
>GL383567.1 HSCHR18_1_CTG1_1
>GL383568.1 HSCHR18_1_CTG2
>GL383569.1 HSCHR18_1_CTG2_1
>GL383570.1 HSCHR18_2_CTG1_1
>GL383571.1 HSCHR18_2_CTG2
>GL383572.1 HSCHR18_2_CTG2_1
>KI270863.1 HSCHR18_3_CTG2_1
>KI270864.1 HSCHR18_4_CTG1_1
>KQ458385.1 HSCHR18_5_CTG1_1
>KI270912.1 HSCHR18_ALT21_CTG2_1
>KI270911.1 HSCHR18_ALT2_CTG2_1
>KV575254.1 HSCHR19KIR_0010-5217-AB_CTG3_1
>KV575246.1 HSCHR19KIR_0019-4656-A_CTG3_1
>KV575256.1 HSCHR19KIR_0019-4656-B_CTG3_1
>KV575253.1 HSCHR19KIR_502960008-1_CTG3_1
>KV575252.1 HSCHR19KIR_502960008-2_CTG3_1
>KV575255.1 HSCHR19KIR_7191059-1_CTG3_1
>KV575259.1 HSCHR19KIR_7191059-2_CTG3_1
>KI270917.1 HSCHR19KIR_ABC08_A1_HAP_CTG3_1
>KI270918.1 HSCHR19KIR_ABC08_AB_HAP_C_P_CTG3_1
>KI270919.1 HSCHR19KIR_ABC08_AB_HAP_T_P_CTG3_1
>KV575247.1 HSCHR19KIR_CA01-TA01_1_CTG3_1
>KV575248.1 HSCHR19KIR_CA01-TA01_2_CTG3_1
>KV575250.1 HSCHR19KIR_CA01-TB01_CTG3_1
>KV575249.1 HSCHR19KIR_CA01-TB04_CTG3_1
>KV575257.1 HSCHR19KIR_CA04_CTG3_1
>KI270920.1 HSCHR19KIR_FH05_A_HAP_CTG3_1
>KI270921.1 HSCHR19KIR_FH05_B_HAP_CTG3_1
>KI270922.1 HSCHR19KIR_FH06_A_HAP_CTG3_1
>KI270923.1 HSCHR19KIR_FH06_BA1_HAP_CTG3_1
>KI270929.1 HSCHR19KIR_FH08_A_HAP_CTG3_1
>KI270930.1 HSCHR19KIR_FH08_BAX_HAP_CTG3_1
>KI270931.1 HSCHR19KIR_FH13_A_HAP_CTG3_1
>KI270932.1 HSCHR19KIR_FH13_BA2_HAP_CTG3_1
>KI270933.1 HSCHR19KIR_FH15_A_HAP_CTG3_1
>KI270882.1 HSCHR19KIR_FH15_B_HAP_CTG3_1
>KI270883.1 HSCHR19KIR_G085_A_HAP_CTG3_1
>KI270884.1 HSCHR19KIR_G085_BA1_HAP_CTG3_1
>KI270885.1 HSCHR19KIR_G248_A_HAP_CTG3_1
>KI270886.1 HSCHR19KIR_G248_BA2_HAP_CTG3_1
>KI270887.1 HSCHR19KIR_GRC212_AB_HAP_CTG3_1
>KI270888.1 HSCHR19KIR_GRC212_BA1_HAP_CTG3_1
>KV575258.1 HSCHR19KIR_HG2393_CTG3_1
>KV575251.1 HSCHR19KIR_HG2394_CTG3_1
>KV575260.1 HSCHR19KIR_HG2396_CTG3_1
>KI270889.1 HSCHR19KIR_LUCE_A_HAP_CTG3_1
>KI270890.1 HSCHR19KIR_LUCE_BDEL_HAP_CTG3_1
>GL000209.2 HSCHR19KIR_RP5_B_HAP_CTG3_1
>KI270891.1 HSCHR19KIR_RSH_A_HAP_CTG3_1
>KI270914.1 HSCHR19KIR_RSH_BA2_HAP_CTG3_1
>KI270915.1 HSCHR19KIR_T7526_A_HAP_CTG3_1
>KI270916.1 HSCHR19KIR_T7526_BDEL_HAP_CTG3_1
>GL949746.1 HSCHR19LRC_COX1_CTG3_1
>GL949747.2 HSCHR19LRC_COX2_CTG3_1
>GL949748.2 HSCHR19LRC_LRC_I_CTG3_1
>GL949749.2 HSCHR19LRC_LRC_J_CTG3_1
>GL949750.2 HSCHR19LRC_LRC_S_CTG3_1
>GL949751.2 HSCHR19LRC_LRC_T_CTG3_1
>GL949752.1 HSCHR19LRC_PGF1_CTG3_1
>GL949753.2 HSCHR19LRC_PGF2_CTG3_1
>GL383573.1 HSCHR19_1_CTG2
>GL383574.1 HSCHR19_1_CTG3_1
>GL383575.2 HSCHR19_2_CTG2
>KI270866.1 HSCHR19_2_CTG3_1
>GL383576.1 HSCHR19_3_CTG2
>KI270867.1 HSCHR19_3_CTG3_1
>KI270865.1 HSCHR19_4_CTG2
>KI270938.1 HSCHR19_4_CTG3_1
>KI270868.1 HSCHR19_5_CTG2
>KI270760.1 HSCHR1_1_CTG11
>KI270762.1 HSCHR1_1_CTG3
>GL383518.1 HSCHR1_1_CTG31
>KI270759.1 HSCHR1_1_CTG32_1
>KI270766.1 HSCHR1_2_CTG3
>GL383519.1 HSCHR1_2_CTG31
>KI270761.1 HSCHR1_2_CTG32_1
>KQ458382.1 HSCHR1_3_CTG3
>GL383520.2 HSCHR1_3_CTG31
>KI270763.1 HSCHR1_3_CTG32_1
>KQ458383.1 HSCHR1_4_CTG3
>KI270765.1 HSCHR1_4_CTG31
>KI270764.1 HSCHR1_4_CTG32_1
>KQ983255.1 HSCHR1_5_CTG3
>KQ458384.1 HSCHR1_5_CTG32_1
>KV880763.1 HSCHR1_6_CTG3
>KZ208904.1 HSCHR1_8_CTG3
>KZ208905.1 HSCHR1_9_CTG3
>KI270892.1 HSCHR1_ALT2_1_CTG32_1
>GL383577.2 HSCHR20_1_CTG1
>KI270869.1 HSCHR20_1_CTG2
>KI270870.1 HSCHR20_1_CTG3
>KI270871.1 HSCHR20_1_CTG4
>GL383578.2 HSCHR21_1_CTG1_1
>GL383579.2 HSCHR21_2_CTG1_1
>GL383580.2 HSCHR21_3_CTG1_1
>GL383581.2 HSCHR21_4_CTG1_1
>KI270872.1 HSCHR21_5_CTG2
>KI270873.1 HSCHR21_6_CTG1_1
>KI270874.1 HSCHR21_8_CTG1_1
>GL383582.2 HSCHR22_1_CTG1
>GL383583.2 HSCHR22_1_CTG2
>KI270875.1 HSCHR22_1_CTG3
>KI270876.1 HSCHR22_1_CTG4
>KI270877.1 HSCHR22_1_CTG5
>KI270878.1 HSCHR22_1_CTG6
>KI270879.1 HSCHR22_1_CTG7
>KB663609.1 HSCHR22_2_CTG1
>KI270928.1 HSCHR22_3_CTG1
>KN196485.1 HSCHR22_4_CTG1
>KN196486.1 HSCHR22_5_CTG1
>KQ458387.1 HSCHR22_6_CTG1
>KQ458388.1 HSCHR22_7_CTG1
>KQ759761.1 HSCHR22_8_CTG1
>KI270769.1 HSCHR2_1_CTG1
>KI270767.1 HSCHR2_1_CTG15
>GL383521.1 HSCHR2_1_CTG5
>KI270772.1 HSCHR2_1_CTG7
>GL383522.1 HSCHR2_1_CTG7_2
>KI270770.1 HSCHR2_2_CTG1
>KI270893.1 HSCHR2_2_CTG15
>KI270894.1 HSCHR2_2_CTG7
>GL582966.2 HSCHR2_2_CTG7_2
>KI270773.1 HSCHR2_3_CTG1
>KI270776.1 HSCHR2_3_CTG15
>KI270768.1 HSCHR2_3_CTG7_2
>KI270774.1 HSCHR2_4_CTG1
>KI270771.1 HSCHR2_4_CTG7_2
>KI270775.1 HSCHR2_5_CTG7_2
>KQ983256.1 HSCHR2_6_CTG7_2
>KZ208907.1 HSCHR2_7_CTG7_2
>KZ208908.1 HSCHR2_8_CTG7_2
>JH636055.2 HSCHR3_1_CTG1
>GL383526.1 HSCHR3_1_CTG2_1
>KI270779.1 HSCHR3_1_CTG3
>KI270777.1 HSCHR3_2_CTG2_1
>KI270782.1 HSCHR3_2_CTG3
>KI270783.1 HSCHR3_3_CTG1
>KI270778.1 HSCHR3_3_CTG2_1
>KI270895.1 HSCHR3_3_CTG3
>KZ208909.1 HSCHR3_4_CTG1
>KI270780.1 HSCHR3_4_CTG2_1
>KI270924.1 HSCHR3_4_CTG3
>ML143343.1 HSCHR3_5_CTG1
>KI270781.1 HSCHR3_5_CTG2_1
>KI270934.1 HSCHR3_5_CTG3
>KZ559105.1 HSCHR3_6_CTG2_1
>KI270935.1 HSCHR3_6_CTG3
>KZ559101.1 HSCHR3_7_CTG2_1
>KI270936.1 HSCHR3_7_CTG3
>KZ559102.1 HSCHR3_8_CTG2_1
>KI270937.1 HSCHR3_8_CTG3
>KZ559103.1 HSCHR3_9_CTG2_1
>KI270784.1 HSCHR3_9_CTG3
>KQ983258.1 HSCHR4_11_CTG12
>KV766193.1 HSCHR4_12_CTG12
>GL383527.1 HSCHR4_1_CTG12
>KI270790.1 HSCHR4_1_CTG4
>GL383528.1 HSCHR4_1_CTG6
>KI270787.1 HSCHR4_1_CTG8_1
>GL000257.2 HSCHR4_1_CTG9
>KI270785.1 HSCHR4_2_CTG12
>KQ090013.1 HSCHR4_2_CTG4
>KI270786.1 HSCHR4_3_CTG12
>KI270788.1 HSCHR4_4_CTG12
>KI270789.1 HSCHR4_5_CTG12
>KI270896.1 HSCHR4_6_CTG12
>KI270925.1 HSCHR4_7_CTG12
>KQ090014.1 HSCHR4_8_CTG12
>KQ090015.1 HSCHR4_9_CTG12
>GL383532.1 HSCHR5_1_CTG1
>KI270897.1 HSCHR5_1_CTG1_1
>GL383531.1 HSCHR5_1_CTG5
>GL949742.1 HSCHR5_2_CTG1
>GL339449.2 HSCHR5_2_CTG1_1
>KI270795.1 HSCHR5_2_CTG5
>KI270791.1 HSCHR5_3_CTG1
>GL383530.1 HSCHR5_3_CTG1_1
>KI270898.1 HSCHR5_3_CTG5
>KI270792.1 HSCHR5_4_CTG1
>KI270796.1 HSCHR5_4_CTG1_1
>KI270793.1 HSCHR5_5_CTG1
>KI270794.1 HSCHR5_6_CTG1
>KN196477.1 HSCHR5_7_CTG1
>KV575243.1 HSCHR5_8_CTG1
>KZ208910.1 HSCHR5_9_CTG1
>KQ090017.1 HSCHR6_1_CTG10
>GL383533.1 HSCHR6_1_CTG2
>KB021644.2 HSCHR6_1_CTG3
>KI270797.1 HSCHR6_1_CTG4
>KI270798.1 HSCHR6_1_CTG5
>KI270799.1 HSCHR6_1_CTG6
>KI270800.1 HSCHR6_1_CTG7
>KI270801.1 HSCHR6_1_CTG8
>KI270802.1 HSCHR6_1_CTG9
>KI270758.1 HSCHR6_8_CTG1
>GL000250.2 HSCHR6_MHC_APD_CTG1
>GL000251.2 HSCHR6_MHC_COX_CTG1
>GL000252.2 HSCHR6_MHC_DBB_CTG1
>GL000253.2 HSCHR6_MHC_MANN_CTG1
>GL000254.2 HSCHR6_MHC_MCF_CTG1
>GL000255.2 HSCHR6_MHC_QBL_CTG1
>GL000256.2 HSCHR6_MHC_SSTO_CTG1
>KI270804.1 HSCHR7_1_CTG1
>KI270806.1 HSCHR7_1_CTG4_4
>GL383534.2 HSCHR7_1_CTG6
>KI270805.1 HSCHR7_1_CTG7
>KI270899.1 HSCHR7_2_CTG1
>KI270809.1 HSCHR7_2_CTG4_4
>KI270803.1 HSCHR7_2_CTG6
>KI270807.1 HSCHR7_2_CTG7
>KZ559106.1 HSCHR7_3_CTG1
>KZ208913.1 HSCHR7_3_CTG4_4
>KI270808.1 HSCHR7_3_CTG6
>KI270811.1 HSCHR8_1_CTG1
>KI270814.1 HSCHR8_1_CTG6
>KI270810.1 HSCHR8_1_CTG7
>KI270812.1 HSCHR8_2_CTG1
>KI270815.1 HSCHR8_2_CTG7
>KI270813.1 HSCHR8_3_CTG1
>KI270816.1 HSCHR8_3_CTG7
>KI270818.1 HSCHR8_4_CTG1
>KI270817.1 HSCHR8_4_CTG7
>KI270900.1 HSCHR8_5_CTG1
>KI270819.1 HSCHR8_5_CTG7
>KI270901.1 HSCHR8_6_CTG1
>KI270820.1 HSCHR8_6_CTG7
>KI270926.1 HSCHR8_7_CTG1
>KZ559107.1 HSCHR8_7_CTG7
>KI270821.1 HSCHR8_8_CTG1
>KI270822.1 HSCHR8_9_CTG1
>GL383539.1 HSCHR9_1_CTG1
>GL383540.1 HSCHR9_1_CTG2
>GL383541.1 HSCHR9_1_CTG3
>GL383542.1 HSCHR9_1_CTG4
>KI270823.1 HSCHR9_1_CTG5
>KQ090018.1 HSCHR9_1_CTG6
>KQ090019.1 HSCHR9_1_CTG7
>KI270880.1 HSCHRX_1_CTG3
>KI270881.1 HSCHRX_2_CTG12
>KI270913.1 HSCHRX_2_CTG3
>KV766199.1 HSCHRX_3_CTG7
>KI270302.1 KI270302.1
>KI270303.1 KI270303.1
>KI270304.1 KI270304.1
>KI270305.1 KI270305.1
>KI270310.1 KI270310.1
>KI270311.1 KI270311.1
>KI270312.1 KI270312.1
>KI270315.1 KI270315.1
>KI270316.1 KI270316.1
>KI270317.1 KI270317.1
>KI270320.1 KI270320.1
>KI270322.1 KI270322.1
>KI270329.1 KI270329.1
>KI270330.1 KI270330.1
>KI270333.1 KI270333.1
>KI270334.1 KI270334.1
>KI270335.1 KI270335.1
>KI270336.1 KI270336.1
>KI270337.1 KI270337.1
>KI270338.1 KI270338.1
>KI270340.1 KI270340.1
>KI270362.1 KI270362.1
>KI270363.1 KI270363.1
>KI270364.1 KI270364.1
>KI270366.1 KI270366.1
>KI270371.1 KI270371.1
>KI270372.1 KI270372.1
>KI270373.1 KI270373.1
>KI270374.1 KI270374.1
>KI270375.1 KI270375.1
>KI270376.1 KI270376.1
>KI270378.1 KI270378.1
>KI270379.1 KI270379.1
>KI270381.1 KI270381.1
>KI270382.1 KI270382.1
>KI270383.1 KI270383.1
>KI270384.1 KI270384.1
>KI270385.1 KI270385.1
>KI270386.1 KI270386.1
>KI270387.1 KI270387.1
>KI270388.1 KI270388.1
>KI270389.1 KI270389.1
>KI270390.1 KI270390.1
>KI270391.1 KI270391.1
>KI270392.1 KI270392.1
>KI270393.1 KI270393.1
>KI270394.1 KI270394.1
>KI270395.1 KI270395.1
>KI270396.1 KI270396.1
>KI270411.1 KI270411.1
>KI270412.1 KI270412.1
>KI270414.1 KI270414.1
>KI270417.1 KI270417.1
>KI270418.1 KI270418.1
>KI270419.1 KI270419.1
>KI270420.1 KI270420.1
>KI270422.1 KI270422.1
>KI270423.1 KI270423.1
>KI270424.1 KI270424.1
>KI270425.1 KI270425.1
>KI270429.1 KI270429.1
>KI270435.1 KI270435.1
>KI270438.1 KI270438.1
>KI270442.1 KI270442.1
>KI270448.1 KI270448.1
>KI270465.1 KI270465.1
>KI270466.1 KI270466.1
>KI270467.1 KI270467.1
>KI270468.1 KI270468.1
>KI270507.1 KI270507.1
>KI270508.1 KI270508.1
>KI270509.1 KI270509.1
>KI270510.1 KI270510.1
>KI270511.1 KI270511.1
>KI270512.1 KI270512.1
>KI270515.1 KI270515.1
>KI270516.1 KI270516.1
>KI270517.1 KI270517.1
>KI270518.1 KI270518.1
>KI270519.1 KI270519.1
>KI270521.1 KI270521.1
>KI270522.1 KI270522.1
>KI270528.1 KI270528.1
>KI270529.1 KI270529.1
>KI270530.1 KI270530.1
>KI270538.1 KI270538.1
>KI270539.1 KI270539.1
>KI270544.1 KI270544.1
>KI270548.1 KI270548.1
>KI270579.1 KI270579.1
>KI270580.1 KI270580.1
>KI270581.1 KI270581.1
>KI270582.1 KI270582.1
>KI270583.1 KI270583.1
>KI270584.1 KI270584.1
>KI270587.1 KI270587.1
>KI270588.1 KI270588.1
>KI270589.1 KI270589.1
>KI270590.1 KI270590.1
>KI270591.1 KI270591.1
>KI270593.1 KI270593.1
>KI270706.1 KI270706.1
>KI270707.1 KI270707.1
>KI270708.1 KI270708.1
>KI270709.1 KI270709.1
>KI270710.1 KI270710.1
>KI270711.1 KI270711.1
>KI270712.1 KI270712.1
>KI270713.1 KI270713.1
>KI270714.1 KI270714.1
>KI270715.1 KI270715.1
>KI270716.1 KI270716.1
>KI270717.1 KI270717.1
>KI270718.1 KI270718.1
>KI270719.1 KI270719.1
>KI270720.1 KI270720.1
>KI270721.1 KI270721.1
>KI270722.1 KI270722.1
>KI270723.1 KI270723.1
>KI270724.1 KI270724.1
>KI270725.1 KI270725.1
>KI270726.1 KI270726.1
>KI270727.1 KI270727.1
>KI270728.1 KI270728.1
>KI270729.1 KI270729.1
>KI270730.1 KI270730.1
>KI270731.1 KI270731.1
>KI270732.1 KI270732.1
>KI270733.1 KI270733.1
>KI270734.1 KI270734.1
>KI270735.1 KI270735.1
>KI270736.1 KI270736.1
>KI270737.1 KI270737.1
>KI270738.1 KI270738.1
>KI270739.1 KI270739.1
>KI270740.1 KI270740.1
>KI270741.1 KI270741.1
>KI270742.1 KI270742.1
>KI270743.1 KI270743.1
>KI270744.1 KI270744.1
>KI270745.1 KI270745.1
>KI270746.1 KI270746.1
>KI270747.1 KI270747.1
>KI270748.1 KI270748.1
>KI270749.1 KI270749.1
>KI270750.1 KI270750.1
>KI270751.1 KI270751.1
>KI270752.1 KI270752.1
>KI270753.1 KI270753.1
>KI270754.1 KI270754.1
>KI270755.1 KI270755.1
>KI270756.1 KI270756.1
>KI270757.1 KI270757.1

Here's a shorter list, just the ones with chr6 in the name:

$ grep -i chr6 GRCh38.p13.genome.names.txt
>chr6 6
>KQ090017.1 HSCHR6_1_CTG10
>GL383533.1 HSCHR6_1_CTG2
>KB021644.2 HSCHR6_1_CTG3
>KI270797.1 HSCHR6_1_CTG4
>KI270798.1 HSCHR6_1_CTG5
>KI270799.1 HSCHR6_1_CTG6
>KI270800.1 HSCHR6_1_CTG7
>KI270801.1 HSCHR6_1_CTG8
>KI270802.1 HSCHR6_1_CTG9
>KI270758.1 HSCHR6_8_CTG1
>GL000250.2 HSCHR6_MHC_APD_CTG1
>GL000251.2 HSCHR6_MHC_COX_CTG1
>GL000252.2 HSCHR6_MHC_DBB_CTG1
>GL000253.2 HSCHR6_MHC_MANN_CTG1
>GL000254.2 HSCHR6_MHC_MCF_CTG1
>GL000255.2 HSCHR6_MHC_QBL_CTG1
>GL000256.2 HSCHR6_MHC_SSTO_CTG1

Here is the full content of the file dat/info/decoys_alts.p:

HSCHR6_MHC_APD
HSCHR6_MHC_COX
HSCHR6_MHC_DBB
HSCHR6_MHC_MANN
HSCHR6_MHC_MCF
HSCHR6_MHC_QBL
HSCHR6_MHC_SSTO
HLA-A*01:01:01:01
HLA-A*01:01:01:02N
HLA-A*01:01:38L
HLA-A*01:02
HLA-A*01:03
HLA-A*01:04N
HLA-A*01:09
HLA-A*01:11N
HLA-A*01:14
HLA-A*01:16N
HLA-A*01:20
HLA-A*02:01:01:01
HLA-A*02:01:01:02L
HLA-A*02:01:01:03
HLA-A*02:01:01:04
HLA-A*02:02:01
HLA-A*02:03:01
HLA-A*02:03:03
HLA-A*02:05:01
HLA-A*02:06:01
HLA-A*02:07:01
HLA-A*02:10
HLA-A*02:251
HLA-A*02:259
HLA-A*02:264
HLA-A*02:265
HLA-A*02:266
HLA-A*02:269
HLA-A*02:279
HLA-A*02:32N
HLA-A*02:376
HLA-A*02:43N
HLA-A*02:455
HLA-A*02:48
HLA-A*02:51
HLA-A*02:533
HLA-A*02:53N
HLA-A*02:57
HLA-A*02:60:01
HLA-A*02:65
HLA-A*02:68
HLA-A*02:77
HLA-A*02:81
HLA-A*02:89
HLA-A*02:95
HLA-A*03:01:01:01
HLA-A*03:01:01:02N
HLA-A*03:01:01:03
HLA-A*03:02:01
HLA-A*03:11N
HLA-A*03:21N
HLA-A*03:36N
HLA-A*11:01:01
HLA-A*11:01:18
HLA-A*11:02:01
HLA-A*11:05
HLA-A*11:110
HLA-A*11:25
HLA-A*11:50Q
HLA-A*11:60
HLA-A*11:69N
HLA-A*11:74
HLA-A*11:75
HLA-A*11:77
HLA-A*23:01:01
HLA-A*23:09
HLA-A*23:38N
HLA-A*24:02:01:01
HLA-A*24:02:01:02L
HLA-A*24:02:01:03
HLA-A*24:02:03Q
HLA-A*24:02:10
HLA-A*24:03:01
HLA-A*24:07:01
HLA-A*24:08
HLA-A*24:09N
HLA-A*24:10:01
HLA-A*24:11N
HLA-A*24:152
HLA-A*24:20
HLA-A*24:215
HLA-A*24:61
HLA-A*24:86N
HLA-A*25:01:01
HLA-A*26:01:01
HLA-A*26:11N
HLA-A*26:15
HLA-A*26:50
HLA-A*29:01:01:01
HLA-A*29:01:01:02N
HLA-A*29:02:01:01
HLA-A*29:02:01:02
HLA-A*29:46
HLA-A*30:01:01
HLA-A*30:02:01:01
HLA-A*30:02:01:02
HLA-A*30:04:01
HLA-A*30:89
HLA-A*31:01:02
HLA-A*31:01:23
HLA-A*31:04
HLA-A*31:14N
HLA-A*31:46
HLA-A*32:01:01
HLA-A*32:06
HLA-A*33:01:01
HLA-A*33:03:01
HLA-A*33:07
HLA-A*34:01:01
HLA-A*34:02:01
HLA-A*36:01
HLA-A*43:01
HLA-A*66:01:01
HLA-A*66:17
HLA-A*68:01:01:01
HLA-A*68:01:01:02
HLA-A*68:01:02:01
HLA-A*68:01:02:02
HLA-A*68:02:01:01
HLA-A*68:02:01:02
HLA-A*68:02:01:03
HLA-A*68:02:02
HLA-A*68:03:01
HLA-A*68:08:01
HLA-A*68:113
HLA-A*68:17
HLA-A*68:18N
HLA-A*68:22
HLA-A*68:71
HLA-A*69:01
HLA-A*74:01
HLA-A*74:02:01:01
HLA-A*74:02:01:02
HLA-A*80:01:01:01
HLA-A*80:01:01:02
HLA-B*07:02:01
HLA-B*07:05:01
HLA-B*07:06
HLA-B*07:156
HLA-B*07:33:01
HLA-B*07:41
HLA-B*07:44
HLA-B*07:50
HLA-B*08:01:01
HLA-B*08:08N
HLA-B*08:132
HLA-B*08:134
HLA-B*08:19N
HLA-B*08:20
HLA-B*08:33
HLA-B*08:79
HLA-B*13:01:01
HLA-B*13:02:01
HLA-B*13:02:03
HLA-B*13:02:09
HLA-B*13:08
HLA-B*13:15
HLA-B*13:25
HLA-B*14:01:01
HLA-B*14:02:01
HLA-B*14:07N
HLA-B*15:01:01:01
HLA-B*15:01:01:02N
HLA-B*15:01:01:03
HLA-B*15:02:01
HLA-B*15:03:01
HLA-B*15:04:01
HLA-B*15:07:01
HLA-B*15:108
HLA-B*15:10:01
HLA-B*15:11:01
HLA-B*15:13:01
HLA-B*15:16:01
HLA-B*15:17:01:01
HLA-B*15:17:01:02
HLA-B*15:18:01
HLA-B*15:220
HLA-B*15:25:01
HLA-B*15:27:01
HLA-B*15:32:01
HLA-B*15:42
HLA-B*15:58
HLA-B*15:66
HLA-B*15:77
HLA-B*15:83
HLA-B*18:01:01:01
HLA-B*18:01:01:02
HLA-B*18:02
HLA-B*18:03
HLA-B*18:17N
HLA-B*18:26
HLA-B*18:94N
HLA-B*27:04:01
HLA-B*27:05:02
HLA-B*27:05:18
HLA-B*27:06
HLA-B*27:07:01
HLA-B*27:131
HLA-B*27:24
HLA-B*27:25
HLA-B*27:32
HLA-B*35:01:01:01
HLA-B*35:01:01:02
HLA-B*35:01:22
HLA-B*35:02:01
HLA-B*35:03:01
HLA-B*35:05:01
HLA-B*35:08:01
HLA-B*35:14:02
HLA-B*35:241
HLA-B*35:41
HLA-B*37:01:01
HLA-B*37:01:05
HLA-B*38:01:01
HLA-B*38:02:01
HLA-B*38:14
HLA-B*39:01:01:01
HLA-B*39:01:01:02L
HLA-B*39:01:01:03
HLA-B*39:01:03
HLA-B*39:01:16
HLA-B*39:01:21
HLA-B*39:05:01
HLA-B*39:06:02
HLA-B*39:10:01
HLA-B*39:13:02
HLA-B*39:14
HLA-B*39:34
HLA-B*39:38Q
HLA-B*40:01:01
HLA-B*40:01:02
HLA-B*40:02:01
HLA-B*40:03
HLA-B*40:06:01:01
HLA-B*40:06:01:02
HLA-B*40:10:01
HLA-B*40:150
HLA-B*40:40
HLA-B*40:72:01
HLA-B*40:79
HLA-B*41:01:01
HLA-B*41:02:01
HLA-B*42:01:01
HLA-B*42:02
HLA-B*42:08
HLA-B*44:02:01:01
HLA-B*44:02:01:02S
HLA-B*44:02:01:03
HLA-B*44:02:17
HLA-B*44:02:27
HLA-B*44:03:01
HLA-B*44:03:02
HLA-B*44:04
HLA-B*44:09
HLA-B*44:138Q
HLA-B*44:150
HLA-B*44:23N
HLA-B*44:26
HLA-B*44:46
HLA-B*44:49
HLA-B*44:56N
HLA-B*45:01:01
HLA-B*45:04
HLA-B*46:01:01
HLA-B*46:01:05
HLA-B*47:01:01:01
HLA-B*47:01:01:02
HLA-B*48:01:01
HLA-B*48:03:01
HLA-B*48:04
HLA-B*48:08
HLA-B*49:01:01
HLA-B*49:32
HLA-B*50:01:01
HLA-B*51:01:01
HLA-B*51:01:02
HLA-B*51:02:01
HLA-B*51:07:01
HLA-B*51:42
HLA-B*52:01:01:01
HLA-B*52:01:01:02
HLA-B*52:01:01:03
HLA-B*52:01:02
HLA-B*53:01:01
HLA-B*53:11
HLA-B*54:01:01
HLA-B*54:18
HLA-B*55:01:01
HLA-B*55:01:03
HLA-B*55:02:01
HLA-B*55:12
HLA-B*55:24
HLA-B*55:48
HLA-B*56:01:01
HLA-B*56:03
HLA-B*56:04
HLA-B*57:01:01
HLA-B*57:03:01
HLA-B*57:06
HLA-B*57:11
HLA-B*57:29
HLA-B*58:01:01
HLA-B*58:31N
HLA-B*59:01:01:01
HLA-B*59:01:01:02
HLA-B*67:01:01
HLA-B*67:01:02
HLA-B*67:02
HLA-B*73:01
HLA-B*78:01:01
HLA-B*81:01
HLA-B*82:02:01
HLA-C*01:02:01
HLA-C*01:02:11
HLA-C*01:02:29
HLA-C*01:02:30
HLA-C*01:03
HLA-C*01:06
HLA-C*01:08
HLA-C*01:14
HLA-C*01:21
HLA-C*01:30
HLA-C*01:40
HLA-C*02:02:02:01
HLA-C*02:02:02:02
HLA-C*02:10
HLA-C*02:11
HLA-C*02:16:02
HLA-C*02:69
HLA-C*02:85
HLA-C*02:86
HLA-C*02:87
HLA-C*03:02:01
HLA-C*03:02:02:01
HLA-C*03:02:02:02
HLA-C*03:02:02:03
HLA-C*03:03:01
HLA-C*03:04:01:01
HLA-C*03:04:01:02
HLA-C*03:04:02
HLA-C*03:04:04
HLA-C*03:05
HLA-C*03:06
HLA-C*03:100
HLA-C*03:13:01
HLA-C*03:20N
HLA-C*03:219
HLA-C*03:261
HLA-C*03:40:01
HLA-C*03:41:02
HLA-C*03:46
HLA-C*03:61
HLA-C*04:01:01:01
HLA-C*04:01:01:02
HLA-C*04:01:01:03
HLA-C*04:01:01:04
HLA-C*04:01:01:05
HLA-C*04:01:62
HLA-C*04:03:01
HLA-C*04:06
HLA-C*04:09N
HLA-C*04:128
HLA-C*04:161
HLA-C*04:177
HLA-C*04:70
HLA-C*04:71
HLA-C*05:01:01:01
HLA-C*05:01:01:02
HLA-C*05:08
HLA-C*05:09:01
HLA-C*05:93
HLA-C*06:02:01:01
HLA-C*06:02:01:02
HLA-C*06:02:01:03
HLA-C*06:23
HLA-C*06:24
HLA-C*06:46N
HLA-C*07:01:01:01
HLA-C*07:01:01:02
HLA-C*07:01:02
HLA-C*07:01:19
HLA-C*07:01:27
HLA-C*07:01:45
HLA-C*07:02:01:01
HLA-C*07:02:01:02
HLA-C*07:02:01:03
HLA-C*07:02:01:04
HLA-C*07:02:01:05
HLA-C*07:02:05
HLA-C*07:02:06
HLA-C*07:02:64
HLA-C*07:04:01
HLA-C*07:04:02
HLA-C*07:06
HLA-C*07:149
HLA-C*07:18
HLA-C*07:19
HLA-C*07:26
HLA-C*07:30
HLA-C*07:32N
HLA-C*07:384
HLA-C*07:385
HLA-C*07:386
HLA-C*07:391
HLA-C*07:392
HLA-C*07:49
HLA-C*07:56:02
HLA-C*07:66
HLA-C*07:67
HLA-C*08:01:01
HLA-C*08:01:03
HLA-C*08:02:01:01
HLA-C*08:02:01:02
HLA-C*08:03:01
HLA-C*08:04:01
HLA-C*08:112
HLA-C*08:20
HLA-C*08:21
HLA-C*08:22
HLA-C*08:24
HLA-C*08:27
HLA-C*08:36N
HLA-C*08:40
HLA-C*08:41
HLA-C*08:62
HLA-C*12:02:02
HLA-C*12:03:01:01
HLA-C*12:03:01:02
HLA-C*12:08
HLA-C*12:13
HLA-C*12:19
HLA-C*12:22
HLA-C*12:99
HLA-C*14:02:01
HLA-C*14:03
HLA-C*14:21N
HLA-C*14:23
HLA-C*15:02:01
HLA-C*15:05:01
HLA-C*15:05:02
HLA-C*15:13
HLA-C*15:16
HLA-C*15:17
HLA-C*15:96Q
HLA-C*16:01:01
HLA-C*16:02:01
HLA-C*16:04:01
HLA-C*17:01:01:01
HLA-C*17:01:01:02
HLA-C*17:01:01:03
HLA-C*17:03
HLA-C*18:01
HLA-DQA1*01:01:02
HLA-DQA1*01:02:01:01
HLA-DQA1*01:02:01:02
HLA-DQA1*01:02:01:03
HLA-DQA1*01:02:01:04
HLA-DQA1*01:03:01:01
HLA-DQA1*01:03:01:02
HLA-DQA1*01:04:01:01
HLA-DQA1*01:04:01:02
HLA-DQA1*01:05:01
HLA-DQA1*01:07
HLA-DQA1*01:10
HLA-DQA1*01:11
HLA-DQA1*02:01
HLA-DQA1*03:01:01
HLA-DQA1*03:02
HLA-DQA1*03:03:01
HLA-DQA1*04:01:02:01
HLA-DQA1*04:01:02:02
HLA-DQA1*04:02
HLA-DQA1*05:01:01:01
HLA-DQA1*05:01:01:02
HLA-DQA1*05:03
HLA-DQA1*05:05:01:01
HLA-DQA1*05:05:01:02
HLA-DQA1*05:05:01:03
HLA-DQA1*05:11
HLA-DQA1*06:01:01
HLA-DQB1*02:01:01
HLA-DQB1*02:02:01
HLA-DQB1*03:01:01:01
HLA-DQB1*03:01:01:02
HLA-DQB1*03:01:01:03
HLA-DQB1*03:02:01
HLA-DQB1*03:03:02:01
HLA-DQB1*03:03:02:02
HLA-DQB1*03:03:02:03
HLA-DQB1*03:05:01
HLA-DQB1*05:01:01:01
HLA-DQB1*05:01:01:02
HLA-DQB1*05:03:01:01
HLA-DQB1*05:03:01:02
HLA-DQB1*06:01:01
HLA-DQB1*06:02:01
HLA-DQB1*06:03:01
HLA-DQB1*06:09:01
HLA-DRB1*01:01:01
HLA-DRB1*01:02:01
HLA-DRB1*03:01:01:01
HLA-DRB1*03:01:01:02
HLA-DRB1*04:03:01
HLA-DRB1*07:01:01:01
HLA-DRB1*07:01:01:02
HLA-DRB1*08:03:02
HLA-DRB1*09:21
HLA-DRB1*10:01:01
HLA-DRB1*11:01:01
HLA-DRB1*11:01:02
HLA-DRB1*11:04:01
HLA-DRB1*12:01:01
HLA-DRB1*12:17
HLA-DRB1*13:01:01
HLA-DRB1*13:02:01
HLA-DRB1*14:05:01
HLA-DRB1*14:54:01
HLA-DRB1*15:01:01:01
HLA-DRB1*15:01:01:02
HLA-DRB1*15:01:01:03
HLA-DRB1*15:01:01:04
HLA-DRB1*15:02:01
HLA-DRB1*15:03:01:01
HLA-DRB1*15:03:01:02
HLA-DRB1*16:02:01

It seems like many of the GENCODE names are not in the decoys_alts.p file, and vice versa.
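One way to make this comparison systematic is to split the FASTA headers by whether their label matches an entry in the ALT list. The sketch below is hedged: the helper names `parse_headers` and `split_by_alt_list` are hypothetical, and prefix matching on the label is an assumption made here for illustration. It is not necessarily how scripts/extract.py matches names.

```python
def parse_headers(lines):
    """Return (accession, label) pairs from lines like '>GL000250.2 HSCHR6_MHC_APD_CTG1'."""
    pairs = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                label = fields[1] if len(fields) > 1 else fields[0]
                pairs.append((fields[0], label))
    return pairs

def split_by_alt_list(pairs, alt_names):
    """Split accessions by whether their label starts with a listed ALT name (assumed rule)."""
    listed, unlisted = [], []
    for accession, label in pairs:
        if any(label.startswith(name) for name in alt_names):
            listed.append(accession)
        else:
            unlisted.append(accession)
    return listed, unlisted

# A few headers and ALT names copied from the listings above.
headers = [
    ">chr6 6",
    ">KI270797.1 HSCHR6_1_CTG4",
    ">GL000250.2 HSCHR6_MHC_APD_CTG1",
    ">GL000251.2 HSCHR6_MHC_COX_CTG1",
]
alt_names = ["HSCHR6_MHC_APD", "HSCHR6_MHC_COX"]

listed, unlisted = split_by_alt_list(parse_headers(headers), alt_names)
print("listed:", listed)
print("unlisted:", unlisted)
```

Under this assumed matching rule, the MHC haplotype scaffolds are covered by the list, while other chr6 scaffolds such as HSCHR6_1_CTG4 are not.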

slowkow commented 3 years ago

An aside...

By the way, you might want to consider using .txt or .json files instead of pickle files. This should help users and developers discover more quickly how things are set up just by using a text editor, instead of opening Python and running something = pickle.load(open('file.p', 'rb')).

This also has another benefit: if you decide someday to change the decoys_alts.p file, git will not show what lines changed, because it is a binary file. If it were a .txt file instead, we could see exactly what names were added or removed at any commit.
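A conversion along these lines might look like the sketch below. It assumes the pickle holds a plain list or set of strings, which has not been checked against the real decoys_alts.p, and the function name `pickle_to_json` is hypothetical. Note that pickle.load can execute arbitrary code, so it should only be run on trusted files.

```python
import json
import os
import pickle
import tempfile

def pickle_to_json(pickle_path, json_path):
    """Rewrite a pickled collection of names as a sorted, line-oriented JSON file."""
    with open(pickle_path, "rb") as handle:
        data = pickle.load(handle)  # only safe for trusted files
    # Sets are not JSON-serializable; sorting also keeps git diffs stable.
    if isinstance(data, (set, frozenset)):
        data = sorted(data)
    with open(json_path, "w") as handle:
        json.dump(data, handle, indent=2)
        handle.write("\n")

# Demo with a throwaway pickle rather than the real file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "names.p")
    dst = os.path.join(tmp, "names.json")
    with open(src, "wb") as handle:
        pickle.dump({"HSCHR6_MHC_COX", "HSCHR6_MHC_APD"}, handle)
    pickle_to_json(src, dst)
    with open(dst) as handle:
        restored = json.load(handle)

print(restored)
```

With indent=2, each name lands on its own line, so adding or removing a name shows up as a one-line change in a git diff.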

slowkow commented 3 years ago

I'm calling genotypes for these genes:

"A"    "B"    "C"    "DPB1" "DQA1" "DQB1" "DRB1"

I have 336 total genotype calls (24 samples × 2 alleles × 7 genes).

I have two runs (GENCODE with alt, UCSC without alt).

Every sample has discrepancies.

For complete matches between the two runs: 83 matches and 146 mismatches.

For 2-digit alleles: 152 matches and 133 mismatches.

I was careful to account for the random order of paternal and maternal alleles by sorting the alleles before checking for a match.
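An order-independent comparison of this kind can be sketched as follows. This is not the code used for the counts above. Instead of sorting and comparing position by position, it uses a multiset intersection (collections.Counter), which is also independent of allele order and still credits a shared allele when the sorted pairs would not line up. The function names and the example alleles are illustrative and do not come from the data.

```python
from collections import Counter

def truncate(allele, fields=None):
    """Keep the first `fields` colon-separated fields, e.g. 'A*01:01:01' -> 'A*01'."""
    if fields is None:
        return allele
    return ":".join(allele.split(":")[:fields])

def shared_alleles(call_a, call_b, fields=None):
    """Count alleles shared by two genotype calls, ignoring allele order."""
    a = Counter(truncate(x, fields) for x in call_a)
    b = Counter(truncate(x, fields) for x in call_b)
    return sum((a & b).values())

# Made-up calls for one gene in one sample from two runs.
run1 = ["A*02:01:01", "A*01:01:01"]
run2 = ["A*01:01:01", "A*02:06:01"]

full = shared_alleles(run1, run2)                # compare complete allele names
one_field = shared_alleles(run1, run2, fields=1)  # compare the first field only
print(full, one_field)
```

In this example one allele matches at full resolution and both match at the first field, which corresponds to the "2-digit" comparison.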

I did not use the unmapped option that you mentioned, so I wonder if that might help (arcasHLA extract --unmapped).

IoanFilip2 commented 3 years ago

Thanks for following up on this issue. Regarding the ALT sequences, you raise a good point, namely that many of the GENCODE names are not currently in the decoys_alts.p file. We can update the decoys_alts.p file to contain the chr6-specific ALT sequences from GENCODE -- that is an omission on our part. Indeed, removing the chr6-specific ALT sequences should improve the concordance between your runs (the ALT sequences from other chromosomes, I suspect, will have a smaller impact). In light of this, my recommendation would be to run mapping first without ALT sequences, or at the very least without the chr6 ALT sequences specifically, and to include the --unmapped flag in the extract step. Does that significantly increase the concordance between your runs?

Another consideration here: is the mismatch rate similar for class I and class II genes? What is the typical coverage in your data for class I genes and class II genes (which are not constitutively expressed in every cell)? Do you have low RIN (RNA integrity number) samples? Are you using paired-end or single-end reads? The tissue of origin, as well as these technical features of your sample runs, might explain why the HLA calling can yield differing results when you change your reference with/without ALT sequences that were not previously included in our decoys_alts.p file. This is especially true for HLA class II genes, which may have little coverage overall, making the genotyping for those loci very sensitive to changes in the input reads.

chrisdphd commented 3 years ago

(Hi Kamil, nice to meet you. I've been using your very useful snakemake tutorial and a couple more clicks led me here...) If I may chime in (you probably already know this...), I think that your very interesting problem with the chr6 HLA/MHC loci is caused by the genome biology and evolution of these sequences. There is a lot of diversity in the MHC loci in humans, which is evolutionarily advantageous. (Homogeneous banana populations are essentially clonal and are susceptible to being wiped out by a single virus, for example).

If you include the ALT assemblies, then for a given sample the reads may align to the ALT version that most closely matches that particular sample. (Maybe to two ALT seqs, since we are diploid?). You might want to pull the genotypes for that sample from those alignments, rather than the consensus chr6 location. (P.S., I bet the GENCODE and UCSC main chr seqs are identical). Some reads may still align to chr6, and you may even get genotype calls. The reads that align to ALT may be homozygous reference (with respect to ALT), but that might be homozygous non-reference if they'd aligned to chr6.

If you align vs chr6 alone, then all the alternate reads will be forced to align on the full chr6 sequence, yet they will differ from it, and generate a lot of SNP calls.

So it seems to me that the task is more difficult than simple variant calling... you need to determine which ALT, or haplotype block(s), are present in each sample, and then whether or not the sample has any SNPs on top of that... or am I being naive? Sorry if all this is obvious, and you'd already considered it!

slowkow commented 3 years ago

Thanks for the comments. It seems like getting confident genotype calls might be a bit more challenging than I expected. My next step will be to visualize the read pileups for each gene to assess whether there is enough data to support any calls.

Some day, I might try the https://github.com/lkuchenb/MultiHLA pipeline...