WansonChoi / HATK

A collection of modules to process and analyze IMGT-HLA sequences.

imgt2seq error at DRB1 in IMGTHLA3480 #18

Open anjatietz opened 2 years ago

anjatietz commented 2 years ago

Hi there, I am trying to integrate a newer version of the IMGT database into my analysis, but there seems to be an issue at the DRB1 locus. I downloaded the database from ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/ and stored it in your example directory. I get maptable files for HLA-A through HLA-DQB1, but not for HLA-DRB1. Any help is much appreciated.

I used the following command:

```
python3 HATK.py --imgt2seq --hg 38 --imgt 3480 --2field --imgt-dir example/IMGTHLA3480 --out MyIMGT2Seq/ExamplePrefix.hg38.imgt3480
```

And this is what I get:

```
Namespace(Ggroup=False, HLA=None, NoCaption=False, Pgroup=False, aa=None, ar=None, bmarkergenerator=False, chped=None, condition=None, condition_list=None, covar=None, covar_name=None, dict_AA=None, dict_SNPS=None, fam=None, fourF=False, hat=None, heatmap=False, hg='38', hla2hped=False, hped=None, imgt='3480', imgt2seq=True, imgt_dir='/home/user/HATK/example/IMGTHLA3480', input=None, leave_NotFound=False, logistic=False, manhattan=False, maptable=None, metaanalysis=False, multiprocess=1, no_indel=False, nomencleaner=False, omnibus=False, oneF=False, out='MyIMGT2Seq/ExamplePrefix.hg38.imgt3480', phased=None, pheno=None, pheno_name=None, platform=None, point_color='#778899', point_size='15', reference_allele=None, rhped=None, s1_bim=None, s1_logistic_result=None, s2_bim=None, s2_logistic_result=None, save_intermediates=False, threeF=False, top_color='#FF0000', twoF=True, variants=None, yaxis_unit='10')

[ProcessIMGT.py]: Generating sequence information dictionary for HLA A.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA B.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA C.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPA1.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPB1.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQA1.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQB1.
[ProcessIMGT.py]: Generating sequence information dictionary for HLA DRB1.
Traceback (most recent call last):
  File "HATK.py", line 243, in <module>
    myStudy = HLA_Study(args)
  File "/home/user/HATK/src/HLA_Study.py", line 293, in __init__
    _imgt_dir=_args.imgt_dir)
  File "/home/user/HATK/IMGT2Seq/IMGT2Seq.py", line 196, in __init__
    _p_data="IMGT2Seq/data", __Nfield_OUTPUT_FORMAT=Nfield_OUTPUT_FORMAT)
  File "/home/user/HATK/IMGT2Seq/IMGT2Seq.py", line 401, in IMGT2Seq
    _p_data, _no_Indel=_no_Indel, _save_intermediates=_save_intermediates)
  File "/home/user/HATK/IMGT2Seq/src/ProcessIMGT.py", line 130, in ProcessIMGT
    df_Seqs_splited_noIndel_gen = df_raw_Seqs_splitted_gen.apply(lambda x : ProcessIndel(x, _remove_indel=True), axis=0)
  File "/home/user/anaconda3/envs/HATK/lib/python3.7/site-packages/pandas/core/frame.py", line 6928, in apply
    return op.get_result()
  File "/home/user/anaconda3/envs/HATK/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File "/home/user/anaconda3/envs/HATK/lib/python3.7/site-packages/pandas/core/apply.py", line 292, in apply_standard
    self.apply_series_generator()
  File "/home/user/anaconda3/envs/HATK/lib/python3.7/site-packages/pandas/core/apply.py", line 321, in apply_series_generator
    results[i] = self.f(v)
  File "/home/user/HATK/IMGT2Seq/src/ProcessIMGT.py", line 130, in <lambda>
    df_Seqs_splited_noIndel_gen = df_raw_Seqs_splitted_gen.apply(lambda x : ProcessIndel(x, _remove_indel=True), axis=0)
  File "/home/user/HATK/IMGT2Seq/src/ProcessIMGT.py", line 787, in ProcessIndel
    return _sr.apply(lambda x: getTrimmedSeqs(x, l_spanInfo, _remove_indel))
  File "/home/user/anaconda3/envs/HATK/lib/python3.7/site-packages/pandas/core/series.py", line 4045, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
  File "/home/user/HATK/IMGT2Seq/src/ProcessIMGT.py", line 787, in <lambda>
    return _sr.apply(lambda x: getTrimmedSeqs(x, l_spanInfo, _remove_indel))
  File "/home/user/HATK/IMGT2Seq/src/ProcessIMGT.py", line 753, in getTrimmedSeqs
    IndelSeqs = pd.Series([_string[idx[0]:idx[1]] for idx in _l_target_idx[0]])
  File "/home/user/HATK/IMGT2Seq/src/ProcessIMGT.py", line 753, in <listcomp>
    IndelSeqs = pd.Series([_string[idx[0]:idx[1]] for idx in _l_target_idx[0]])
TypeError: ("'NoneType' object is not subscriptable", 'occurred at index 2')
```
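The traceback ends with two frames on line 753 of `ProcessIMGT.py`, the second being the list comprehension's own frame, so the `TypeError` is raised while slicing: either `_string` or one of the `idx` entries taken from `_l_target_idx[0]` was `None` for some DRB1 data. The sketch below only reproduces that failure mode in isolation with toy data. `trim_by_spans` and `trim_by_spans_checked` are hypothetical names, not HATK functions, and this is not a fix for IMGT2Seq; it just shows how a missing span surfaces as this exact message, and how an explicit check would make it easier to see which entry is at fault.

```python
# Minimal stand-in for the list comprehension at ProcessIMGT.py line 753.
# trim_by_spans / trim_by_spans_checked are hypothetical, not part of HATK.


def trim_by_spans(seq, spans):
    # Same pattern as the failing line: slice seq once per (start, end) span.
    return [seq[idx[0]:idx[1]] for idx in spans]


def trim_by_spans_checked(seq, spans):
    # Same slicing, but report missing spans instead of a bare TypeError.
    missing = [pos for pos, idx in enumerate(spans) if idx is None]
    if missing:
        raise ValueError(
            f"span(s) at position(s) {missing} are None; cannot trim {seq!r}"
        )
    return [seq[start:end] for start, end in spans]


if __name__ == "__main__":
    seq = "ATGCCGTAC"  # toy sequence, not real IMGT data
    print(trim_by_spans(seq, [(0, 3), (5, 8)]))  # ['ATG', 'GTA']

    try:
        trim_by_spans(seq, [(0, 3), None])
    except TypeError as err:
        print(err)  # 'NoneType' object is not subscriptable

    try:
        trim_by_spans_checked(seq, [(0, 3), None])
    except ValueError as err:
        print(err)
```

The checked variant trades the generic `TypeError` for a message naming the offending position, which is usually what you want when the input (here, a new IMGT release) may contain layouts the parser has not seen before.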

xingejun commented 2 years ago

Hi @anjatietz ,

Have you found a solution?

I have run into the same issue.

Any suggestions would help.

Thank you very much!
Xinxin

WansonChoi commented 2 years ago

Hi @anjatietz, Thank you for using HATK and reporting this error.

I was able to reproduce the same error.

About a month ago, I confirmed that IMGT2Seq works with v3.47.0. I suspect the 'ProcessIMGT.py' script cannot handle some of the raw sequences in the latest release, 3.48.0.

I'm going to update IMGT2Seq, but this will take some time. For now, please use an earlier version of the IMGT database (< 3.48.0).
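To follow the workaround suggested above in a script, a small pre-flight check can refuse IMGT releases at or after 3.48.0 before building the same `--imgt2seq` command used in this issue. This is a sketch under stated assumptions: `parse_imgt_version` and `imgt2seq_command` are hypothetical helpers (not part of HATK), the compact `--imgt` value is assumed to encode one major, two minor and one patch digit (`3480` meaning 3.48.0), and the directory `example/IMGTHLA3470` assumes you downloaded 3.47.0 and named its folder the same way as the 3.48.0 one.

```python
# Hypothetical pre-flight helpers for running IMGT2Seq on an older IMGT release.
# Assumption: compact version "3480" means 3.48.0 (1 major, 2 minor, 1 patch digit).

# Per this thread: 3.47.0 was checked, 3.48.0 fails at DRB1.
FIRST_FAILING = (3, 48, 0)


def parse_imgt_version(compact):
    if len(compact) != 4 or not compact.isdigit():
        raise ValueError(f"unexpected IMGT version string: {compact!r}")
    return int(compact[0]), int(compact[1:3]), int(compact[3])


def imgt2seq_command(compact, imgt_dir, out_prefix):
    # Refuse releases >= 3.48.0, then mirror the flags from the command in this issue.
    if parse_imgt_version(compact) >= FIRST_FAILING:
        raise ValueError(
            f"IMGT {compact} is >= 3.48.0, which IMGT2Seq currently fails on; "
            "use an earlier release for now"
        )
    return ["python3", "HATK.py", "--imgt2seq", "--hg", "38",
            "--imgt", compact, "--2field",
            "--imgt-dir", imgt_dir, "--out", out_prefix]


if __name__ == "__main__":
    cmd = imgt2seq_command(
        "3470", "example/IMGTHLA3470", "MyIMGT2Seq/ExamplePrefix.hg38.imgt3470"
    )
    print(" ".join(cmd))
    # From the HATK directory you could then run: subprocess.run(cmd, check=True)
```

Building the argument list rather than a shell string avoids quoting issues if the IMGT directory path contains spaces.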