WansonChoi / HATK

A collection of modules to process and analyze IMGT-HLA sequences.

python error #11

Closed freshfischer closed 3 years ago

freshfischer commented 3 years ago

@WansonChoi Hello, when I use the imgt2seq module to generate a new version of the IMGT-HLA database, I get an error that I can't figure out. Is there a solution? I look forward to your reply. The command and the error are as follows:

python3 HATK.py \
    --imgt2seq \
    --hg 19 \
    --imgt 3400 \
    --out MyIMGT2Seq/ExamplePrefix.hg19.imgt3400 \
    --imgt-dir example/IMGTHLA3400 \
    --multiprocess 8
Namespace(Ggroup=False, HLA=None, NoCaption=False, Pgroup=False, aa=None, ar=None, bmarkergenerator=False, chped=None, condition=None, condition_list=None, covar=None, covar_name=None, dict_AA=None, dict_SNPS=None, fam=None, fourF=False, hat=None, heatmap=False, hg='19', hla2hped=False, hped=None, imgt='3400', imgt2seq=True, imgt_dir='example/IMGTHLA3400', input=None, leave_NotFound=False, logistic=False, manhattan=False, maptable=None, metaanalysis=False, multiprocess=8, no_indel=False, nomencleaner=False, omnibus=False, oneF=False, out='MyIMGT2Seq/ExamplePrefix.hg19.imgt3400', phased=None, pheno=None, pheno_name=None, platform=None, point_color='#778899', point_size='15', reference_allele=None, rhped=None, s1_bim=None, s1_logistic_result=None, s2_bim=None, s2_logistic_result=None, save_intermediates=False, threeF=False, top_color='#FF0000', twoF=False, variants=None, yaxis_unit='10')

[IMGT2Seq.py]: Multiprocessing.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA A.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA B.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA C.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DPB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQA1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DQB1.

[ProcessIMGT.py]: Generating sequence information dictionary for HLA DRB1.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/dengcm/soft/HATK-master/IMGT2Seq/src/ProcessIMGT.py", line 255, in ProcessIMGT
    _has_Indel=(not _no_Indel))
  File "/home/dengcm/soft/HATK-master/IMGT2Seq/src/ProcessIMGT.py", line 877, in getPositionInfo_SNPS
    curr_POS = l_Rest[-1] + (+1 if not _isReverse else -1)
TypeError: can only concatenate str (not "int") to str
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "HATK.py", line 243, in <module>
    myStudy = HLA_Study(args)
  File "/home/dengcm/soft/HATK-master/src/HLA_Study.py", line 293, in __init__
    _imgt_dir=_args.imgt_dir)
  File "/home/dengcm/soft/HATK-master/IMGT2Seq/IMGT2Seq.py", line 196, in __init__
    _p_data="IMGT2Seq/data", __Nfield_OUTPUT_FORMAT=Nfield_OUTPUT_FORMAT)
  File "/home/dengcm/soft/HATK-master/IMGT2Seq/IMGT2Seq.py", line 395, in IMGT2Seq
    t_df_Seqs_SNPS, t_df_Seqs_AA, t_df_forMAP_SNPS, t_df_forMAP_AA, t_MAPTABLE = dict_Pool[HLA_names[i]].get()
  File "/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 683, in get
    raise self._value
TypeError: can only concatenate str (not "int") to str
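
The final TypeError suggests that `l_Rest[-1]` held a string rather than an integer when line 877 tried to add 1 to it. Below is a minimal sketch that reproduces the same message outside HATK. The list contents and the helper name `next_pos` are hypothetical; the traceback does not show what `l_Rest` actually contained.

```python
# Hypothetical stand-in for l_Rest in ProcessIMGT.py; the real values are not
# visible in the traceback, so a string last element is assumed here.
l_rest = [1, 2, "3"]
is_reverse = False


def next_pos(last, is_reverse):
    # Same arithmetic as the failing line: step forward, or backward if reversed.
    return last + (+1 if not is_reverse else -1)


try:
    next_pos(l_rest[-1], is_reverse)
except TypeError as err:
    message = str(err)
    print(message)

# With an integer last element the same expression works as intended.
ok_pos = next_pos(3, is_reverse)
```
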
freshfischer commented 3 years ago

This problem was solved when I added the --no-indel argument. There are some differences in A_nuc.txt between IMGTHLA3320 and IMGTHLA3420:

# file: A_nuc.txt
# date: 2018-04-16
# version: IPD-IMGT/HLA 3.32.0
# origin: http://hla.alleles.org/wmda/A_nuc.txt
# repository: https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/alignments/A_nuc.txt
# author: WHO, Steven G. E. Marsh (steven.marsh.ac.uk)

 cDNA              1
 AA codon          -24
                   |
 A*01:01:01:01     ATG GCC GTC ATG GCG CCC CGA ACC CTC CTC CTG CTA CTC TCG GGG GCC CTG GCC CTG ACC CAG ACC TGG GCG G|GC 
 A*01:01:01:02N    --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- -|-- 
 A*01:01:01:03     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- -|-- 
 A*01:01:01:04     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- -|-- 
 A*01:01:01:05     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- -|-- 
 A*01:01:01:06     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- -|-- 
 A*01:01:01:07     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- -|-- 
# file: A_nuc.txt
# date: 2020-10-15
# version: IPD-IMGT/HLA 3.42.0
# origin: http://hla.alleles.org/wmda/A_nuc.txt
# repository: https://raw.githubusercontent.com/ANHIG/IMGTHLA/Latest/alignments/A_nuc.txt
# author: Steven G. E. Marsh (steven.marsh@ucl.ac.uk)

 cDNA              1
 AA codon          -24
                   |
 A*01:01:01:01     ATG GCC GTC ATG GCG CCC CGA ACC CTC CTC CTG CTA CTC TCG GGG GCC CTG GCC CT......G ACC CAG ACC TGG GCG G.|GC 
 A*01:01:01:02N    --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:03     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:04     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:05     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:06     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:07     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:08     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:09     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:10     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:11     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
 A*01:01:01:12     --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --......- --- --- --- --- --- -.|-- 
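
The visible difference between the two excerpts is the run of `.` characters in the 3.42.0 rows. In IMGT alignment files a `.` marks an indel padding position. The sketch below uses the two A*01:01:01:01 rows copied from the excerpts above and shows that they carry the same nucleotides once the padding is removed. Whether these padded columns are what triggers the error in `getPositionInfo_SNPS` is an assumption, though it would be consistent with --no-indel avoiding it. The helper names are made up for illustration.

```python
# Reference rows copied from the two A_nuc.txt excerpts in this thread.
row_3320 = (
    "A*01:01:01:01     ATG GCC GTC ATG GCG CCC CGA ACC CTC CTC CTG CTA CTC "
    "TCG GGG GCC CTG GCC CTG ACC CAG ACC TGG GCG G|GC"
)
row_3420 = (
    "A*01:01:01:01     ATG GCC GTC ATG GCG CCC CGA ACC CTC CTC CTG CTA CTC "
    "TCG GGG GCC CTG GCC CT......G ACC CAG ACC TGG GCG G.|GC"
)


def sequence_field(row):
    # Drop the allele name (first whitespace-delimited token), keep the rest.
    _, seq = row.strip().split(None, 1)
    return seq


def count_indel_padding(row):
    # Count '.' characters, i.e. alignment positions padded for an indel.
    return sequence_field(row).count(".")


def bare_nucleotides(row):
    # Remove codon spacing, exon boundary marks ('|') and indel padding ('.').
    seq = sequence_field(row)
    return "".join(ch for ch in seq if ch not in " .|")


padding_3320 = count_indel_padding(row_3320)
padding_3420 = count_indel_padding(row_3420)
same_sequence = bare_nucleotides(row_3320) == bare_nucleotides(row_3420)

print("3.32.0 padding positions:", padding_3320)
print("3.42.0 padding positions:", padding_3420)
print("same nucleotides after stripping padding:", same_sequence)
```
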
WansonChoi commented 3 years ago

@freshfischer

Hi, thank you for your interest in HATK.

It's good that you solved the problem with the '--no-indel' argument. It seems the raw contents (e.g. 'A_nuc.txt') of the IMGT database changed slightly between version 3.32.0 and version 3.42.0. I'll look into it and try to update IMGT2Seq so that it can handle the latest version of the IMGT database.

I do appreciate this important report.

zoequandt commented 8 months ago

@freshfischer and @WansonChoi

I am getting the same error. Where do I put the '--no-indel' argument?
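
Going by the original command in this thread and the `no_indel=False` entry in the printed Namespace, `--no-indel` looks like a plain on/off flag that takes no value, so it can presumably be appended to the same command. The sketch below builds that command line. It assumes the same paths as the first post and assumes that the flag's position among the other options does not matter.

```python
import shlex

# Arguments copied from the command in the first post of this thread.
base_cmd = [
    "python3", "HATK.py",
    "--imgt2seq",
    "--hg", "19",
    "--imgt", "3400",
    "--out", "MyIMGT2Seq/ExamplePrefix.hg19.imgt3400",
    "--imgt-dir", "example/IMGTHLA3400",
    "--multiprocess", "8",
]

# Append the flag reported above as a workaround; it takes no value.
cmd = base_cmd + ["--no-indel"]

# Render a shell-safe command line that can be pasted into a terminal.
command_line = " ".join(shlex.quote(arg) for arg in cmd)
print(command_line)

# To launch it from Python instead (run from the HATK directory):
# import subprocess
# subprocess.run(cmd, check=True)
```
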