ML-TANGO / TANGO

public repo for TANGO (Target Aware No-code neural network Generation and Operation framework)

code_gen runtime error #141

Closed HyunwooCho closed 3 months ago

HyunwooCho commented 6 months ago

I got this error message when I ran the TANGO service. I think the error comes from an old Python version: 'from __future__ import annotations' is only supported from Python 3.7 onward. I think the Dockerfile for code_gen should be changed from FROM python:3.6 to FROM python:3.7.

$ cd TANGO
$ docker-compose up
...
code_gen_1         | Traceback (most recent call last):
code_gen_1         |   File "code_gen.py", line 25, in <module>
code_gen_1         |     import tvm
code_gen_1         |   File "/app/tvm/python/tvm/__init__.py", line 42, in <module>
code_gen_1         |     from .ir import IRModule
code_gen_1         |   File "/app/tvm/python/tvm/ir/__init__.py", line 47, in <module>
code_gen_1         |     from .module import IRModule
code_gen_1         |   File "/app/tvm/python/tvm/ir/module.py", line 18
code_gen_1         |     from __future__ import annotations
code_gen_1         |     ^
code_gen_1         | SyntaxError: future feature annotations is not defined

iksooman commented 6 months ago

After switching to FROM python:3.7, I got the error below.

=> ERROR [code_gen 57/83] RUN add-apt-repository ppa:openjdk-r/ppa                                                                                                                     0.4s
------
 > [code_gen 57/83] RUN add-apt-repository ppa:openjdk-r/ppa:
0.296 Traceback (most recent call last):
0.296   File "/usr/bin/add-apt-repository", line 362, in <module>
0.296     sys.exit(0 if addaptrepo.main() else 1)
0.296                   ^^^^^^^^^^^^^^^^^
0.296   File "/usr/bin/add-apt-repository", line 345, in main
0.296     shortcut = handler(source, **shortcut_params)
0.296                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0.296   File "/usr/lib/python3/dist-packages/softwareproperties/shortcuts.py", line 40, in shortcut_handler
0.296     return handler(shortcut, **kwargs)
0.296            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
0.296   File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 86, in __init__
0.296     if self.lpppa.publish_debug_symbols:
0.296        ^^^^^^^^^^
0.296   File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 126, in lpppa
0.296     self._lpppa = self.lpteam.getPPAByName(name=self.ppaname)
0.296                   ^^^^^^^^^^^
0.296   File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 113, in lpteam
0.296     self._lpteam = self.lp.people(self.teamname)
0.296                    ^^^^^^^^^^^^^^
0.296 AttributeError: 'NoneType' object has no attribute 'people'
------
failed to solve: process "/bin/sh -c add-apt-repository ppa:openjdk-r/ppa" did not complete successfully: exit code: 1

I was able to resolve this by adding RUN apt-get install python3-launchpadlib -y at line 124 of TANGO/deploy_codegen/optimize_codegen/Dockerfile.

HyunwooCho commented 4 months ago

It turns out that Python 3.8 is required, because 'Literal' is only available from the 'typing' module in Python 3.8 and later.

code_gen_1         | Traceback (most recent call last):
code_gen_1         |   File "code_gen.py", line 25, in <module>
code_gen_1         |     import tvm
code_gen_1         |   File "/app/tvm/python/tvm/__init__.py", line 81, in <module>
code_gen_1         |     from . import relay
code_gen_1         |   File "/app/tvm/python/tvm/relay/__init__.py", line 29, in <module>
code_gen_1         |     from . import prelude
code_gen_1         |   File "/app/tvm/python/tvm/relay/prelude.py", line 21, in <module>
code_gen_1         |     from tvm.relay.transform import ToANormalFormExpr
code_gen_1         |   File "/app/tvm/python/tvm/relay/transform/__init__.py", line 22, in <module>
code_gen_1         |     from . import fake_quantization_to_integer, mixed_precision
code_gen_1         |   File "/app/tvm/python/tvm/relay/transform/fake_quantization_to_integer.py", line 25, in <module>
code_gen_1         |     from tvm.relay.qnn.op import canonicalizations
code_gen_1         |   File "/app/tvm/python/tvm/relay/qnn/__init__.py", line 20, in <module>
code_gen_1         |     from . import op
code_gen_1         |   File "/app/tvm/python/tvm/relay/qnn/op/__init__.py", line 21, in <module>
code_gen_1         |     from .qnn import *
code_gen_1         |   File "/app/tvm/python/tvm/relay/qnn/op/qnn.py", line 26, in <module>
code_gen_1         |     from tvm.relay.op.nn.utils import get_pad_tuple2d
code_gen_1         |   File "/app/tvm/python/tvm/relay/op/__init__.py", line 35, in <module>
code_gen_1         |     from . import strategy
code_gen_1         |   File "/app/tvm/python/tvm/relay/op/strategy/__init__.py", line 24, in <module>
code_gen_1         |     from . import arm_cpu
code_gen_1         |   File "/app/tvm/python/tvm/relay/op/strategy/arm_cpu.py", line 26, in <module>
code_gen_1         |     from tvm.dlight.gpu.matmul import auto_inline_consumers
code_gen_1         |   File "/app/tvm/python/tvm/dlight/__init__.py", line 18, in <module>
code_gen_1         |     from . import gpu
code_gen_1         |   File "/app/tvm/python/tvm/dlight/gpu/__init__.py", line 22, in <module>
code_gen_1         |     from .low_batch_gemv import LowBatchGEMV
code_gen_1         |   File "/app/tvm/python/tvm/dlight/gpu/low_batch_gemv.py", line 19, in <module>
code_gen_1         |     from typing import List, Literal, Optional, Set, Union
code_gen_1         | ImportError: cannot import name 'Literal' from 'typing' (/usr/local/lib/python3.7/typing.py)
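
The two tracebacks in this thread come down to interpreter version floors: 'from __future__ import annotations' needs Python 3.7 and 'typing.Literal' needs Python 3.8. A minimal sketch of a preflight check follows. The table and the function name missing_features are hypothetical and only cover the two features seen here, not everything TVM or TANGO may need.

```python
import sys

# Minimum CPython versions for the two features named in the tracebacks.
# Illustrative table only; it is not a full list of TANGO's requirements.
FEATURE_MIN_VERSION = {
    "from __future__ import annotations": (3, 7),
    "typing.Literal": (3, 8),
}


def missing_features(version=None):
    """Return the listed features that a (major, minor) interpreter lacks."""
    if version is None:
        version = sys.version_info[:2]
    return [
        name
        for name, minimum in FEATURE_MIN_VERSION.items()
        if tuple(version) < minimum
    ]


# Check a specific base image version, here the python:3.7 case.
print(missing_features((3, 7)))
```

Calling missing_features() with no argument checks the interpreter that is running the script, which could be done at the top of code_gen.py before import tvm.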

HyunwooCho commented 4 months ago

There are three ways to solve the 'typing.Literal' problem.

1) Use Python 3.8 in the Dockerfile and make no changes to the source code.

In the Dockerfile:

FROM python:3.8

In the source code:

from typing import Literal

2) Stay on Python 3.7 and import from 'typing_extensions' instead of 'typing'.

In the Dockerfile:

FROM python:3.7

In the source code:

from typing_extensions import Literal

3) Stay on Python 3.7 and use a try-except fallback.

In the Dockerfile:

FROM python:3.7

In the source code:

try:
  from typing import Literal
except ImportError:
  from typing_extensions import Literal
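
For illustration, here is a small sketch of the try-except fallback in use. The function pick_target and its return values are hypothetical and not taken from the TANGO or TVM sources. The fallback branch assumes the typing_extensions package has been installed with pip in the Python 3.7 image.

```python
try:
    from typing import Literal  # available in Python 3.8 and later
except ImportError:
    # Python 3.7: relies on the typing_extensions backport being installed.
    from typing_extensions import Literal


# Hypothetical helper, shown only to confirm that the annotation behaves
# the same whichever import branch was taken.
def pick_target(kind: Literal["cpu", "gpu"]) -> str:
    return "llvm" if kind == "cpu" else "cuda"


print(pick_target("cpu"))
```

Catching ImportError rather than using a bare except keeps unrelated failures, such as a KeyboardInterrupt during import, from being silently swallowed.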

HyunwooCho commented 4 months ago

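The problems keep coming.

The issue above can be solved without touching the source code by having the Dockerfile use Python 3.8,

but that conflicts with TensorFlow being pinned to version 2.4.0.

More precisely, TensorFlow 2.4.0 uses numpy.object internally,

whereas the NumPy version installed under Python 3.8 has dropped np.object in favor of the plain built-in object.

So another error occurs at this point.

A temporary workaround is to roll NumPy back to an older version (1.23.5)

or to install a more recent version of TensorFlow,

but I cannot tell what other errors that might cause.

I am therefore requesting an overall review of this.

Rather than testing code_gen on its own, please bring TANGO up with

docker-compose build
docker-compose up

and find the conditions under which code_gen produces no build errors and no runtime errors.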
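
As a rough sketch of the NumPy side of this conflict: the np.object alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, which is why 1.23.5 still works with code that refers to it. The helper below is hypothetical (the name numpy_has_object_alias is not from any library) and only compares version strings, so it runs without NumPy installed.

```python
import re


def numpy_has_object_alias(numpy_version: str) -> bool:
    """Return True if this NumPy release still exposes the np.object alias.

    The alias was removed in NumPy 1.24, so any release before 1.24
    (for example 1.23.5) still has it.
    """
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError("unrecognized NumPy version: %r" % numpy_version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 24)


print(numpy_has_object_alias("1.23.5"))
```

Inside the container, the same function could be given numpy.__version__ to confirm whether the installed NumPy is compatible with a library that still references np.object.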

HyunwooCho commented 3 months ago

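Of the TensorFlow versions shown below, I switched to 2.8.0, the most recent version built with the GCC 7.3.1 compiler. [image]

In other words, with the changes below, neither the build error nor the runtime error occurs. I have not yet run a functional test.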

1   FROM python:3.8
...
32  RUN pip3 install tensorflow==2.8.0
...
123 RUN apt-get install software-properties-common -y
124 # added by khlee to fix build error
125 RUN apt-get install python3-launchpadlib -y
126 RUN add-apt-repository ppa:openjdk-r/ppa
127 RUN apt-get install -y openjdk-17-jdk