PaddlePaddle / PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)
https://www.paddlepaddle.org.cn/hub
Apache License 2.0

Error when using bert service #1030

Open T-baby opened 3 years ago

T-baby commented 3 years ago

Environment:

    opencv-python<=4.2.0.32
    paddlepaddle-gpu==1.8.5.post107
    paddlehub==1.8.1
    paddle-gpu-serving>=0.8.2
    ujson>=1.35

Following the tutorial at https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.8/docs/tutorial/bert_service.md, starting the server fails with:

    /bin/sh: 1: ./bin/serving-gpu: not found

On a side note, do BERT-style models support TensorRT?

T-baby commented 3 years ago

After testing, it turned out that wget was missing from the container.
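For anyone else hitting this, the fix in the image build is a one-liner, assuming a Debian/Ubuntu-based base image:

    RUN apt-get update && apt-get install -y wget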

T-baby commented 3 years ago

But then the following problem appeared:

    Traceback (most recent call last):
      File "/Users/andy/Projects/bert_service/test_client.py", line 18, in <module>
        result = bc.get_result(input_text=input_text)
      File "/Users/andy/.local/lib/python3.8/site-packages/paddlehub/serving/bert_serving/bs_client.py", line 21, in get_result
        return self.bs.encode(input_text)
      File "/Users/andy/.local/lib/python3.8/site-packages/paddlehub/serving/bert_serving/bert_service.py", line 231, in encode
        response_msg = self.request_server(request_msg)
      File "/Users/andy/.local/lib/python3.8/site-packages/paddlehub/serving/bert_serving/bert_service.py", line 126, in request_server
        self.serving_list[self.con_index], err))
    IndexError: list index out of range

T-baby commented 3 years ago

It seems to be caused by "Infer Error with server ip:9000 : [Errno 61] Connection refused", but I configured ip:8080 on both the client and the server. Why does it automatically call port 9000?
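For context, the client side looks roughly like the v1.8 tutorial's example; the constructor arguments are as given in that tutorial, adapted here to the module and port used in this thread:

    from paddlehub.serving.bert_serving import bs_client

    # Per the v1.8 bert_service tutorial; server address is the one I configured.
    bc = bs_client.BSClient(module_name="chinese-electra-base",
                            server="127.0.0.1:8080")
    input_text = [["西风吹老洞庭波"]]
    result = bc.get_result(input_text=input_text)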

T-baby commented 3 years ago

It seems the port I specify does not take effect. On the server side I set:

    ENTRYPOINT ["hub","serving","start","bert_service","-m","chinese-electra-base","-p","8080","--use_gpu"]

but the log still shows:

    I1119 11:19:14.255861 46 server.cpp:1037] Server[baidu::paddle_serving::predictor::bert_service::BertServiceImpl] is serving on port=9000.
    I1119 11:19:14.255930 46 server.cpp:1040] Check out http://34963ec3bf8a:9000 in web browser

T-baby commented 3 years ago

Access only works when both port 9000 and the port I set are open. Very confusing.
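In other words, the container has to publish both ports. A minimal sketch of the run command (the image name is a placeholder, and the GPU flag depends on your Docker setup):

    docker run --gpus all -p 8080:8080 -p 9000:9000 my-bert-service:latest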

T-baby commented 3 years ago

Also, why is the output an array like [[-0.38675302267074585]]?

T-baby commented 3 years ago

The model in use is chinese-electra-base.

T-baby commented 3 years ago

Once the amount of data gets large, the following problem occurs:

    terminate called after throwing an instance of 'paddle::memory::allocation::BadAlloc'
      what(): Cannot malloc 12000.000244 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use or FLAGS_initial_gpu_memory_in_mb or FLAGS_reallocate_gpu_memory_in_mb environment variable to a lower value. Current FLAGS_fraction_of_gpu_memory_to_use value is 0.003141. Current FLAGS_initial_gpu_memory_in_mb value is 0. Current FLAGS_reallocate_gpu_memory_in_mb value is 0 at [/root/code-version/Serving/build/third_party/Paddle/src/extern_paddle/paddle/fluid/memory/detail/system_allocator.cc:134]
    PaddlePaddle Call Stacks:

Then it just hangs.
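As a stopgap, the error message itself points at Paddle's memory flags; exporting lower values before starting the server should cap the allocation (the values below are illustrative, not recommendations):

    export FLAGS_fraction_of_gpu_memory_to_use=0.3
    export FLAGS_initial_gpu_memory_in_mb=2000
    hub serving start bert_service -m chinese-electra-base -p 8080 --use_gpu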

T-baby commented 3 years ago

Following up on the BadAlloc crash and hang above: I have tracked this down. The server automatically adjusts batch_size based on the number of inputs, so a large enough request can exceed the available GPU memory.
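Until that is fixed, a client-side workaround is to cap the effective batch size by splitting large requests. A minimal sketch (the chunk size is illustrative, and it assumes get_result returns a list as in the tutorial):

    def encode_in_chunks(bc, texts, chunk_size=32):
        # Send at most chunk_size texts per request so the server's
        # auto-adjusted batch_size stays bounded.
        results = []
        for i in range(0, len(texts), chunk_size):
            results.extend(bc.get_result(input_text=texts[i:i + chunk_size]))
        return results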

T-baby commented 3 years ago

One more thing: every bs_client.BSClient that gets created seems to increase GPU memory usage by about 5 GB, and it never comes back down. Does each additional client spawn another process?

T-baby commented 3 years ago

Or is it that the number of texts in my calls made it adjust the batch size, driving GPU memory up? If so, how do I get it back down?

T-baby commented 3 years ago

(Quoting my earlier comment: the port I specify does not take effect, and the log still shows the server listening on port=9000.)

https://github.com/PaddlePaddle/PaddleHub/issues/1023 should be the same problem as mine.

T-baby commented 3 years ago

(Quoting my earlier comment: both port 9000 and the port I set have to be open for access to work.)

I have now found the cause of this in the code.

In paddle_gpu_serving, starting at line 137 of __init__.py:


    def run(self, gpu_index=0, port=8866):
        # NOTE: the port argument is stored here but never used again
        # below; the server always listens on serving_port instead.
        self.port = port
        os.chdir(self.get_path())
        self.modify_conf(gpu_index)
        # serving_port is picked by scanning upward from 9000,
        # regardless of the port the caller asked for.
        serving_port = self.find_serving_port()
        if serving_port < 0:
            print('No port available.')
            return -1
        self.serving_port = serving_port

        if self.with_gpu_flag == True:
            gpu_msg = '--gpuid=' + str(gpu_index) + ' '
            run_cmd = self.gpu_run_cmd + gpu_msg
            run_cmd += '--port=' + str(
                serving_port) + ' ' + '--resource_file=resource.prototxt.' + str(
                    gpu_index) + ' '
            print('Start serving on gpu ' + str(gpu_index) + ' port = ' + str(
                serving_port))
        else:
            # Crude CUDA check: look for /usr/local/cuda/version.txt.
            re = subprocess.Popen(
                'cat /usr/local/cuda/version.txt > tmp 2>&1', shell=True)
            re.wait()
            if re.returncode == 0:
                run_cmd = self.gpu_run_cmd + '--port=' + str(serving_port) + ' '
            else:
                run_cmd = self.cpu_run_cmd + '--port=' + str(serving_port) + ' '
            print('Start serving on cpu port = {}'.format(serving_port))

        process = subprocess.Popen(run_cmd, shell=True)
        self.p_list.append(process)
        if not self.run_m:
            self.hold()

The port argument that is passed in is never used; the code only uses serving_port, which is determined by find_serving_port:

    def find_serving_port(self):
        # Scan ports 9000-9999 and return the first one that is free.
        for i in range(1000):
            port = 9000 + i
            with closing(socket.socket(socket.AF_INET,
                                       socket.SOCK_STREAM)) as sock:
                sock.settimeout(2)
                result = sock.connect_ex(('0.0.0.0', port))
            if result != 0:
                return port
        return -1

So the default ends up being port 9000. Yet a socket connection is also opened on the user-specified port, which looks like some kind of connectivity check? So both ports have to be open at once? I suggest either documenting this properly or merging everything onto a single port; ideally, both ports should be user-configurable.

Or could you give me access to the paddle_gpu_serving project? I would fix it for you directly.
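For what it's worth, one possible direction, sketched against the run() excerpt above; this is untested, and pick_serving_port is a hypothetical helper, not part of paddle_gpu_serving:

    import socket
    from contextlib import closing

    def pick_serving_port(requested_port):
        # Hypothetical helper: honor the requested port when it is free,
        # and only fall back to the 9000+ scan when it is already taken.
        def is_free(port):
            with closing(socket.socket(socket.AF_INET,
                                       socket.SOCK_STREAM)) as sock:
                sock.settimeout(2)
                # connect_ex returns 0 when something is already listening.
                return sock.connect_ex(('0.0.0.0', port)) != 0

        if is_free(requested_port):
            return requested_port
        for candidate in range(9000, 10000):
            if is_free(candidate):
                return candidate
        return -1

    # In run(), serving_port = self.find_serving_port() would then
    # become serving_port = pick_serving_port(port).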

T-baby commented 3 years ago

The main problems are the following:

  1. The port handling is unreasonable; it is very easy to trip over.

  2. chinese-electra-base is not supported; I fixed this myself.

  3. batch_size is adjusted automatically based on the number of inputs, and GPU memory never comes back down afterwards; if the adjustment exceeds available memory the server crashes outright. I suggest checking whether enough GPU memory is available before adjusting, and if not, keeping the current batch_size and raising an error instead of exiting.

  4. Once GPU memory usage rises, it never decreases again.

  5. The number of concurrent threads should, in principle, also be user-configurable.

  6. Right now the server exits as soon as the number of connecting threads exceeds the default of 4.

  7. The client should not surface a raw IndexError: list index out of range; if the server cannot be reached, reporting a connection failure, including the address and port, would make troubleshooting much easier (see the sketch after this list).
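On point 7, a client-side stopgap until the library's error reporting improves; this wrapper is my own sketch, not PaddleHub API, and the server string is just whatever address the client was configured with:

    def safe_get_result(bc, input_text, server='127.0.0.1:8080'):
        # Turn the library's bare IndexError into an explicit connection
        # error that names the server address being used.
        try:
            return bc.get_result(input_text=input_text)
        except IndexError:
            raise ConnectionError(
                'Bert Service at {} is unreachable; check that both the '
                'user-specified port and the internal 9000+ port are open.'
                .format(server))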

ZeyuChen commented 3 years ago

@T-baby Thank you for your testing and detailed feedback; this is a high-quality issue report!