[Bug] Image processing does not currently handle Grayscale with alpha

imClumsyPanda commented 1 year ago

Please provide the following information to quickly locate the problem

System Environment: macOS
Which programming language: Python
Version: 3.8
OnnxRuntime Version: 1.15.1
Demo of reproducible problems:

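When processing Grayscale with alpha images found in documents, reading such an image into an ndarray yields a shape of [1007, 915, 2]; inspection confirmed that it is a Grayscale with alpha image. The __call__ method of the LoadImage class in utils does not yet handle this image type. It only handles 2-D ndarrays and 3-D ndarrays whose third dimension has size 4.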

Complete Error Message:

Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/liuqian/Downloads/chatchat-dev/document_loader/mypdfloader.py", line 35, in <module>
    docs = loader.load()
  File "/opt/homebrew/Caskroom/miniforge/base/envs/chatchat/lib/python3.8/site-packages/langchain/document_loaders/unstructured.py", line 86, in load
    elements = self._get_elements()
  File "/Users/liuqian/Downloads/chatchat-dev/document_loader/mypdfloader.py", line 28, in _get_elements
    text = pdf2text(self.file_path)
  File "/Users/liuqian/Downloads/chatchat-dev/document_loader/mypdfloader.py", line 22, in pdf2text
    result, _ = ocr(img_array)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/chatchat/lib/python3.8/site-packages/rapidocr_onnxruntime/rapid_ocr_api.py", line 74, in __call__
    dt_boxes, det_elapse = self.text_detector(img)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/chatchat/lib/python3.8/site-packages/rapidocr_onnxruntime/ch_ppocr_v3_det/text_detect.py", line 65, in __call__
    data = transform(data, self.preprocess_op)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/chatchat/lib/python3.8/site-packages/rapidocr_onnxruntime/ch_ppocr_v3_det/utils.py", line 225, in transform
    data = op(data)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/chatchat/lib/python3.8/site-packages/rapidocr_onnxruntime/ch_ppocr_v3_det/utils.py", line 79, in __call__
    data["image"] = (img * self.scale - self.mean) / self.std
ValueError: operands could not be broadcast together with shapes (992,928,2) (1,1,3)
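
The broadcast failure can be reproduced in isolation with NumPy alone. The sketch below assumes a normalization step of the same form as the failing line in the traceback, with mean and std shaped (1, 1, 3). The values of scale, mean, and std are placeholders chosen for illustration and may differ from what rapidocr_onnxruntime actually uses; normalize and try_normalize are hypothetical helper names.

```python
import numpy as np

# Placeholder normalization constants shaped (1, 1, 3), one value per channel.
# These are assumptions; the library's real values may differ.
scale = 1.0 / 255.0
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(1, 1, 3)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(1, 1, 3)


def normalize(img: np.ndarray) -> np.ndarray:
    """Apply (img * scale - mean) / std, the same form as the failing line."""
    return (img * scale - mean) / std


def try_normalize(img: np.ndarray) -> str:
    """Return 'ok' if normalization succeeds, otherwise the ValueError message."""
    try:
        normalize(img)
        return "ok"
    except ValueError as exc:
        return str(exc)


bgr = np.zeros((992, 928, 3), dtype=np.uint8)         # 3-channel image
gray_alpha = np.zeros((992, 928, 2), dtype=np.uint8)  # grayscale with alpha

result_bgr = try_normalize(bgr)
result_gray_alpha = try_normalize(gray_alpha)
```

A trailing dimension of 2 cannot be broadcast against a trailing dimension of 3, so the 2-channel array fails while the 3-channel array passes.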

Possible solutions: add handling for the case img.ndim == 3 and img.shape[2] == 2. For reference:

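import cv2
import numpy as np

# samples, height, width: raw pixel buffer and dimensions supplied by the caller
img_array = np.frombuffer(samples, dtype=np.uint8).reshape(height, width, 2)
img_gray = img_array[:, :, 0]  # channel 0: gray values
img_gray = cv2.cvtColor(img_gray, cv2.COLOR_GRAY2BGR)  # expand to 3-channel BGR
img_alpha = img_array[:, :, 1]  # channel 1: alpha values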

The code above can be used as a reference for reading the gray and alpha layers from img_array.
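
As a rough illustration of how the missing branch might sit alongside the existing cases, here is a minimal NumPy-only sketch. The function name to_three_channel is hypothetical, and blending the alpha channel onto a white background is an assumption made for this example; it is not necessarily how rapidocr_onnxruntime handles these images.

```python
import numpy as np


def to_three_channel(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return an (H, W, 3) uint8 array for common layouts."""
    if img.ndim == 2:
        # Plain grayscale: repeat the single channel three times.
        return np.stack([img] * 3, axis=-1)
    if img.ndim == 3 and img.shape[2] == 2:
        # Grayscale with alpha: blend the gray channel onto a white background.
        gray = img[:, :, 0].astype(np.float32)
        alpha = img[:, :, 1].astype(np.float32) / 255.0
        blended = gray * alpha + 255.0 * (1.0 - alpha)
        blended = np.rint(blended).astype(np.uint8)
        return np.stack([blended] * 3, axis=-1)
    if img.ndim == 3 and img.shape[2] == 3:
        # Already three channels: return unchanged.
        return img
    if img.ndim == 3 and img.shape[2] == 4:
        # Color with alpha: blend each color channel onto white,
        # keeping the original channel order.
        color = img[:, :, :3].astype(np.float32)
        alpha = img[:, :, 3:4].astype(np.float32) / 255.0
        blended = color * alpha + 255.0 * (1.0 - alpha)
        return np.rint(blended).astype(np.uint8)
    raise ValueError(f"Unsupported image shape: {img.shape}")


# Usage: a 2x2 grayscale-with-alpha image.
gray_alpha = np.array(
    [[[0, 255], [0, 0]],
     [[128, 255], [255, 255]]],
    dtype=np.uint8,
)
converted = to_three_channel(gray_alpha)
```

Fully transparent pixels become white here, which tends to suit scanned documents with dark text; a different background or simply discarding alpha may be preferable in other cases.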

SWHL commented 1 year ago

This has been fixed in rapidocr_onnxruntime==1.3.1. Please try again.

imClumsyPanda commented 1 year ago

I tested it and the issue is resolved. Thank you very much 🙏

RapidAI / RapidOCR

[Bug] Image processing does not currently handle Grayscale with alpha #115