Open RK9534 opened 1 month ago
I am trying to run the provided example on my own PDF file but ran into an error. I have installed all the dependencies specified in setup.py. Below is the code I am using:
path = "/path/to/dir/sample/2312.13560.pdf"

analyzer = dd.get_dd_analyzer(
    config_overwrite=[
        "PT.LAYOUT.WEIGHTS=microsoft/table-transformer-detection/pytorch_model.bin",
        "PT.ITEM.WEIGHTS=microsoft/table-transformer-structure-recognition/pytorch_model.bin",
        "PT.ITEM.FILTER=['table']",
        "OCR.USE_DOCTR=True",
        "OCR.USE_TESSERACT=False",
        "TEXT_ORDERING.INCLUDE_RESIDUAL_TEXT_CONTAINER=True",
    ]
)

analyzer.pipe_component_list[0].predictor.config.threshold = 0.4
df = analyzer.analyze(path=path)
df.reset_state()
RuntimeError                              Traceback (most recent call last)
<cell line: 1>()
----> 1 dp = next(iter(df))
      2 np_image = dp.viz()

23 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in _max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode, return_indices)
    794     if stride is None:
    795         stride = torch.jit.annotate(List[int], [])
--> 796     return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
    797
    798

RuntimeError: Given input size: (128x1x16). Calculated output size: (128x0x8). Output size is too small
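For context on why this fails: with no padding, a pooling layer's output size is floor((in − kernel) / stride) + 1, so a feature map that is only 1 px tall collapses to height 0 under kernel-2/stride-2 pooling (the kernel and stride values here are an assumption, inferred from the 16 → 8 halving in the error message). A minimal sketch of the arithmetic:

```python
# Pooling output size with no padding/dilation:
#   out = floor((in - kernel) / stride) + 1
def pool_out(in_size: int, kernel: int = 2, stride: int = 2) -> int:
    return (in_size - kernel) // stride + 1

print(pool_out(16))  # 8 -> matches the 16 -> 8 in the error message
print(pool_out(1))   # 0 -> invalid: "Output size is too small"
```

An output dimension of 0 is exactly what `torch.max_pool2d` rejects, which usually means the image (or a text-line crop) handed to the model was far too small along one axis.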
I'm using Google Colab as an environment.
It looks like there is a problem with the image input size. Does the image have three channels, and is it reasonably large (e.g. at least 600 px)?
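The check being suggested can be sketched like this (the array below is a synthetic placeholder; in practice you would inspect the actual page or crop array that reaches the model, whose attribute name depends on the pipeline):

```python
import numpy as np

# Placeholder page image in (height, width, channels) layout; a real check
# would use the actual array produced for your PDF page.
page = np.zeros((1600, 1200, 3), dtype=np.uint8)

has_three_channels = page.ndim == 3 and page.shape[2] == 3
is_reasonably_large = min(page.shape[:2]) >= 600
print(has_three_channels, is_reasonably_large)  # True True
```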
Duplicate of #345
After running the code above, everything is fine up to df.reset_state(). After dp = next(iter(df)) I get this error:

[0517 09:11.58 @doctectionpipe.py:84] INF Processing 2312.13560.pdf
[0517 09:12.02 @context.py:126] INF ImageLayoutService total: 1.9849 sec.
[0517 09:12.03 @context.py:126] INF SubImageLayoutService total: 1.5151 sec.
[0517 09:12.03 @context.py:126] INF PubtablesSegmentationService total: 0.0409 sec.
[0517 09:12.09 @context.py:126] INF ImageLayoutService total: 5.6937 sec.

followed by the same traceback as in the issue description, ending in:

RuntimeError: Given input size: (128x1x16). Calculated output size: (128x0x8). Output size is too small
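This is not a deepdoctection-specific fix, but if the crash does come from a tiny crop reaching the recognizer, one generic workaround is to upscale such images before OCR. A sketch with Pillow (`min_side=32` is an arbitrary assumption, not a value from this library):

```python
from PIL import Image

def upscale_if_small(img: Image.Image, min_side: int = 32) -> Image.Image:
    """Upscale by an integer factor so the shorter side is at least min_side px."""
    w, h = img.size
    short = min(w, h)
    if short >= min_side:
        return img
    scale = -(-min_side // short)  # ceiling division
    return img.resize((w * scale, h * scale))

# Example: a 40x5 px crop (the kind of sliver that can crash pooling layers)
tiny = Image.new("RGB", (40, 5), "white")
print(upscale_if_small(tiny).size)  # (280, 35)
```

Whether and where such a hook can be placed in the deepdoctection pipeline is something the maintainers would have to confirm.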