As the attached snapshot shows, the first character of each of rows 2-10 is cropped and appended as the last character of the same row in the output table.
The input file is document-page3.pdf.
Any idea why this is happening and how it can be fixed?
Thanks.
I am also having the same issue with the position of the first character.
Hi @amdhacks and @abhibisht89! Thanks for the report. This is a known issue (more details in #170 and #213). You can expect a fix by the end of this week. Until then, you can try the workaround mentioned in #170: find the path where camelot is installed and pass detect_vertical=False in base.py.
My base.py file after the change is given below:
# -*- coding: utf-8 -*-
import os
from ..utils import get_page_layout, get_text_objects

class BaseParser(object):
    """Defines a base parser.
    """
    def _generate_layout(self, filename):
        self.filename = filename
        self.layout, self.dimensions = get_page_layout(
            self.filename,
            char_margin=self.char_margin,
            line_margin=self.line_margin,
            word_margin=self.word_margin)
        self.horizontal_text = get_text_objects(self.layout, ltype="lh")
        self.vertical_text = get_text_objects(self.layout, ltype="lv")
        self.pdf_width, self.pdf_height = self.dimensions
        self.rootname, __ = os.path.splitext(self.filename)
        self.detect_vertical = False
But I do not see any improvement. The first character is still shown as the last character of the same row in the output.
Am I missing something?
Sorry, I should've been more specific. You need to add detect_vertical=True to get_page_layout. You can check out its definition in utils.py.
@amdhacks This is fixed now. It will be more configurable after #170.
@vinayak-mehta, I have updated camelot to version 0.4.1, but the first character, which was previously shown as the last character of the row, now appears as a single separate character in the first column, as shown below:
Output now:
C lass A
N et Asset Value at 31 December 5,111,372
N umber of outstanding units at 31 December 49,136
N et Asset Value per unit at 31 December 104.03
C lass B
N et Asset Value at 31 December 49,144,825
N umber of outstanding units at 31 December 471,555
N et Asset Value per unit at 31 December 104.22
Please suggest. Thanks.
@amdhacks Please install the latest version i.e. v0.5.0. The table isn't being detected correctly for this case. You'll need to specify a table area.
$ camelot --format csv --output output.csv stream -T 70,690,550,170 input.pdf
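For anyone calling camelot from Python rather than the CLI, the command above corresponds roughly to the sketch below. It assumes a camelot version whose read_pdf accepts flavor="stream" and table_areas; format_table_area and extract_table_to_csv are hypothetical helper names used only for illustration, and unlike the CLI this writes the first table directly to the given CSV path.

```python
def format_table_area(x1, y1, x2, y2):
    """Build the 'x1,y1,x2,y2' string used for a table area.

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner,
    both in PDF coordinate space.
    """
    return ",".join(str(v) for v in (x1, y1, x2, y2))


def extract_table_to_csv(pdf_path, csv_path, area):
    """Hypothetical helper: run the stream parser on one table area."""
    # Imported lazily so the helper above can be used without camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, flavor="stream", table_areas=[area])
    if len(tables) == 0:
        raise ValueError("no table found in area " + area)
    tables[0].to_csv(csv_path)
    return tables[0].df


# Example call (requires camelot and the input file):
# area = format_table_area(70, 690, 550, 170)
# df = extract_table_to_csv("input.pdf", "output.csv", area)
```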
@vinayak-mehta I'm having the exact same issue as @amdhacks, and I have version 0.7.3. The output shows the same problem:
C lass A
N et Asset Value at 31 December 5,111,372
N umber of outstanding units at 31 December 49,136
N et Asset Value per unit at 31 December 104.03
C lass B
N et Asset Value at 31 December 49,144,825
N umber of outstanding units at 31 December 471,555
N et Asset Value per unit at 31 December 104.22
Also, the table boundaries in the input PDF are as clearly defined as they can be. Any solution for this?
Will share input file shortly. Unable to share from restricted network at this moment.
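As a possible stopgap, the detached leading character can be rejoined after extraction. This is only a minimal sketch, not part of camelot: rejoin_first_char is a hypothetical helper, and it assumes the symptom always appears in the row text as one capital letter, one space, then a lowercase continuation, as in the output quoted in this thread. The heuristic will also wrongly merge legitimate text such as "A table", so check the result before relying on it.

```python
import re

# One capital letter, a single space, then a lowercase letter at the very
# start of the text. This pattern is an assumption based on the rows quoted
# in the thread, not a general rule.
_DETACHED_INITIAL = re.compile(r"^([A-Z]) (?=[a-z])")


def rejoin_first_char(text):
    """Glue a detached first character back onto the word that follows it."""
    return _DETACHED_INITIAL.sub(r"\1", text)


rows = [
    "C lass A",
    "N et Asset Value at 31 December 5,111,372",
    "Class B",  # already intact, left unchanged
]
cleaned = [rejoin_first_char(row) for row in rows]
```

If the stray character ends up in its own column instead, the two columns would need to be concatenated before applying the helper.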