I'm using lattice for a PDF containing a table whose lines don't cross each other, like this (horizontal mask):
A good approach to solve this would be to dilate, then erode the mask.
After dilation, the lines cross each other, and erosion then restores the table to its original dimensions.
There is already a documented iterations parameter, which I think may have been added for this kind of issue:
iterations (int, optional (default: 0)) –
Number of times for erosion/dilation is applied.
For more information, refer OpenCV’s dilate.
If I use iterations=1, I end up with the following mask (horizontal and vertical merged, bottom-right of the table):
It 'works', as now lines do cross each other, but it only dilates the image.
As a result, the detected grid contains an additional line at the top and bottom of the table.
I would suggest the following change in image_processing.py to solve this:
dmask = cv2.dilate(threshold, el, iterations=iterations)
+dmask = cv2.erode(dmask, el, iterations=iterations)
However, this could potentially break some existing software, and I'm not sure why only dilate was added in the first place.
Maybe adding a new parameter erode_iterations would be better.
What do you think? I can open a PR for this change if you'd like.