When I use the old tabula-java, it splits the table into individual cells. In tabula-sharp that isn't working: I get each whole row/line back as one piece of text, without the individual values broken out. Maybe this is because the table is non-uniform (different column counts on different rows)?
Example table (I can't attach the PDF because it contains personal information):
I am using the latest version of PdfPig, but that didn't seem to help. My code is below. Maybe I'm doing something wrong with the syntax; I'm just trying to iterate through the rows and their cells.
```csharp
using (PdfDocument document = PdfDocument.Open(path, new ParsingOptions() { ClipPaths = false }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(Page); // Page holds the page number to extract

    // detect candidate table zones
    SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
    var regions = detector.Detect(page);

    IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
    List<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
    var table = tables[0];
    var rows = table.Rows;

    string result = "";
    string test = rows[0][0].GetText(); // <---- testing the first cell
    Run.PrintLog("Test: " + test);

    foreach (var r in rows)
    {
        foreach (RectangularTextContainer txt in r)
        {
            result += txt.GetText() + "|"; // <---- text of each cell (?)
        }
        result += System.Environment.NewLine;
    }
    Run.PrintLog("Tab result: " + result);
}
```
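To check the non-uniform-table guess, it might help to log how many cells each extracted row actually contains. This is a minimal sketch. `RowDiagnostics`, `CellCounts`, `IsNonUniform` and `Format` are hypothetical helpers, not part of tabula-sharp. They work on plain strings, so they can be tried without the library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers (not part of tabula-sharp). They take rows of cell texts,
// so they can be fed from table.Rows via each cell's GetText().
public static class RowDiagnostics
{
    // Number of cells in each row, in row order.
    public static List<int> CellCounts(IReadOnlyList<IReadOnlyList<string>> rows) =>
        rows.Select(r => r.Count).ToList();

    // True when at least two rows have a different number of cells.
    public static bool IsNonUniform(IReadOnlyList<IReadOnlyList<string>> rows) =>
        rows.Select(r => r.Count).Distinct().Count() > 1;

    // Joins the cells of each row with '|' and the rows with newlines
    // (no trailing '|', unlike the string-concatenation loop).
    public static string Format(IReadOnlyList<IReadOnlyList<string>> rows) =>
        string.Join(Environment.NewLine, rows.Select(r => string.Join("|", r)));
}
```

Assuming each row in `table.Rows` is a list of cells exposing `GetText()`, as the code above uses, the texts could be collected with `var texts = table.Rows.Select(r => r.Select(c => c.GetText()).ToList()).ToList();` and then logged with `Run.PrintLog(string.Join(",", RowDiagnostics.CellCounts(texts)));`. If every row reports 1 cell, that matches the symptom of whole rows coming back as a single piece of text.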