conjuncts / gmft

Lightweight, performant, deep table extraction
MIT License

table detection miss #10

Open cspx2 opened 2 months ago

cspx2 commented 2 months ago

Why does table detection fail to recognize the second table on page 2? (Attachment: test.pdf)

Is there a way to adjust the detection "threshold"?

conjuncts commented 2 months ago

Thanks for the report. Currently, this is how to adjust the threshold:

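# Import path is an assumption; it may differ between gmft versions
from gmft import AutoTableDetector, TableDetectorConfig

detector = AutoTableDetector()

# Override the base confidence threshold; lower values keep weaker detections
config = TableDetectorConfig()
config.detector_base_threshold = 0.5

# `page` is a page from an already opened PDF document
tables = detector.extract(page, config_overrides=config)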

Even with an adjusted threshold, though, gmft still misses that table. I think it's a limitation of the pretrained Table Transformer model, so unfortunately I can't do much. You can try manually specifying a bbox (#9), or obtaining one from camelot, tabula, img2table, etc.
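For intuition about what the threshold controls, here is a minimal sketch of confidence filtering. It is not gmft's internal code: the `Detection` class, the `filter_by_threshold` helper, and the boxes and scores are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical candidate table: a bounding box plus a confidence score."""
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    score: float


def filter_by_threshold(candidates, threshold):
    """Keep only the candidates whose score is at least `threshold`."""
    return [d for d in candidates if d.score >= threshold]


# Made-up candidates for illustration only
candidates = [
    Detection((50, 80, 550, 300), 0.97),
    Detection((50, 340, 550, 600), 0.62),
]

strict = filter_by_threshold(candidates, 0.9)   # keeps only the 0.97 candidate
lenient = filter_by_threshold(candidates, 0.5)  # keeps both candidates
```

A lower threshold can only recover a table that the model proposed with a low score. If the model never proposes a box for a region, no threshold setting brings it back, which would be consistent with the table still being missed here.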