camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License

Negative value as accuracy of table. #44

Open satheeshkatipomu opened 5 years ago

satheeshkatipomu commented 5 years ago

While testing, I ran into a case where table.accuracy is a negative number.

PDF: page-3.pdf

Code:

import camelot

tables = camelot.read_pdf('/Users/skatipomu/Table_Extraction_Camelot/page3.pdf', pages="all")
[table.accuracy for table in tables]

Output: [99.99999999999997, -20.852716930856104]

I think the cause is in the compute_accuracy method in utils.py: when calculating accuracy, it subtracts each error value from 1, which assumes the error is a fraction in the range [0.0, 1.0]. However, the errors passed to this method are percentages in the range [0, 100], which in turn come from the get_table_index method. Dividing the error by 100 solved the issue for me.

def compute_accuracy(error_weights):
    """Calculates a score based on weights assigned to various
    parameters and their error percentages.

    Parameters
    ----------
    error_weights : list
        Two-dimensional list of the form [[p1, e1], [p2, e2], ...]
        where pn is the weight assigned to list of errors en.
        Sum of pn should be equal to 100.

    Returns
    -------
    score : float

    """
    SCORE_VAL = 100
    try:
        score = 0
        if sum([ew[0] for ew in error_weights]) != SCORE_VAL:
            raise ValueError("Sum of weights should be equal to 100.")
        for ew in error_weights:
            weight = ew[0] / len(ew[1])
            for error_percentage in ew[1]:
                score += weight * (1 - error_percentage)  # line to change
    except ZeroDivisionError:
        score = 0
    return score

The change is from score += weight * (1 - error_percentage) to score += weight * (1 - error_percentage / 100.0).
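To make the effect of that one-line change concrete, here is a minimal standalone sketch that compares the two formulas side by side. The function names compute_accuracy_unscaled and compute_accuracy_scaled and the sample error_weights values are made up for illustration; they are not taken from the Camelot code base or from the PDF in this report, and this is not necessarily how the library ended up fixing it.

```python
SCORE_VAL = 100


def _check_weights(error_weights):
    # Same validation as in the snippet above: weights must add up to 100.
    if sum(ew[0] for ew in error_weights) != SCORE_VAL:
        raise ValueError("Sum of weights should be equal to 100.")


def compute_accuracy_unscaled(error_weights):
    """Hypothetical copy of the formula as reported: errors used as-is."""
    _check_weights(error_weights)
    score = 0
    try:
        for weight_total, errors in error_weights:
            weight = weight_total / len(errors)
            for error_percentage in errors:
                score += weight * (1 - error_percentage)
    except ZeroDivisionError:
        score = 0
    return score


def compute_accuracy_scaled(error_weights):
    """Hypothetical variant with the proposed change: errors divided by 100."""
    _check_weights(error_weights)
    score = 0
    try:
        for weight_total, errors in error_weights:
            weight = weight_total / len(errors)
            for error_percentage in errors:
                # Assumes each error is a percentage in [0, 100].
                score += weight * (1 - error_percentage / 100.0)
    except ZeroDivisionError:
        score = 0
    return score


# Made-up input: two weighted groups of errors, given as percentages.
sample = [[50, [0.0, 10.0]], [50, [20.0]]]
print(compute_accuracy_unscaled(sample))
print(compute_accuracy_scaled(sample))
```

With percentage-style inputs like these, the unscaled version goes negative as soon as any error exceeds 1, while the scaled version stays within 0 to 100 as long as every error is within 0 to 100.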

anakin87 commented 5 years ago

https://github.com/atlanhq/camelot/issues/223

satheeshkatipomu commented 5 years ago

Closing, as this has already been raised in the atlanhq/camelot repo.

vinayak-mehta commented 5 years ago

Opening this as a reference instead.