fl4p / fetlib

Find the right switch. Scrapes data-sheets and ranks FETs by power loss in a DCDC converter

Re-evaluate LLM approach on value correctness #15

Open fl4p opened 1 month ago

fl4p commented 1 month ago

Results from the new benchmark comparing actual min/typ/max field values:

num *EQUAL* *VALUES*:
                                           total     
                           tabular_parse:  278 (100%)   37   35    9   31   29    1   39    7    6   25   30   29
    ocr_text2_claude-3-5-sonnet-20240620:  236 ( 85%)   30   31    6   29   21    0   34    7    6   20   26   26
                       ocr_text2_llama-3:  180 ( 65%)   24   21    8   21   22    0   26    5    5   15   16   17
                       text2_gpt-4o-mini:  157 ( 56%)   20   25    2   18   16    0   23    1    1    8   21   22
                   ocr_text2_gpt-4o-mini:  149 ( 54%)   23   21    3   22   14    0   15    5    3   10   16   17

tabular_parse is the reference because we can assume that most of its values are correct (it uses no LLM and has been carefully hand-crafted).
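For illustration, here is a minimal sketch of how an "equal values" count against the reference could be computed. It is not the actual benchmark code: the data layout (a dict of symbols mapping to min/typ/max fields), the names `is_equal` and `count_equal`, the tolerance, and the sample numbers are all assumptions.

```python
import math

FIELDS = ("min", "typ", "max")

def is_equal(ref, got, rel_tol=1e-6):
    """True if both values are present, not NaN, and agree within rel_tol."""
    if ref is None or got is None:
        return False
    if math.isnan(ref) or math.isnan(got):
        return False
    return math.isclose(ref, got, rel_tol=rel_tol)

def count_equal(reference, candidate):
    """Count min/typ/max values in candidate that match the reference."""
    count = 0
    for symbol, ref_fields in reference.items():
        got_fields = candidate.get(symbol, {})
        for name in FIELDS:
            if is_equal(ref_fields.get(name), got_fields.get(name)):
                count += 1
    return count

# Hypothetical values, not taken from a real datasheet.
reference = {"Qg": {"typ": 50.0, "max": 65.0}, "tRise": {"typ": 12.0}}
candidate = {"Qg": {"typ": 50.0, "max": 56.0}}  # max is wrong, tRise is missing
print(count_equal(reference, candidate))  # prints 1
```

Note that a count like this treats wrong and missing values the same way, so it understates how harmful the wrong ones are.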

A wrongly extracted value is much worse than a missing value: a missing value produces NaN in the power calculation and is easy to spot, whereas a wrong value produces an ordinary-looking result, so we will not notice the mistake.
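A small sketch of why this matters. It uses the textbook conduction-loss formula P = I² · R_DS(on) as a stand-in, not fetlib's actual power model; the function name and the numbers are hypothetical.

```python
import math

def conduction_loss(i_rms, rds_on):
    """Conduction loss in watts: P = I^2 * R_DS(on)."""
    return i_rms ** 2 * rds_on

correct = conduction_loss(10.0, 0.002)         # value read correctly
missing = conduction_loss(10.0, float("nan"))  # value could not be extracted
wrong = conduction_loss(10.0, 0.0035)          # value taken from a neighbouring field

print(math.isnan(missing))  # True: the gap is visible in the results
print(math.isnan(wrong))    # False: the error passes silently into the ranking
```

NaN propagates through the arithmetic, so a missing input can be filtered out later, while a wrong input yields a finite number that looks like any other result.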

Analysis shows that the LLM takes values from neighbouring fields or returns seemingly random values.

The converterapi pdf2txt (or pdf2ocr2txt?) seems to extract table contents column-wise (not row-wise), which might explain the neighbour confusion.
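To make the suspected effect concrete, here is a toy sketch that flattens the same table row-wise and column-wise. It does not reproduce what the converter actually does; the table contents and function names are hypothetical.

```python
def flatten_row_wise(table):
    """Read cells left to right, one row after another."""
    return [cell for row in table for cell in row]

def flatten_column_wise(table):
    """Read cells top to bottom, one column after another."""
    return [cell for column in zip(*table) for cell in column]

# Columns: symbol, min, typ, max (hypothetical values).
table = [
    ["Qg",  "-", "50", "65"],
    ["Qgs", "-", "12", "-"],
    ["Qgd", "-", "9",  "-"],
]

print(" ".join(flatten_row_wise(table)))
# Qg - 50 65 Qgs - 12 - Qgd - 9 -
print(" ".join(flatten_column_wise(table)))
# Qg Qgs Qgd - - - 50 12 9 65 - -
```

In the column-wise text, a symbol is no longer adjacent to its own min/typ/max values, so the model has to match them by position, and an off-by-one picks up a neighbouring field's value.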

Could the random values come from LLM exhaustion and non-deterministic effects?

piotrdelikat commented 1 month ago

Could you provide an example of a datasheet name/results where this is taking place?