fl4p / fetlib

Find the right switch. Scrapes data-sheets and ranks FETs by power loss in a DCDC converter

Table detection #27

Open fl4p opened 1 month ago

fl4p commented 1 month ago

table2matrix

Datasheets contain merged cells when a unit or condition applies to multiple rows; header cells may be merged as well. To iterate the data row-wise, we first need to resolve the merged cells and copy each value across all rows within its span.

[['A', 'A', 'A', 'B', 'C', 'D'], ['A', 'A', 'A', 'E', 'E', 'E'], ['A', 'A', 'A', 'E', 'C', 'C'], ['E', 'C', 'C', 'C', 'C', 'C']]
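A minimal sketch of how such a matrix could be produced, assuming each table row is given as a list of `(text, rowspan, colspan)` tuples. That input format and the function body are illustrative assumptions, not fetlib's actual implementation. The sample input is chosen so that the result matches the matrix above.

```python
from typing import Dict, List, Tuple

# Hypothetical cell representation: (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def table2matrix(rows: List[List[Cell]]) -> List[List[str]]:
    """Expand merged cells so that every grid position holds a value."""
    grid: Dict[Tuple[int, int], str] = {}  # (row, col) -> text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already claimed by a rowspan from a row above.
            while (r, c) in grid:
                c += 1
            # Copy the value into every position covered by the span.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Sample layout: 'A' spans 3 rows and 3 columns, the other cells span columns only.
rows = [
    [("A", 3, 3), ("B", 1, 1), ("C", 1, 1), ("D", 1, 1)],
    [("E", 1, 3)],
    [("E", 1, 1), ("C", 1, 2)],
    [("E", 1, 1), ("C", 1, 5)],
]
matrix = table2matrix(rows)
for line in matrix:
    print(line)
```

Positions that no cell covers are filled with an empty string, so ragged input still yields a rectangular matrix.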


# Tabula
* the web version has an auto-detect function, which performs much better than tabula-java (CLI)
* it does not auto-detect nested tables
* selecting the whole page leads to poor results
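Since whole-page selection gives poor results, one option is to restrict extraction to a known region. The sketch below only builds a tabula-java command line using its `--pages`, `--area` (top,left,bottom,right in points), `--stream`/`--lattice` and `--format` options. The jar path, the page number and the bounding box values are placeholders chosen for illustration, and the helper name `tabula_cmd` is hypothetical.

```python
from typing import List, Sequence


def tabula_cmd(pdf: str, page: int, bbox: Sequence[float],
               jar: str = "tabula.jar", stream: bool = True) -> List[str]:
    """Build a tabula-java call that extracts one region of one page as JSON."""
    top, left, bottom, right = bbox  # tabula expects top,left,bottom,right
    return [
        "java", "-jar", jar,  # jar path is a placeholder
        "--pages", str(page),
        "--area", f"{top},{left},{bottom},{right}",
        "--stream" if stream else "--lattice",
        "--format", "JSON",
        pdf,
    ]


# Placeholder page and bbox; real values would come from a table detector.
cmd = tabula_cmd("datasheets/epc/EPC2306.pdf", page=1, bbox=(100, 50, 400, 550))
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`, with the JSON read from standard output.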

<details open>
  <summary>EPC2306</summary>

![EPC2306](https://github.com/user-attachments/assets/29800f31-cbb2-49ea-bd08-43d5cf7d40f0)

</details>

![image](https://github.com/user-attachments/assets/a0298fc5-33cf-4b75-a389-8832337a4c7a)

## Stream
(The FIELD* headers are actually empty; the CSV-to-Markdown converter inserted them.)

<details>
  <summary>Table</summary>

|FIELD1   |FIELD2                                               |Dynamic Characteristics# (TJ = 25°C unless otherwise stated)|FIELD4|FIELD5|FIELD6|FIELD7|
|---------|-----------------------------------------------------|------------------------------------------------------------|------|------|------|------|
|         |PARAMETER                                            |TEST CONDITIONS                                             |MIN   |TYP   |MAX   |UNIT  |
|CISS     |Input Capacitance                                    |                                                            |      |1777  |2369  |      |
|CRSS     |Reverse Transfer Capacitance                         |VDS = 50 V, VGS = 0 V                                       |      |5.8   |      |      |
|COSS     |Output Capacitance                                   |                                                            |      |616   |803   |pF    |
|COSS(ER) |Effective Output Capacitance, Energy Related (Note 1)|                                                            |      |730   |      |      |
|VDS = 0 to 50 V, VGS = 0 VCOSS(TR)|Effective Output Capacitance, Time Related (Note 2)  |                                                            |      |882   |      |      |
|RG       |Gate Resistance                                      |                                                            |      |0.4   |      |Ω     |
|QG       |Total Gate Charge                                    |VDS = 50 V, VGS = 5 V, ID = 25 A                            |      |12.3  |16.2  |      |
|QGS      |Gate to Source Charge                                |                                                            |      |4.3   |      |      |
|QGD      |Gate-to-Drain Charge                                 |VDS = 50 V, ID = 25 A                                       |      |1.1   |      |      |
|         |                                                     |                                                            |      |      |      |nC    |
|QG(TH)   |Gate Charge at Threshold                             |                                                            |      |3.1   |      |      |
|QOSS     |Output Charge                                        |VDS = 50 V, VGS = 0 V                                       |      |44    |57    |      |
|QRR      |Source-Drain Recovery Charge                         |                                                            |      |0     |      |      |

</details>

## Lattice
* the header row is shifted one column to the left (e.g. MIN ends up above the test conditions)

<details>
  <summary>Table</summary>

|Dynamic Characteristics# (TJ|25°C unless otherwise stated)                        |FIELD3                          |FIELD4|FIELD5|FIELD6|FIELD7|
|----------------------------|-----------------------------------------------------|--------------------------------|------|------|------|------|
|PARAMETER                   |TEST CONDITIONS                                      |MIN                             |TYP   |MAX   |UNIT  |      |
|CISS                        |Input Capacitance                                    |VDS = 50 V, VGS = 0 V           |      |1777  |2369  |pF    |
|CRSS                        |Reverse Transfer Capacitance                         |                                |5.8   |      |      |      |
|COSS                        |Output Capacitance                                   |                                |616   |803   |      |      |
|COSS(ER)                    |Effective Output Capacitance, Energy Related (Note 1)|VDS = 0 to 50 V, VGS = 0 V      |      |730   |      |      |
|COSS(TR)                    |Effective Output Capacitance, Time Related (Note 2)  |                                |882   |      |      |      |
|RG                          |Gate Resistance                                      |                                |      |0.4   |      |Ω     |
|QG                          |Total Gate Charge                                    |VDS = 50 V, VGS = 5 V, ID = 25 A|      |12.3  |16.2  |nC    |
|QGS                         |Gate to Source Charge                                |VDS = 50 V, ID = 25 A           |      |4.3   |      |      |
|QGD                         |Gate-to-Drain Charge                                 |                                |1.1   |      |      |      |
|QG(TH)                      |Gate Charge at Threshold                             |                                |3.1   |      |      |      |
|QOSS                        |Output Charge                                        |VDS = 50 V, VGS = 0 V           |      |44    |57    |      |
|QRR                         |Source-Drain Recovery Charge                         |                                |      |0     |      |      |

</details>

## Findings
* stream extraction method appears to be more usable here
* it doesn't provide an output that supports merged cells?
* the JSON output contains raw cell coordinates, and cells are already grouped into rows. a vertically merged cell (rowspan) appears only in the first row of its span; the cells below it in the same column are empty, so we can fill them with the same value
* auto-detect does not find nested tables and output quality suffers when selecting the whole page
* when we know the table bbox, the output can be good
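The fill-down idea from the findings could look roughly like this. It is a sketch under assumptions: each table's rows are lists of cell dicts with `top`, `height` and `text` keys, a merged cell has a height covering all rows it spans, and placeholder cells are empty with zero height. The sample coordinates are made up, and the helper names `fill_rowspans` and `cell` are hypothetical.

```python
from typing import Any, Dict, List

Cell = Dict[str, Any]  # assumed keys: "top", "height", "text"


def fill_rowspans(rows: List[List[Cell]], tol: float = 1.0) -> List[List[str]]:
    """Copy the text of vertically merged cells into the empty cells below them."""
    # Top edge of each row, taken from cells that carry real coordinates.
    row_tops = []
    for row in rows:
        tops = [c["top"] for c in row if c["height"] > 0]
        row_tops.append(min(tops) if tops else None)

    matrix = [[str(c["text"]).strip() for c in row] for row in rows]
    for r, row in enumerate(rows):
        for col, cell in enumerate(row):
            text = str(cell["text"]).strip()
            if not text:
                continue  # only cells with content can be the start of a span
            bottom = cell["top"] + cell["height"]
            for r2 in range(r + 1, len(rows)):
                top2 = row_tops[r2]
                # Stop at the first row that starts below the merged cell.
                if col >= len(rows[r2]) or top2 is None or top2 >= bottom - tol:
                    break
                if not matrix[r2][col]:
                    matrix[r2][col] = text
    return matrix


def cell(text: str, top: float, height: float) -> Cell:
    return {"text": text, "top": top, "height": height}


# Made-up coordinates: the unit "pF" spans the first three rows.
rows = [
    [cell("CISS", 100, 10), cell("1777", 100, 10), cell("pF", 100, 30)],
    [cell("CRSS", 110, 10), cell("5.8", 110, 10), cell("", 0, 0)],
    [cell("COSS", 120, 10), cell("616", 120, 10), cell("", 0, 0)],
    [cell("RG", 130, 10), cell("0.4", 130, 10), cell("Ω", 130, 10)],
]
filled = fill_rowspans(rows)
```

Cells that are empty for another reason (such as a missing MIN value) stay empty, because a value is only copied while the row still lies within the vertical extent of the cell above.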

#  BSB028N06NN3GXUMA2.pdf

![image](https://github.com/user-attachments/assets/d6e665fa-d736-4cbb-91ec-976124f8062a)

## tabula stream
| Parameter                    | Symbol   | Conditions            |      | Values |       | Unit |
|------------------------------|----------|-----------------------|------|--------|-------|------|
|                              |          |                       | min. | typ.   | max.  |      |
| Dynamic characteristics      |          |                       |      |        |       |      |
| Input capacitance            | C iss    |                       | -    | 8800   | 12000 | pF   |
|                              |          | V GS=0 V, V DS=30 V,  |      |        |       |      |
| Output capacitance           | C oss    |                       | -    | 2100   | 2800  |      |
|                              |          | f =1 MHz              |      |        |       |      |
| Reverse transfer capacitance | Crss     |                       | -    | 64     | -     |      |
| Turn-on delay time           | t d(on)  |                       | -    | 21     | -     | ns   |
| Rise time                    | t r      | V DD=30 V, V GS=10 V, | -    | 9      | -     |      |
|                              |          | I =30 A, R            |      |        |       |      |
| Turn-off delay time          | t d(off) | D G,ext=1.6 W         | -    | 38     | -     |      |
| Fall time                    | t f      |                       | -    | 6      | -     |      |

* results are good. it even uses the column headers from the "previous" table (min/typ/max)

## pix2image
![image](https://github.com/user-attachments/assets/98837843-7830-47e2-b2f3-0fb94306c080)

* for the first 3 rows it detects a single rowspan=3 across all columns. this could be simplified, but it is still not a usable result.

fl4p commented 1 month ago

pix2text

p2t predict -l en --resized-shape 2048 --file-type pdf -i datasheets/epc/EPC2306.pdf -o epc2306.md \
    --save-debug-res output-debug-p2t 

9-TABLE

MD table (no merged cells)

| | PARAMETER | TEST CONDITIONS | MIN | TYP | MAX | UNIT |
| --- | --- | --- | --- | --- | --- | --- |
| CIss | Input Capacitance | Vos=50V,Vcs=0V | | 1777 | 2369 | pF |
| Cass | Reverse Transfer Capacitance | | 5.8 | |
| Coss | Output Capacitance | | | 616 | 803 |
| CoSSER | Effective Output Capacitance, Energy Related (Note 1) | Vos=0to 50V,VGs=0V | | 730 | |
| CossTR | Effective Output Capacitance, Time Related (Note 2) | | 882 | |
| Re | Gate Resistance | | | 0.4 | | Q |
| QG | Total Gate Charge | Vps=50V,Vcs=5V,lb=25A | | 12.3 | 16.2 | nC |
| QGs | Gate to Source Charge | Vps=50V,lp=25 A | | 4.3 | |
| QG | Gate-to-Drain Charge | | 1.1 | |
| QGirn) | Gate Charge at Threshold | | 3.1 | |
| Qoss | Output Charge | Vps=50V,Vcs=0V | | 44 | 57 |
| QRR | Source-Drain Recovery Charge | | | 0 | |

HTML

from pix2text import Pix2Text

p2t = Pix2Text.from_config()
doc = p2t.recognize_pdf('../datasheets/EPC/EPC2306.pdf', page_numbers=[1], resized_shape=2048)
table = doc.pages[0].elements[9]
print(table.meta['html'][0])

| | PARAMETER | TEST CONDITIONS | MIN | TYP | MAX | UNIT |
| --- | --- | --- | --- | --- | --- | --- |
| CIss | Input Capacitance | Vos=50V,Vcs=0V | | 1777 | 2369 | pF |
| Cass | Reverse Transfer Capacitance | | 5.8 | |
| Coss | Output Capacitance | | | 616 | 803 |
| CoSSER | Effective Output Capacitance, Energy Related (Note 1) | Vos=0to 50V,VGs=0V | | 730 | |
| CossTR | Effective Output Capacitance, Time Related (Note 2) | | 882 | |
| Re | Gate Resistance | | | 0.4 | | Q |
| QG | Total Gate Charge | Vps=50V,Vcs=5V,lb=25A | | 12.3 | 16.2 | nC |
| QGs | Gate to Source Charge | Vps=50V,lp=25 A | | 4.3 | |
| QG | Gate-to-Drain Charge | | 1.1 | |
| QGirn) | Gate Charge at Threshold | | 3.1 | |
| Qoss | Output Charge | Vps=50V,Vcs=0V | | 44 | 57 |
| QRR | Source-Drain Recovery Charge | | | 0 | |
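To work with the HTML string returned by the snippet above, it first has to be turned into cells. The sketch below uses only the standard-library `html.parser` and assumes the HTML carries `rowspan`/`colspan` attributes on `td`/`th` elements. The sample markup is hand-written for illustration and is not actual pix2text output, and the class name `TableCells` is hypothetical. It yields rows of `(text, rowspan, colspan)` tuples, from which merged values can then be copied across their span.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, rowspan, colspan)


class TableCells(HTMLParser):
    """Collect the cells of an HTML table row by row, keeping span attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Cell]] = []
        self._spans: Optional[Tuple[int, int]] = None  # spans of the open cell
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Missing span attributes default to 1.
            self._spans = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._text = []

    def handle_data(self, data):
        if self._spans is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._spans is not None:
            if not self.rows:  # tolerate cells that appear outside a <tr>
                self.rows.append([])
            self.rows[-1].append(("".join(self._text).strip(), *self._spans))
            self._spans = None


# Hand-written sample markup, not real pix2text output.
html = (
    "<table>"
    "<tr><th>PARAMETER</th><th>TEST CONDITIONS</th><th>TYP</th></tr>"
    "<tr><td>Input Capacitance</td>"
    '<td rowspan="2">VDS = 50 V, VGS = 0 V</td><td>1777</td></tr>'
    "<tr><td>Reverse Transfer Capacitance</td><td>5.8</td></tr>"
    "</table>"
)
parser = TableCells()
parser.feed(html)
for row in parser.rows:
    print(row)
```

The third row has only two cells because its test-condition cell is covered by the `rowspan="2"` cell in the row above.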

Another example

1-TABLE

HTML and MD tables

| Parameter | symbol | Values | | | Unte | Note I Test Condition |
| --- | --- | --- | --- | --- | --- | --- |
| | | Min. | Typ. | Max. | | |
| Drain-source breakdown voltage | V(BR)DSS | 100 | - | - | V | Ves=0V, Io=1 mA |
| Gate threshold voltage | Vesth | 2.2 | 3.0 | 3.8 | V | Vos=Ves, /D=72 uA |
| Zero gate voltage drain current | ls | : | 0.1 10 | 1 100 | uA | Vos=100V, Ves=0 V, T=25°0 Vps=100 V, Ves=0 V, Tj=125°0 |
| Gate-source leakage current | less | - | 10 | 100 | nA | Ves=20 V,Vos=0\V |
| Drain-source on-state resistance | Rosom | : | 4.3 5.3 | 5.0 7.1 | m2 | Ves=10 V,D=50A Ves=6V,D=25 A |
| Gate resistance) | Re | - | 1.2 | 1.8 | Q | - |
| Transconductance | Ofs | 50 | 100 | - | S | \|Vos\|>2\|/p\|Ros(on)max,\|b=50 A |


| Parameter | symbol | Values | Unte | Note I Test Condition | Min. | Typ. | Max. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Drain-source breakdown voltage | V(BR)DSS | 100 | - | - | V | Ves=0V, Io=1 mA |
| Gate threshold voltage | Vesth | 2.2 | 3.0 | 3.8 | V | Vos=Ves, /D=72 uA |
| Zero gate voltage drain current | ls | : | 0.1 10 | 1 100 | uA | Vos=100V, Ves=0 V, T=25°0 Vps=100 V, Ves=0 V, Tj=125°0 |
| Gate-source leakage current | less | - | 10 | 100 | nA | Ves=20 V,Vos=0\V |
| Drain-source on-state resistance | Rosom | : | 4.3 5.3 | 5.0 7.1 | m2 | Ves=10 V,D=50A Ves=6V,D=25 A |
| Gate resistance) | Re | - | 1.2 | 1.8 | Q | - |
| Transconductance | Ofs | 50 | 100 | - | S | \|Vos\|>2\|/p\|Ros(on)max,\|b=50 A |

fl4p commented 1 month ago

img2table