emcniece / ha_pdf

Home Assistant PDF File Sensor
8 stars 6 forks source link

Home Assistant PDF File Sensor

The pdf sensor enables the parsing and searching of text content from local PDF files using PyPDF2.

Installation

Clone this repo into the custom_components directory:

cd [HA_HOME]/custom_components
git clone https://github.com/emcniece/ha_pdf.git

Configuration

Home Assistant must first be allowed to access the directory with the target PDF file. Add the allowlist_external_dirs to configuration.yaml. In this example, the PDF exists at /config/my.pdf:

homeassistant:
  allowlist_external_dirs:
    - /config

Next add the sensor definition:

sensor:
  - platform: pdf
    name: My PDF Sensor
    file_path: /config/my.pdf

By default this configuration will extract all text content found in the first page of the PDF.

Configuration Variables

file_path (required)

Path to a local PDF file


unit_of_measurement (optional)

Measurement unit to associate with the rendered value


pdf_page (optional)

Numeric value of the PDF Page to search. Default: 0


regex_search (optional)

Regular expression with capture groups used to search the PDF text.


regex_match_index (optional)

Regular expression capture group index to render as the value. Default: 0

Index 0 returns the whole matched string. Indexes >= 1 return valid capture groups.


value_template (optional)

Post-regex template rendering of the value.


Full Configuration Example

The PDF in this example contains a line of text reading the following:

Water Consumption Charge     15  x  $  2.2159             33.24 --------------Balance

Three sensors can be used with different regex_match_index capture groups to extract each numeric value:

# Example configuration.yaml entry
homeassistant:
  allowlist_external_dirs:
    - /config

sensor:
  - platform: pdf
    name: Water Usage Volume
    file_path: /config/water-bill.pdf
    unit_of_measurement: m3
    pdf_page: 0
    regex_search: 'Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+'
    regex_match_index: 1

  - platform: pdf
    name: Water Usage Billing Rate
    file_path: /config/water-bill.pdf
    unit_of_measurement: $
    pdf_page: 0
    regex_search: 'Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+'
    regex_match_index: 2

  - platform: pdf
    name: Water Usage Total Cost
    file_path: /config/water-bill.pdf
    unit_of_measurement: $
    pdf_page: 0
    regex_search: 'Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+'
    regex_match_index: 3