Hochfrequenz / kohlrahbi

An Anwendungshandbücher (AHB) scraper that extracts tables from docx files
https://pypi.org/project/kohlrahbi/
GNU General Public License v3.0

Only write those expressions to machine-readable AHBs that are actually valid expressions #444

Open hf-kklein opened 2 months ago

hf-kklein commented 2 months ago
> why we filter them here and not the underlying data itself

Yes, gladly :) kohlrahbi would just need a dependency on ahbicht and would only write those expressions that are well-formed. Feel free to open an issue there ;) I'm just not sure whether that's out of scope.

Originally posted by @hf-kklein in https://github.com/Hochfrequenz/ahbicht-functions/issues/294#issuecomment-2345523263

hf-kklein commented 2 months ago
from functools import lru_cache

from ahbicht.expressions.ahb_expression_parser import parse_ahb_expression_to_single_requirement_indicator_expressions
from lark.exceptions import VisitError  # raised by lark, which ahbicht uses for parsing


@lru_cache(maxsize=2**15)  # there are ~18k expressions over all AHBs as of 2024-09-11; So they all fit in 2**15
def is_wellformed_expression(expression: str) -> bool:
    """
    returns true iff the expression string is parseable by ahbicht
    """
    try:
        # the parse result is discarded; only whether parsing succeeds matters
        _ = parse_ahb_expression_to_single_requirement_indicator_expressions(expression)
        return True
    except (SyntaxError, VisitError):
        return False
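
Why the cache helps can be sketched without ahbicht installed. The snippet below is only an illustration: `toy_parse` and `is_wellformed_toy` are hypothetical stand-ins (a bracket-balance check, not the real AHB grammar), and the sample expressions are made up. It shows that a repeated expression string reaches the parser only once.

```python
from functools import lru_cache

calls = 0  # counts how often the (stand-in) parser actually runs


def toy_parse(expression: str) -> None:
    """Hypothetical stand-in for the ahbicht parser: rejects unbalanced square brackets."""
    depth = 0
    for char in expression:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if depth < 0:
            raise SyntaxError(f"unexpected ']' in {expression!r}")
    if depth != 0:
        raise SyntaxError(f"unclosed '[' in {expression!r}")


@lru_cache(maxsize=2**15)
def is_wellformed_toy(expression: str) -> bool:
    """Same shape as the function above, but backed by the toy parser."""
    global calls
    calls += 1
    try:
        toy_parse(expression)
        return True
    except SyntaxError:
        return False


# Made-up sample data: each expression occurs twice, as it might across several AHBs.
expressions = ["Muss [1] U [2]", "Muss [1] U [2]", "Soll [3", "Soll [3"]
results = [is_wellformed_toy(e) for e in expressions]
info = is_wellformed_toy.cache_info()
```

Because the function returns `False` instead of raising, negative results are cached as well, so malformed expressions are not re-parsed either.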

hf-krechan commented 2 months ago

We try to ^^ There is a small conflict between AHB Tabellen and Bedingungsbaum/AHahnB.

In AHB Tabellen I would like to see even the invalid conditions.
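
One way to serve both consumers would be to split the expressions instead of dropping the invalid ones. This is a rough sketch, not kohlrahbi code: `partition_expressions` is a hypothetical helper, the predicate is passed in (it could be an ahbicht-based check like the one above), and the demo predicate and sample strings are made up.

```python
from typing import Callable, Iterable


def partition_expressions(
    expressions: Iterable[str], is_wellformed: Callable[[str], bool]
) -> tuple[list[str], list[str]]:
    """Split expressions into (well-formed, malformed), keeping the input order."""
    valid: list[str] = []
    invalid: list[str] = []
    for expression in expressions:
        if is_wellformed(expression):
            valid.append(expression)
        else:
            invalid.append(expression)
    return valid, invalid


def balanced(expression: str) -> bool:
    """Hypothetical demo predicate; a real check would parse with ahbicht."""
    return expression.count("[") == expression.count("]")


valid, invalid = partition_expressions(["Muss [1] U [2]", "Soll [3"], balanced)
```

The machine-readable output could then be built from `valid` only, while a table view could still render `invalid` so that broken conditions stay visible.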

hf-kklein commented 2 months ago

It's not urgent then. We already implemented the above filter on the Entscheidungsbaum side ;)

hf-krechan commented 2 months ago

And we have a real issue with it. Unfortunately, there are conditions that are real sentences :D

FV: 2404 Prüfi: 55001


hf-krechan commented 2 months ago

ahbicht may soon need an LLM as well, to be able to interpret these texts :D

hf-kklein commented 2 months ago

nice. just once, working with professionals. 🙃