Allowing for NOTAM body with field delimiters, like `C)`

eagleDiego commented 2 years ago

I came across a NOTAM that was throwing an error:

A3475/22 NOTAMN
Q) LIMM/QFAXX/IV/NBO/A/000/999/4537N00843E005
A) LIMC B) 2205182200 C) PERM
E) REF AIP AD 2 LIMC 1-12 ITEM 20 'LOCAL TRAFFIC REGULATIONS'
BOX 2 'APRON' PARAGRAPH 2.1 'ORDERLY MOVEMENT OF AIRCRAFT ON
APRONS' INDENT 4 'SERVICES PROVIDED' POINT C) 'FOLLOW-ME ASSISTANCE
PROVIDED ON PILOT'S REQUEST AND MANDATORY IN CASE' ADD THE FOLLOWING
IN CASE:
- GENERAL AVIATION AIRCRAFT UP TO ICAO CODE B (MAXIMUM WINGSPAN 24
METERS) AND HELICOPTERS ARRIVING AND DEPARTING FROM STANDS 301 TO 320
AND FROM 330 TO 336.
ARR TAXI ROUTE: AFTER TWR INSTRUCTIONS VIA APN TAXIWAY P-K TO
INTERMEDIATE HOLDING POSITION (IHP) K9 WHERE FOLLOW-ME CAR WILL
BE WAITING.
DEP TAXI ROUTE: AFTER TWR INSTRUCTIONS AND WITH FOLLOW-ME
ASSISTANCE VIA APN TAXIWAY N-K TO IHP K8

The body of the NOTAM (under tag `E)`) contains text that looks like a tag: the `C)` in `POINT C) 'FOLLOW-ME`. `avwx/current/notam.py` doesn't handle this case.

I solved the error by checking that the tag's content is being set for the first time (in theory, arbitrary text should only appear after the header, so this should be relatively safe).

My proposed edit below:

        if tag == "Q":
            if qualifiers is None:
                qualifiers = _qualifiers(item, units)
        elif tag == "A":
            if station is None:
                station = item
        elif tag == "B":
            if start_text == "":
                start_text = item
        elif tag == "C":
            if end_text == "":
                end_text = item
        elif tag == "D":
            if schedule is None:
                schedule = item
        elif tag == "E":
            if body == "":
                body = item
        elif tag == "F":
            if lower is None:
                lower = core.make_altitude(item.split()[0], units, repr=item)[0]
        elif tag == "G":
            if upper is None:
                upper = core.make_altitude(item.split()[0], units, repr=item)[0]

However, one issue remains: because the original string is sliced every time a tag is found, the NOTAM body is cut just before the `C)` in the text.

body="REF AIP AD 2 LIMC 1-12 ITEM 20 'LOCAL TRAFFIC REGULATIONS'\nBOX 2 'APRON' PARAGRAPH 2.1 'ORDERLY MOVEMENT OF AIRCRAFT ON\nAPRONS' INDENT 4 'SERVICES PROVIDED' POINT"
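The truncation can be reproduced in isolation. The snippet below is only a sketch: it assumes the parser looks for tags with a pattern like `[A-GQ]\) ` (the one quoted later in this thread) and does not use the actual code in `avwx/current/notam.py`.

```python
import re

# Assumed tag pattern, modeled on the one quoted later in this thread.
TAG = re.compile(r"[A-GQ]\) ")

# Fragment of the E) body from the NOTAM above.
body = "APRONS' INDENT 4 'SERVICES PROVIDED' POINT C) 'FOLLOW-ME ASSISTANCE"

# The pattern cannot tell a field delimiter from "C) " inside free text.
match = TAG.search(body)
false_tag = match.group()
truncated = body[: match.start()].strip()

print(repr(false_tag))
print(truncated)
```

Slicing at the match start leaves the body ending in `POINT`, which matches the output shown above.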

I'm no RegEx wizard, so there might be a more succinct way of solving this problem by altering the RegEx that matches the tags.

devdupont commented 2 years ago

OK, I see the issue. One possible solution would be to exclude earlier tags from the regex so that "C) " can't match twice. I think the fastest way would be to create additional compiled regexes keyed on the most recent tag letter. The letters follow a predictable order, so it shouldn't create too much overhead.

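import re

# One compiled pattern per most-recent tag, so earlier tags cannot match again.
KEY_PATTERNS = {
    None: re.compile(r"[A-GQ]\) "),
    "Q": re.compile(r"[A-G]\) "),
    "A": re.compile(r"[B-G]\) "),
    ...
}

# Search with the pattern for the tag seen most recently (None at the start).
match = KEY_PATTERNS[tag].search(text)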
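To make the idea concrete, here is a minimal, self-contained sketch of a splitter built on per-tag patterns. The field order `QABCDEFG` and the helper names `build_patterns` and `split_fields` are assumptions for illustration; they are not taken from the avwx source.

```python
import re

# Assumed field order; illustrative only.
TAG_ORDER = "QABCDEFG"


def build_patterns(order: str = TAG_ORDER) -> dict:
    """Map the most recent tag to a regex that matches only later tags."""
    patterns = {None: re.compile(rf"[{order}]\) ")}
    for i, tag in enumerate(order):
        rest = order[i + 1:]
        # After the last tag there is nothing left to match.
        patterns[tag] = re.compile(rf"[{rest}]\) ") if rest else None
    return patterns


def split_fields(text: str) -> dict:
    """Split a report into {tag: content}, never re-matching earlier tags."""
    patterns = build_patterns()
    fields = {}
    match = patterns[None].search(text)
    while match:
        tag = match.group()[0]
        text = text[match.end():]
        pattern = patterns[tag]
        # Look for the next delimiter using only tags that come after this one.
        match = pattern.search(text) if pattern else None
        end = match.start() if match else len(text)
        fields[tag] = text[:end].strip()
    return fields


report = (
    "Q) LIMM/QFAXX A) LIMC B) 2205182200 C) PERM "
    "E) SERVICES PROVIDED' POINT C) 'FOLLOW-ME ASSISTANCE"
)
fields = split_fields(report)
print(fields["C"])
print(fields["E"])
```

With this approach the second `C) ` stays inside the `E` field because `C` is no longer a candidate once `E` has been seen. A body containing `F) ` or `G) ` after `E)` would still be split, so this narrows the problem rather than eliminating it.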

devdupont commented 2 years ago

I've implemented the fix above and also added a boundary check to the regex to make it more precise. In addition to fixing the report above, it also fixed bodies containing "(OBSTACLE)" that were previously being cut off in the test suite. Version 1.8.3 will contain the fix and go out soon.
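The thread does not show the boundary check itself. As a rough sketch, assuming it is something like a `\b` word-boundary anchor in front of the tag letter (an assumption, not confirmed here), the effect on "(OBSTACLE)" would look like this:

```python
import re

# Pattern quoted earlier in the thread.
LOOSE = re.compile(r"[A-GQ]\) ")
# Assumption: a word boundary before the letter; the real fix may differ.
BOUNDED = re.compile(r"\b[A-GQ]\) ")

# Hypothetical body text containing "(OBSTACLE)".
body = "CRANE ERECTED (OBSTACLE) 500M FROM THR"

loose_match = LOOSE.search(body)      # matches the "E) " that ends "(OBSTACLE) "
bounded_match = BOUNDED.search(body)  # no match: "E" follows "L" with no boundary

print(loose_match.group() if loose_match else None)
print(bounded_match)
```

A boundary check alone would not help with `POINT C) `, since the `C` there starts a new word, which is presumably why it was combined with the per-tag patterns.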

avwx-rest / avwx-engine

Allowing for NOTAM body with field delimiters, like `C)` #37