dmulyalin / ttp

Template Text Parser
MIT License

Limit parsing of groups/tables with similar structure to separate logical parts #80

Closed: PeterSR closed this issue 2 years ago

PeterSR commented 2 years ago

Hi! Thank you for this awesome tool!

I have come across some data that I'm having trouble parsing correctly, and I haven't been able to work out from the docs what the proper approach is.

Here is some sample input:

### SECTION 1

parameter1: 23 [kg]

table:

  id      h[m]    v[m/s]    x[m]    y[m]
    0     130      50.00      3      5
  100     140     -20.00      1      3

### SECTION 2

parameter2: 50 [N]

table:

  id      A     B    C    D
   10     0.01  -13  t    52
  251     1.02  7    f    7387

### END

Each section contains logically separate data, i.e. parameters with different names and tables that describe different things. Unfortunately, the tables have roughly the same structure (the same number of columns). I tried to follow the alternative approach described here: https://github.com/dmulyalin/ttp/issues/76#issuecomment-1162935676

Here is the current template:

<vars>
HASH3 = "\#\#\#"
</vars>

<group name="section1">
{{ ignore("HASH3") }} SECTION 1 {{ _start_ }}

<group name="parameters">
parameter1: {{ parameter1 }} [kg]
</group>

<group name="table">
{{ ignore(" *") }}{{id}} {{h}} {{v}} {{x}} {{y}}
</group>

{{ ignore("HASH3") }} SECTION 2 {{ _end_ }}
</group>

<group name="section2">
{{ ignore("HASH3") }} SECTION 2 {{ _start_ }}

<group name="parameters">
parameter2: {{ parameter2 }} [N]
</group>

<group name="table">
{{ ignore(" *") }}{{id}} {{A}} {{B}} {{C}} {{D}}
</group>

{{ ignore("HASH3") }} END {{ _end_ }}
</group>

I used nested groups with _start_ and _end_ markers to isolate the two sections.

And this is the output:

[[{'section1': [{'parameters': {'parameter1': '23'},
                 'table': [{'h': 'h', 'id': 'id', 'v': 'v', 'x': 'x', 'y': 'y'},
                           {'h': '130',
                            'id': '0',
                            'v': '50',
                            'x': '3',
                            'y': '5'},
                           {'h': '140',
                            'id': '100',
                            'v': '-20',
                            'x': '1',
                            'y': '3'}]},
                {'table': [{'h': 'A', 'id': 'id', 'v': 'B', 'x': 'C', 'y': 'D'},
                           {'h': '0.01',
                            'id': '10',
                            'v': '-13',
                            'x': 't',
                            'y': '52'},
                           {'h': '1.02',
                            'id': '251',
                            'v': '7',
                            'x': 'f',
                            'y': '7387'}]}],
   'section2': [{'parameters': {'parameter2': '23'},
                 'table': [{'A': 'h', 'B': 'v', 'C': 'x', 'D': 'y', 'id': 'id'},
                           {'A': '130',
                            'B': '50',
                            'C': '3',
                            'D': '5',
                            'id': '0'},
                           {'A': '140',
                            'B': '-20',
                            'C': '1',
                            'D': '3',
                            'id': '100'}]},
                {'table': [{'A': 'A', 'B': 'B', 'C': 'C', 'D': 'D', 'id': 'id'},
                           {'A': '0.01',
                            'B': '-13',
                            'C': 't',
                            'D': '52',
                            'id': '10'},
                           {'A': '1.02',
                            'B': '7',
                            'C': 'f',
                            'D': '7387',
                            'id': '251'}]}]}]]

As you can see, both tables end up in both section groups. Since I added _start_ and _end_ markers to both outer groups, I expected the inner groups to start and stop matching at those markers as well. That, however, does not seem to be the case.
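The behaviour expected here, where each table is matched only inside its own section, can be illustrated without TTP by first splitting the text at the ### header lines and then matching table rows per section. This is a plain-Python sketch of that idea under the assumption that every section starts with a "### " line; it is not how TTP works internally, and the helpers split_sections and parse_rows are hypothetical.

```python
import re

# The sample input from this issue
SAMPLE = """### SECTION 1

parameter1: 23 [kg]

table:

  id      h[m]    v[m/s]    x[m]    y[m]
    0     130      50.00      3      5
  100     140     -20.00      1      3

### SECTION 2

parameter2: 50 [N]

table:

  id      A     B    C    D
   10     0.01  -13  t    52
  251     1.02  7    f    7387

### END
"""

# Table data row: leading indent, an integer id, then four more whitespace-separated columns
ROW_RE = re.compile(r"^\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def split_sections(text):
    """Hypothetical helper: map each '### <title>' header to the lines below it."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("### "):
            current = line[4:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_rows(lines):
    """Hypothetical helper: collect table rows from one section's lines only."""
    return [m.groups() for m in map(ROW_RE.match, lines) if m]


for title, lines in split_sections(SAMPLE).items():
    print(title, parse_rows(lines))
```

Because rows are matched only within the lines of their own section, the SECTION 1 table can never leak into SECTION 2.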

Do you have any ideas for how to properly parse something like this?


Using Python 3.8 and ttp 0.9.1.

dmulyalin commented 2 years ago

Hello, this template:

<vars>
HASH3 = "\#\#\#"
</vars>

<group name="sections">

<group name="section1">
{{ ignore("HASH3") }} SECTION 1 {{ _start_ | _exact_ }}

<group name="parameters" method="table">
parameter1: {{ parameter1 }} [kg]
</group>

<group name="table">
{{ ignore(" +") }}{{id | DIGIT}} {{h}} {{v}} {{x}} {{y}}
</group>

</group>

<group name="section2">
{{ ignore("HASH3") }} SECTION 2 {{ _start_ | _exact_ }}

<group name="parameters" method="table">
parameter2: {{ parameter2 }} [N]
</group>

<group name="table">
{{ ignore(" +") }}{{id | DIGIT}} {{h}} {{v}} {{x}} {{y}}
</group>

</group>

</group>

gives the results you need:

[
    {
        "sections": {
            "section1": {
                "parameters": {
                    "parameter1": "23"
                },
                "table": [
                    {
                        "h": "130",
                        "id": "0",
                        "v": "50.00",
                        "x": "3",
                        "y": "5"
                    },
                    {
                        "h": "140",
                        "id": "100",
                        "v": "-20.00",
                        "x": "1",
                        "y": "3"
                    }
                ]
            },
            "section2": {
                "parameters": {
                    "parameter2": "50"
                },
                "table": [
                    {
                        "h": "0.01",
                        "id": "10",
                        "v": "-13",
                        "x": "t",
                        "y": "52"
                    },
                    {
                        "h": "1.02",
                        "id": "251",
                        "v": "7",
                        "x": "f",
                        "y": "7387"
                    }
                ]
            }
        }
    }
]
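The {{ id | DIGIT }} match variable in this template is what keeps the column-header rows (id h[m] v[m/s] ...) out of the results, unlike the original output above: the header's first column is the word id, which is not a digit string, so the match is rejected. A rough plain-Python approximation of that check, assuming DIGIT behaves like str.isdigit() on the matched text, looks like this (parse_table is a hypothetical helper):

```python
HEADER = "  id      h[m]    v[m/s]    x[m]    y[m]"
ROWS = [
    HEADER,
    "    0     130      50.00      3      5",
    "  100     140     -20.00      1      3",
]

FIELDS = ("id", "h", "v", "x", "y")


def parse_table(lines):
    """Hypothetical helper: keep rows whose first column passes a DIGIT-like check (assumed str.isdigit)."""
    results = []
    for line in lines:
        columns = line.split()
        # Mirrors {{ id | DIGIT }}: a non-numeric id such as "id" rejects the whole row
        if len(columns) != len(FIELDS) or not columns[0].isdigit():
            continue
        results.append(dict(zip(FIELDS, columns)))
    return results


print(parse_table(ROWS))
```

The header line still splits into five columns, so the column count alone would not filter it; the digit check on id is what rejects it.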
dmulyalin commented 2 years ago

Alternatively, if all section headers share the same pattern with a section ID (SECTION 1, SECTION 2), this template:

<vars>
HASH3 = "\#\#\#"
</vars>

<group name="section{{ sec_id }}">
{{ ignore("HASH3") }} SECTION {{ sec_id }}

<group name="parameters" method="table">
parameter1: {{ parameter1 }} [kg]
parameter2: {{ parameter2 }} [N]
</group>

<group name="table">
{{ ignore(" +") }}{{id | DIGIT}} {{h}} {{v}} {{x}} {{y}}
</group>

</group>

gives the same results.
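In this template the group name section{{ sec_id }} acts as a dynamic path: the value captured for sec_id becomes part of the result key, so a single group definition yields section1 and section2. The keying idea can be sketched in plain Python as below; group_by_section is a hypothetical helper that handles only the parameters (tables omitted for brevity) and makes no claim about TTP internals.

```python
import re

# Header line like "### SECTION 2"; the captured value plays the role of sec_id
HEADER_RE = re.compile(r"^###\s+SECTION\s+(\S+)\s*$")
PARAM_RE = re.compile(r"^(parameter\d+):\s+(\S+)")


def group_by_section(text):
    """Hypothetical helper: build {"section<sec_id>": {"parameters": {...}}}."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # The dynamic key is the literal prefix plus the captured id
            current = results.setdefault("section" + header.group(1), {"parameters": {}})
            continue
        if line.startswith("###"):
            # Any other ### line (e.g. "### END") closes the current section
            current = None
            continue
        param = PARAM_RE.match(line)
        if param and current is not None:
            current["parameters"][param.group(1)] = param.group(2)
    return results


text = """### SECTION 1
parameter1: 23 [kg]
### SECTION 2
parameter2: 50 [N]
### END
"""
print(group_by_section(text))
```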

PeterSR commented 2 years ago

Hi! Sorry for the late reply and thank you so much for the templates!

I completely forgot that digits in a template line such as "SECTION 1" or "SECTION 2" are replaced with a pattern that matches any number unless _exact_ is used. Thanks!
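The pitfall can be reproduced with Python's re module. By default TTP turns digits in a template line into a \d+ pattern, so the start line written for SECTION 1 also matches ### SECTION 2, while _exact_ keeps the digits literal. The to_regex function below is a hypothetical, simplified stand-in for that conversion, not TTP's actual code:

```python
import re


def to_regex(template_line, exact=False):
    """Hypothetical, simplified stand-in for turning a literal template line into a regex.

    Without exact, every run of digits becomes a one-or-more-digits pattern
    (the default behaviour discussed in this thread); with exact, digits stay literal.
    """
    escaped = re.escape(template_line)
    if not exact:
        escaped = re.sub(r"\d+", r"\\d+", escaped)
    return re.compile("^" + escaped + "$")


loose = to_regex("### SECTION 1")
strict = to_regex("### SECTION 1", exact=True)

for header in ("### SECTION 1", "### SECTION 2"):
    print(header, bool(loose.match(header)), bool(strict.match(header)))
# ### SECTION 1 True True
# ### SECTION 2 True False
```

With the loose pattern, both headers start the section1 group, which is why both tables ended up in both sections in the original output.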

I will try to apply this to my full template and close this for now.

PeterSR commented 2 years ago

The problem I was trying to investigate here is the same as in https://github.com/dmulyalin/ttp/issues/82, but I may have reduced the input and template in this thread too much, so they no longer reproduce the bug I see in issue 82.