geopython / stetl

Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
https://www.stetl.org
GNU General Public License v3.0

Format string and dictionary options are incompatible #124

Open sebastic opened 3 years ago

sebastic commented 3 years ago

Describe the bug

Stetl fails to load a configuration when an option has a dictionary value and an arguments dictionary is also supplied for substitution.

Example config from: https://github.com/geopython/stetl/blob/master/examples/basics/11_formatconvert/etl.cfg#L46

# The GML must be a simple features collection
[convert_to_geojson]
class = stetl.filters.formatconverter.FormatConverter
input_format = etree_doc
output_format = geojson_collection
converter_args = {
    'root_tag': 'FeatureCollection',
    'feature_tag': 'featureMember',
    'feature_id_attr': 'fid'
    }

To Reproduce

$ PYTHONPATH=. python3 bin/stetl -c examples/basics/11_formatconvert/etl.cfg -a foo=bar
2021-11-29 14:49:25,134 util INFO Found lxml.etree, native XML parsing, fabulous!
2021-11-29 14:49:25,188 util INFO Found GDAL/OGR Python bindings, super!!
2021-11-29 14:49:25,190 main INFO Stetl version = 2.1.dev0
2021-11-29 14:49:25,191 ETL INFO INIT - Stetl version is 2.1.dev0
2021-11-29 14:49:25,191 ETL INFO Config/working dir = /home/bas/git/nlextract/nlextract/externals/stetl/examples/basics/11_formatconvert
2021-11-29 14:49:25,191 ETL INFO Reading config_file = examples/basics/11_formatconvert/etl.cfg
2021-11-29 14:49:25,191 ETL INFO Substituting 0 args in config file from args_dict: []
2021-11-29 14:49:25,191 ETL ERROR Error substituting config arguments: err="\n    'root_tag'"
Traceback (most recent call last):
  File "/home/bas/git/nlextract/nlextract/externals/stetl/bin/stetl", line 43, in <module>
    main()
  File "/home/bas/git/nlextract/nlextract/externals/stetl/bin/stetl", line 35, in main
    etl = ETL(vars(args), args.config_args)
  File "/home/bas/git/nlextract/nlextract/externals/stetl/stetl/etl.py", line 97, in __init__
    raise e
  File "/home/bas/git/nlextract/nlextract/externals/stetl/stetl/etl.py", line 91, in __init__
    config_str = config_str.format(**args_dict)
KeyError: "\n    'root_tag'"
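The failure can be reproduced without Stetl. The sketch below is a reduced stand-in for the config file (the `{host}` placeholder is hypothetical, not taken from the example config): `str.format` treats the opening brace of the dictionary as the start of a replacement field and takes everything up to the first colon as the field name, which gives the same `KeyError` as in the traceback. Doubling the braces is the escape that `str.format` itself offers; whether that fits how Stetl reads the file afterwards is not checked here.

```python
# Reduced stand-in for the config file: one dictionary value and one
# hypothetical {host} placeholder that should be substituted.
config_str = """[convert_to_geojson]
converter_args = {
    'root_tag': 'FeatureCollection'
    }
host = {host}
"""

try:
    config_str.format(host="localhost")
    failed_key = None
except KeyError as err:
    # str.format parsed the dictionary's opening brace as a replacement field
    # and used the text up to the first colon as the field name.
    failed_key = err.args[0]

print(repr(failed_key))

# With doubled braces, str.format emits literal braces and only
# substitutes the {host} placeholder.
escaped_str = """converter_args = {{
    'root_tag': 'FeatureCollection'
    }}
host = {host}
"""
formatted = escaped_str.format(host="localhost")
print(formatted)
```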

Expected Behavior

The configuration is loaded successfully, including argument substitution.

Additional context

A string2record converter was implemented:

--- a/stetl/filters/formatconverter.py
+++ b/stetl/filters/formatconverter.py
@@ -338,6 +338,29 @@ class FormatConverter(Filter):
         packet.data = etree.fromstring(packet.data)
         return packet

+    @staticmethod
+    def string2record(packet, converter_args=None):
+        if(
+            converter_args is not None and
+            'value_column' in converter_args
+        ):
+            key = converter_args['value_column']
+        else:
+            key = 'value'
+
+        record = dict({key: packet.data})
+
+        if(
+            converter_args is not None and
+            'column_data' in converter_args
+        ):
+            for key in converter_args['column_data']:
+                record[key] = converter_args['column_data'][key]
+
+        packet.data = record
+
+        return packet
+
     @staticmethod
     def struct2string(packet):
         packet.data = packet.to_string()
@@ -406,6 +429,7 @@ FORMAT_CONVERTERS = {
     },
     FORMAT.string: {
         FORMAT.etree_doc: FormatConverter.string2etree_doc,
+        FORMAT.record: FormatConverter.string2record,
         FORMAT.xml_doc_as_string: FormatConverter.no_op
     },
     FORMAT.struct: {

Which requires configuration like this:

# convert string to record
[convert_string_to_record]
class = stetl.filters.formatconverter.FormatConverter
input_format = string
output_format = record
converter_args = {
        'value_column': 'waarde',
        'column_data': {
            'sleutel': 'levering_xml',
        },
    }
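To make the effect of these converter_args concrete, here is a standalone sketch of the converter applied to one packet. The function body follows the diff above; the `Packet` class is a hypothetical stand-in that models only the `data` attribute, and the input string is made up for illustration.

```python
class Packet:
    """Hypothetical stand-in for a Stetl packet; only `data` is modelled."""

    def __init__(self, data):
        self.data = data


def string2record(packet, converter_args=None):
    # Name of the column that receives the incoming string (default: 'value').
    if converter_args is not None and 'value_column' in converter_args:
        key = converter_args['value_column']
    else:
        key = 'value'

    record = {key: packet.data}

    # Optional fixed columns that are added to every record.
    if converter_args is not None and 'column_data' in converter_args:
        for column, value in converter_args['column_data'].items():
            record[column] = value

    packet.data = record
    return packet


converter_args = {
    'value_column': 'waarde',
    'column_data': {
        'sleutel': 'levering_xml',
    },
}

packet = string2record(Packet('<xml/>'), converter_args)
print(packet.data)

# Without converter_args the string ends up under the default 'value' key.
default_packet = string2record(Packet('<xml/>'))
print(default_packet.data)
```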

Because of this issue, converters that require converter_args cannot be used in the NLExtract BAGv2 configuration, which sets its arguments via options/<hostname>.args.

sebastic commented 3 years ago

Writing the value with dict() instead of braces is not a workaround, because ast.literal_eval() does not support the alternative dict() syntax:

>>> import ast
>>> {}
{}
>>> dict()
{}
>>> ast.literal_eval('{}')
{}
>>> ast.literal_eval('dict()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/ast.py", line 105, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python3.9/ast.py", line 104, in _convert
    return _convert_signed_num(node)
  File "/usr/lib/python3.9/ast.py", line 78, in _convert_signed_num
    return _convert_num(node)
  File "/usr/lib/python3.9/ast.py", line 69, in _convert_num
    _raise_malformed_node(node)
  File "/usr/lib/python3.9/ast.py", line 66, in _raise_malformed_node
    raise ValueError(f'malformed node or string: {node!r}')
ValueError: malformed node or string: <ast.Call object at 0x7fad4e8fd460>
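The same point as a small script, assuming the option value is parsed with `ast.literal_eval()` as this comment implies: a nested dictionary literal like the converter_args above is accepted, while the equivalent `dict(...)` call is rejected because function calls are not literals.

```python
import ast

# A nested dictionary literal, as used for converter_args, parses fine.
literal_text = "{'value_column': 'waarde', 'column_data': {'sleutel': 'levering_xml'}}"
parsed = ast.literal_eval(literal_text)
print(parsed['column_data']['sleutel'])

# The dict() spelling is a function call, which literal_eval refuses to evaluate.
try:
    ast.literal_eval("dict(value_column='waarde')")
    call_rejected = False
except ValueError:
    call_rejected = True

print(call_rejected)
```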

Supporting both substitution variables and dictionary values may require changing the config file into a Jinja template.
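A lighter alternative to a template engine might be to substitute only simple `{name}` placeholders and leave every other brace alone. The sketch below uses only the standard library; `substitute_args` and the `{host}` placeholder are hypothetical names, and this is not how Stetl currently does the substitution. Like `str.format`, it raises `KeyError` when a placeholder has no matching argument.

```python
import re

# Matches only simple placeholders such as {host} or {db_name}. A brace
# followed by a newline, a quote or whitespace (as in a dictionary literal)
# does not match.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_args(config_str, args_dict):
    """Replace {name} placeholders and leave all other braces untouched."""

    def replace(match):
        name = match.group(1)
        if name not in args_dict:
            # Same failure mode as str.format for a missing argument.
            raise KeyError(name)
        return str(args_dict[name])

    return _PLACEHOLDER.sub(replace, config_str)


config_str = """[convert_string_to_record]
converter_args = {
        'value_column': 'waarde',
        'column_data': {
            'sleutel': 'levering_xml',
        },
    }
host = {host}
"""

result = substitute_args(config_str, {"host": "localhost"})
print(result)
```

This deliberately drops the format specifiers and conversions that `str.format` supports (for example `{port:d}`), so it only fits configs that use plain `{name}` placeholders.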