Open fsteggink opened 4 years ago
More info: the replaced tildes already occur in the options_dict and args_dict which are passed to the ETL constructor. They're coming from the parse_args function in stetl.main.
I suspect this piece of code is responsible for this error:
if args.config_args:
    args_total = dict()
    for arg in args.config_args:
        if os.path.isfile(arg):
            log.info('Found args file at: %s' % arg)
            args_total = Util.merge_two_dicts(args_total, Util.propsfile_to_dict(arg))
        else:
            # Convert string to dict: http://stackoverflow.com/a/1248990
            args_total = Util.merge_two_dicts(args_total, Util.string_to_dict(arg))
    args.config_args = args_total
Note that in the else case, Util.string_to_dict is called, which is also called in the constructor of the StringSubstitutionFilter (see stack trace above). The solution is to make sure that Util.string_to_dict is only called once. Preferably this should happen when the StringSubstitutionFilter is created, since it needs to build a dictionary from the format_args string. On the other hand, I'm also using it to inject arguments into placeholders elsewhere in my config, but maybe spaces instead of tildes can safely be used there.
For now, I'll use another character as a workaround in my StringSubstitutionFilter, since it is not really a problem in my case.
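One way to read the "only call it once" idea is to split the command-line arguments into key/value pairs without touching the tildes, and leave the tilde replacement to the filter. The sketch below illustrates that under assumptions: split_pairs is a hypothetical helper that does not exist in Stetl, string_to_dict is a simplified standalone stand-in for Util.string_to_dict, and the way the value ends up in format_args is guessed from the dict_arr shown in this issue.

```python
def split_pairs(s, separator='='):
    """Hypothetical helper: split 'k=v k2=v2' into a dict, values left untouched."""
    # maxsplit=1 keeps any further separators inside the value
    return dict(token.split(separator, 1) for token in s.split())


def string_to_dict(s, separator='=', space='~'):
    """Simplified stand-in for Util.string_to_dict: split, then replace tildes."""
    pairs = split_pairs(s, separator)
    return {key: value.replace(space, ' ') for key, value in pairs.items()}


# Command-line stage (as in parse_args): the tilde is preserved
cli_args = split_pairs('myvalue=contains~space')

# Assumed config substitution: the value is placed into format_args
format_args = 'myvalue=%s' % cli_args['myvalue']

# Filter stage: the only place where tildes become spaces
result = string_to_dict(format_args)
print(cli_args, result)
```

The trade-off is the one mentioned above: arguments injected into other placeholders would then keep their tildes, so those places would need real spaces or their own replacement step.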
At least the tilde and separator are assigned/used as defaults in string_to_dict() in util.py:
@staticmethod
def string_to_dict(s, separator='=', space='~'):
    # Convert string to dict: http://stackoverflow.com/a/1248990
    dict_arr = [x.split(separator) for x in s.split()]
    for x in dict_arr:
        x[1] = x[1].replace(space, ' ')
    return dict(dict_arr)
I think this was introduced in one of the first versions to support long strings for the ogr2ogr exec.
In hindsight the .ini file format for Stetl config is not ideal. These days JSON, YAML, TOML and the like are more standard, and are more lenient with strings and even whole texts (especially YAML).
For now: maybe there is a way to change the defaults = and ~ for string_to_dict. An environment variable? But that is not so transparent...
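As a rough sketch of the environment-variable idea, the defaults could be looked up at call time. The variable names ARGS_SEPARATOR_CHAR and ARGS_SPACE_CHAR are purely hypothetical (nothing like this exists in Stetl), and they deliberately avoid the STETL_ prefix, since that prefix is already used to pass config arguments. The function is a standalone copy, not the actual Util method.

```python
import os


def string_to_dict(s, separator=None, space=None):
    # Explicit arguments win; otherwise fall back to the (hypothetical)
    # environment variables, and finally to the current defaults '=' and '~'.
    separator = separator or os.environ.get('ARGS_SEPARATOR_CHAR', '=')
    space = space or os.environ.get('ARGS_SPACE_CHAR', '~')

    # Same logic as the existing implementation quoted above
    dict_arr = [x.split(separator) for x in s.split()]
    for x in dict_arr:
        x[1] = x[1].replace(space, ' ')
    return dict(dict_arr)
```

With ARGS_SPACE_CHAR set to another character, a tilde in a value would pass through unchanged. This only moves the problem to a different character, and as noted it hides behaviour in the environment, so it is more of a stopgap than a fix.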
This also happens when trying to import bagv2:
$ ./bagv2/etl/etl.sh -v -a ./bagv2/etl/options/hostname.args
INFO: 21-11-18 10:44:52 - Using options_file=options/hostname.args and user_args=-c conf/etl-imbag-2.1.0.cfg -v -a ./bagv2/etl/options/hostname.args
2021-11-18 10:44:52,461 util INFO Found lxml.etree, native XML parsing, fabulous!
2021-11-18 10:44:52,566 util INFO Found GDAL/OGR Python bindings, super!!
2021-11-18 10:44:52,571 main INFO Stetl version = 2.1.dev0
2021-11-18 10:44:52,573 main INFO Found args file at: /home/bas/software/nlextract/git/bagv2/etl/options/common.args
2021-11-18 10:44:52,573 main INFO Found args file at: options/hostname.args
Traceback (most recent call last):
  File "/home/bas/software/nlextract/git/externals/stetl/bin/stetl", line 43, in <module>
    main()
  File "/home/bas/software/nlextract/git/externals/stetl/bin/stetl", line 27, in main
    args = parse_args(sys.argv[1:])
  File "/home/bas/software/nlextract/git/externals/stetl/stetl/main.py", line 55, in parse_args
    args_total = Util.merge_two_dicts(args_total, Util.string_to_dict(arg))
  File "/home/bas/software/nlextract/git/externals/stetl/stetl/util.py", line 112, in string_to_dict
    x[1] = x[1].replace(space, ' ')
IndexError: list index out of range
I'm getting an IndexError: list index out of range exception when creating a StringSubstitutionFilter. Stack trace:
This happens when the config file contains placeholders which are passed through the command line and when the value contains spaces which are represented with tildes. Example:
stetl -c blah.cfg -a myvalue=contains~space
Previously, as a workaround, I passed those values as environment variables, like export STETL_myvalue=contains~space. Then this error doesn't occur.
As you can see, this occurs in Stetl version 2.1-dev, but this also happened before the 2.0 versions.
During debugging, I found out that after I create the ETL object (etl = ETL(vars(args_parsed), args_parsed.config_args)) and show the config_dict, the relevant section is shown like this:
It is clear that the tilde is replaced by a space earlier in the process, at the creation of the ETL object.
So, when this is passed to string_to_dict, the dict_arr will look like this: [['myvalue','contains'],['space']], which obviously causes the IndexError, since the second list only contains one element.
I haven't looked yet where this error exactly occurs. This must happen after extra arguments are passed through -a, but not when arguments are passed as environment variables with the 'STETL_'-prefix.
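The failure described above can be reproduced in isolation. This is a minimal sketch using a standalone copy of string_to_dict as quoted in this thread. The step that builds format_args is an assumption about how the substituted value reaches the filter, based on the dict_arr shown above.

```python
def string_to_dict(s, separator='=', space='~'):
    # Standalone copy of Util.string_to_dict as quoted in this thread
    dict_arr = [x.split(separator) for x in s.split()]
    for x in dict_arr:
        x[1] = x[1].replace(space, ' ')
    return dict(dict_arr)


# First pass (command-line parsing): the tilde becomes a real space
first = string_to_dict('myvalue=contains~space')

# Assumed substitution step: the value is placed back into a key=value string
format_args = 'myvalue=%s' % first['myvalue']

# Second pass (filter construction): s.split() now breaks on the real space,
# producing a token without a separator, so x[1] does not exist
try:
    string_to_dict(format_args)
    second_error = None
except IndexError as e:
    second_error = e

print(first, repr(format_args), repr(second_error))
```

The second call sees two whitespace-separated tokens, and the token without an = yields a one-element list, which matches the IndexError in the traceback.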