dathere / datapusher-plus

A standalone web service that pushes data into the CKAN Datastore fast & reliably. It pushes real good!
GNU Affero General Public License v3.0

Prefer_dmy failing #141

Closed by EricSoroos 1 month ago

EricSoroos commented 3 months ago

Describe the bug

qsv applydp 0.128.0 doesn't appear to support the --prefer-dmy flag, added here: https://github.com/dathere/datapusher-plus/blob/master/datapusher/jobs.py#L1120

qsv 0.108 seems to work, so I've downgraded to that version for now.

ckan@96a3a268e3a3:/$ /usr/local/bin/qsvdp applydp datefmt last_reported /tmp/tmp8xb42s2r.csv --output /tmp/tmpr0oz464i.csv --prefer-dmy
Unknown flag: '--prefer-dmy'

Usage:
qsv applydp operations <operations> [options] <column> [<input>]
qsv applydp emptyreplace --replacement=<string> [options] <column> [<input>]
qsv applydp dynfmt --formatstr=<string> [options] --new-column=<name> [<input>]
qsv applydp --help
ckan@96a3a268e3a3:/$ /usr/local/bin/qsvdp --version
qsvdp 0.128.0-mimalloc-Luau 0.625;polars-0.40.0;self_update-4-4;6.15 GiB-0 B-3.83 GiB-7.68 GiB (x86_64-unknown-linux-gnu compiled with Rust 1.78) prebuilt

To Reproduce

[image]

Expected behavior

Either qsv applydp --prefer-dmy works, or we just set the env variable.

Desktop (please complete the following information):

Looks like at least versions 0.16 through master have this.
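The env-variable fallback mentioned under "Expected behavior" could look roughly like the sketch below. It assumes the variable is named QSV_PREFER_DMY and that qsv treats it as enabled whenever it is set; both are assumptions that should be checked against the qsv documentation for the installed version. The helper names build_env and run_qsv are hypothetical and are not part of DP+.

```python
import os
import subprocess


def build_env(prefer_dmy, base_env=None):
    """Return a copy of the environment with QSV_PREFER_DMY set or cleared.

    Assumption: qsv reads QSV_PREFER_DMY and enables day-first parsing
    whenever the variable is present.
    """
    env = dict(os.environ if base_env is None else base_env)
    if prefer_dmy:
        env["QSV_PREFER_DMY"] = "1"
    else:
        # Remove any inherited value so the flag is really off.
        env.pop("QSV_PREFER_DMY", None)
    return env


def run_qsv(cmd, prefer_dmy):
    """Run a qsv command list with the DMY preference passed via the environment."""
    return subprocess.run(cmd, check=True, env=build_env(prefer_dmy))
```

Passing the preference through the environment keeps the command line identical across qsv versions, at the cost of making the setting less visible in logs.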

jqnatividad commented 3 months ago

Thanks for the report, @EricSoroos.

I moved datefmt to its own command and removed it from applydp. I'll update DP+ so it can be used with the latest qsv.
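To illustrate the change, here is a minimal sketch of how a caller might build the command for either layout. It assumes the standalone datefmt command accepts the same positional arguments and the same --output and --prefer-dmy options that applydp datefmt accepted in qsv 0.108; that has not been verified here. The function name build_datefmt_cmd and the datefmt_is_top_level switch are hypothetical.

```python
def build_datefmt_cmd(qsv_bin, columns, input_csv, output_csv,
                      prefer_dmy=False, datefmt_is_top_level=True):
    """Build the argument list for formatting date columns.

    datefmt_is_top_level=True  -> <qsv_bin> datefmt ...         (newer qsv)
    datefmt_is_top_level=False -> <qsv_bin> applydp datefmt ... (older qsv)
    """
    if datefmt_is_top_level:
        cmd = [qsv_bin, "datefmt", columns, input_csv]
    else:
        cmd = [qsv_bin, "applydp", "datefmt", columns, input_csv]
    cmd += ["--output", output_csv]
    if prefer_dmy:
        # Assumed to be accepted by both layouts; confirm with --help.
        cmd.append("--prefer-dmy")
    return cmd
```

The resulting list can be passed straight to subprocess.run(cmd, check=True), as jobs.py already does for the applydp call.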

pdelboca commented 2 months ago

Confirming that running Datapusher+ with qsv 0.128 fails with a similar error:

2024-07-11 08:29:07,136 INFO  [e08027bd-a6fd-46e2-92e7-edef00a31324] Formatting dates "date,valid_on" to ISO 8601/RFC 3339 format with PREFER_DMY: False...
Invalid arguments.

Usage:
qsv applydp operations <operations> [options] <column> [<input>]
qsv applydp emptyreplace --replacement=<string> [options] <column> [<input>]
qsv applydp dynfmt --formatstr=<string> [options] --new-column=<name> [<input>]
qsv applydp --help
2024-07-11 08:29:07,179 ERROR [ckanext.datapusher_plus.jobs] Datapusher Plus error: Applydp error: Command '['/usr/local/bin/qsvdp', 'applydp', 'datefmt', 'date,valid_on', '/tmp/tmp_3o8nhmz/qsv_safenames.csv', '--output', '/tmp/tmp_3o8nhmz/qsv_applydp.csv']' returned non-zero exit status 2., Traceback (most recent call last):
  File "/code/venv/src/datapusher-plus/ckanext/datapusher_plus/jobs.py", line 1072, in _push_to_datastore
    qsv_applydp = subprocess.run(qsv_applydp_cmd, check=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/local/bin/qsvdp', 'applydp', 'datefmt', 'date,valid_on', '/tmp/tmp_3o8nhmz/qsv_safenames.csv', '--output', '/tmp/tmp_3o8nhmz/qsv_applydp.csv']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/venv/src/datapusher-plus/ckanext/datapusher_plus/jobs.py", line 273, in datapusher_plus_to_datastore
    push_to_datastore(input, job_id)
  File "/code/venv/src/datapusher-plus/ckanext/datapusher_plus/jobs.py", line 317, in push_to_datastore
    return _push_to_datastore(task_id, input, dry_run=dry_run, temp_dir=temp_dir)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/code/venv/src/datapusher-plus/ckanext/datapusher_plus/jobs.py", line 1074, in _push_to_datastore
    raise utils.JobError("Applydp error: {}".format(e))
ckanext.datapusher_plus.utils.JobError: Applydp error: Command '['/usr/local/bin/qsvdp', 'applydp', 'datefmt', 'date,valid_on', '/tmp/tmp_3o8nhmz/qsv_safenames.csv', '--output', '/tmp/tmp_3o8nhmz/qsv_applydp.csv']' returned non-zero exit status 2.

@jqnatividad although the README.md says to install qsv==0.108, a few paragraphs later it suggests updating to the latest version. That might be confusing for users if datapusher+ is not always in sync with the latest qsv.

tino097 commented 2 months ago

@pdelboca @jqnatividad maybe we should add a version check and display a message when a newer qsv version than the one DP+ supports is installed.

jqnatividad commented 2 months ago

There is an existing minimum version check in DP+, but I agree that we can make DP+ aware of the maximum qsv version it can support too.
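A bounded check along those lines might look like the sketch below. It is not the existing DP+ check: parse_qsv_version and check_qsv_version are hypothetical names, the bounds are supplied by the caller, and the parser only assumes that the first dotted number in the --version output is the qsv version, as in the output quoted earlier in this issue.

```python
import re


def parse_qsv_version(version_output):
    """Extract (major, minor, patch) from the output of `qsvdp --version`."""
    match = re.search(r"\b(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def check_qsv_version(version_output, minimum, maximum):
    """Return the parsed version, or raise if it falls outside [minimum, maximum]."""
    version = parse_qsv_version(version_output)
    if version < minimum:
        raise RuntimeError("qsv %s is older than the minimum supported %s"
                           % (version, minimum))
    if version > maximum:
        raise RuntimeError("qsv %s is newer than the maximum supported %s"
                           % (version, maximum))
    return version
```

Comparing tuples of integers avoids the string-comparison trap where "0.108" would sort after "0.1280". The caller would feed in the captured stdout of the --version call and fail the job early with a readable message instead of an applydp usage error.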

Now that the migration away from the deprecated ckanext-serviceprovider is largely done (thanks @tino097!), I can start integrating the latest version of qsv, with the expanded analysis and metadata inferencing we've been busy baking into it, to fully realize automagical metadata with the DRUF workflow. 😄