clamsproject / wfe-pipeline-runner

3 stars 1 forks source link

Catch invalid properties #23

Open marcverhagen opened 2 years ago

marcverhagen commented 2 years ago

The documentation has an example for adde parameters:

--params tokenizer-eol=True,spacy-linking=True

However, the new clams-python does not allow undefined parameters:

Traceback (most recent call last):
  File "run_pipeline.py", line 399, in <module>
    args.verbose, args.intermediate, args.abort)
  File "run_pipeline.py", line 174, in run
    self.run_on_file(in_path, out_path)
  File "run_pipeline.py", line 216, in run_on_file
    mmif = mmif_with_error_view(mmif_in, service, error_message)
  File "run_pipeline.py", line 333, in mmif_with_error_view
    mmif_out = Mmif(base_mmif)
  File "/Applications/ADDED/python/env/clams/pipeline-runner/lib/python3.7/site-packages/mmif/serialize/mmif.py", line 41, in __init__
    self.validate(mmif_obj)
  File "/Applications/ADDED/python/env/clams/pipeline-runner/lib/python3.7/site-packages/mmif/serialize/mmif.py", line 73, in validate
    jsonschema.validators.validate(json_str, schema)
  File "/Applications/ADDED/python/env/clams/pipeline-runner/lib/python3.7/site-packages/jsonschema/validators.py", line 934, in validate
    raise error
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('message' was unexpected)

It is a bit puzzling to me why we get this message. But in general, the pipeline code should check for invalid parameters using the app metadata.

keighrim commented 2 years ago

I don't think this has anything to do with parameters. All app parameter holding is occurring in clams-python but the validation error is coming from Mmif object, which is technically a part of mmif-python package.

Is base_mmif object a proper MMIF string? Looks like there was an error during the app's processing and the result of the errored process is not a well-formed MMIF.

marcverhagen commented 2 years ago

Yeah, it does look like it is not the parameter handling. But this does get triggered by a parameter though. If I start a container and do curl http://0.0.0.0:5000?pretty all is fine. But if I do curl http://0.0.0.0:5000?prettyx I get

{"message": "Internal Server Error"}

Will spend some quality time tracking that down.

marcverhagen commented 2 years ago

I meant "Yeah, it does look like it is not the parameter handling that is directly responsible for the error."

keighrim commented 2 years ago

Oh I see. It's surely an issue with parameter handling then. I'll take a look at the SDK code and see what we can do with invalid/undefined parameters.


[EDIT] It turned out that the SDK does not handle the parameters at all (current code), so it has to be an app in the pipeline that raise some exception when given with an undefined parameter.

marcverhagen commented 2 years ago

By the way, this is new behavior, I know that in the past I have been able to send in any parameters I want, even if the app did not define it. But I will test and see which applications actually have this issue.

keighrim commented 2 years ago

Further investigation revealed that the SDK is actually "rejecting" undefined parameters (see https://github.com/clamsproject/clams-python/issues/101). But the error message from the rejection is something different from what's reported in this issue.

keighrim commented 2 years ago

I tried two example configurations

py start_pipeline.py config/tokenizer-spacy.yml
py start_pipeline.py config/tokenizer-spacy-params.yml

with an example mmif file with parameters above.

py run_pipeline.py examples/mmif/tokenizer-spacy-2.json  out.mmif --params tokenizer-eol=True,spacy-linking=True

But in both cases, I didn't get the above python exception (ill-formed MMIF error).

marcverhagen commented 2 years ago

Yeah, I got the same. But then I noted that the spaCy app is still stuck on clams-python==0.5.0.

marcverhagen commented 2 years ago

More interestingly, I started two versions of app-nlp-example:

$ docker build -t clams-nlp-example:0.0.7 .
$ docker build -t clams-nlp-example:0.0.7-dev -f Dockerfile.dev .
$ docker run --name example1 --rm -d -p 5000:5000 -v $PWD/input/data:/data clams-nlp-example:0.0.7
$ docker run --name example2 --rm -d -p 5005:5000 -v $PWD/input/data:/data clams-nlp-example:0.0.7-dev

Behold the difference between the Gunicorn and Flask versions:

$ curl http://0.0.0.0:5000?prettyx
{"message": "Internal Server Error"}
$ curl http://0.0.0.0:5005?prettyx
...
...
  File "/usr/local/lib/python3.6/site-packages/clams/restify/__init__.py", line 180, in cast
    raise KeyError(f"An unknown configuration is passed as a parameter: {k} = {v}")
KeyError: 'An unknown configuration is passed as a parameter: prettyx = '

-->
keighrim commented 2 years ago

Okay, so the short (and not-so-helpful) error message issue might be coming from flask built-in server, while gunicorn can handle and successfully re-throw the python exception (KeyError), I guess? The check on invalid parameter was surely added in 0.5.1, and I can revert the change to release 0.5.2 (with updated python in docker images) quite soon, if that's what we want.

keighrim commented 1 year ago

To recap, the clams-python SDK once added code that rejected undefined runtime parameters, treating them as invalid. (https://github.com/clamsproject/clams-python/commit/f18de21d3ce2dd896aab34843bc04f3fabcfb4d6 , included in 0.5.1 release) Recent PR (https://github.com/clamsproject/clams-python/pull/106) reverted the rejection code and instead implemented issuing of runtime warnings when undefined parameters are passed. This PR will be included in the next release (not included in 0.5.2).

Closing the issue as addressed.

keighrim commented 1 year ago

I shouldn't have closed this issue because the runner codebase isn't yet updated to use the updated clams-python SDK code. Re-opening this.