amickan closed this issue 1 year ago
That would mean that creating json outputs not stored in the db is generally broken since this PR.
What does this mean? Jobs were incorrectly rejected? Or that values were stored for files or files for values, etc?
Inspecting the Job failures, only four jobs have failed with a validation failure message in the past two months; three of those were for the algorithm that reported this issue, so I think "generally broken" is not true. Several MICCAI challenges used outputs like this, so if jobs were regularly failing we would have heard about it sooner.
Ok, well that's good news. I'm not sure what's special about this case then. If what I suggest above is true, then you would expect all jobs with json file outputs to fail with a ValidationError, but if that's not the case, my analysis might be wrong.
I spent some more time debugging this and realized that I was wrong. Even though we create a json output, in the actual `create_instance` method we take the value and dump it into a file for interfaces that are not stored in the db:
```python
def create_instance(self, *, image=None, value=None, fileobj=None):
    civ = ComponentInterfaceValue.objects.create(interface=self)

    if image:
        civ.image = image
    elif fileobj:
        container = File(fileobj)
        civ.file.save(Path(self.relative_path).name, container)
    elif self.saved_in_object_store:
        civ.file = ContentFile(
            json.dumps(value).encode("utf-8"),
            name=Path(self.relative_path).name,
        )
    else:
        civ.value = value
```
So saving outputs of json file type is not broken. It should work. But now I'm back at square one with trying to figure out what is causing the issues for Ward's algorithm.
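To make that dispatch concrete, here is a minimal Django-free sketch of the branch logic above. `choose_storage` is a hypothetical stand-in for `create_instance`: instead of populating a `ComponentInterfaceValue`, it just reports which field would be set and with what. The attribute names `saved_in_object_store` and `relative_path` mirror the real interface attributes; everything else is illustrative.

```python
import json
from pathlib import Path


def choose_storage(*, image=None, value=None, fileobj=None,
                   saved_in_object_store=False,
                   relative_path="output/nodule-locations.json"):
    """Report which CIV field create_instance would populate (sketch only)."""
    if image:
        return ("image", image)
    if fileobj:
        # the real code wraps fileobj in File and saves it under the
        # interface's relative_path filename
        return ("file", Path(relative_path).name)
    if saved_in_object_store:
        # json outputs not stored in the db are dumped into a file
        return ("file", json.dumps(value).encode("utf-8"))
    return ("value", value)


# a json interface saved in the object store ends up as a file, not a db value
field, payload = choose_storage(value={"points": []}, saved_in_object_store=True)
```

So a json-kind output whose interface is not stored in the db still takes the file path, which is why this mechanism is not broken in general.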
Could you maybe make a test that reproduces it? Even with fixtures if needed?
Good to know that it's not a bigger issue though!
Yeah, that's the problem: I'm not able to reproduce it. Ward sent an example output file from his algorithm, and if I mock the part of `_create_json_result` that calls the S3 client and reads the file object downloaded from S3, and instead pass in the contents of the example file, the CIV gets created just fine, no errors. So the issue must be somewhere within this part of the code, or with the initial upload of the produced json file to the S3 outputs bucket. Not sure how to test that further, though.
```python
with io.BytesIO() as fileobj:
    self._s3_client.download_fileobj(
        Fileobj=fileobj,
        Bucket=settings.COMPONENTS_OUTPUT_BUCKET_NAME,
        Key=key,
    )
    fileobj.seek(0)
```
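The mocking approach described above can be sketched as a small isolated test: stub the S3 client with a fake whose `download_fileobj` writes a known payload into the buffer, then check that the bytes round-trip through `json.loads`. The fake client class, bucket name, and key here are illustrative, not the real Grand Challenge settings.

```python
import io
import json


class FakeS3Client:
    """Illustrative stand-in for the boto3 client used by _create_json_result."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def download_fileobj(self, *, Fileobj, Bucket, Key):
        # boto3's download_fileobj writes the object's bytes into Fileobj
        Fileobj.write(self._payload)


example = json.dumps(
    {"type": "Multiple points", "points": [], "version": {"major": 1, "minor": 0}}
).encode("utf-8")

client = FakeS3Client(example)
with io.BytesIO() as fileobj:
    client.download_fileobj(
        Fileobj=fileobj,
        Bucket="example-outputs-bucket",  # placeholder bucket name
        Key="nodule-locations.json",
    )
    fileobj.seek(0)
    result = json.loads(fileobj.read().decode("utf-8"))
```

If this round-trip succeeds with the example file contents, the download/decode step itself is unlikely to be the culprit.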
I would be astounded if there is something wrong with those two lines.
OK, how about a sanity check: put that file in the simplest Algorithm container you can make and upload it, then see if you can reproduce it locally or on Grand Challenge?
Yes, I can reproduce it. The parse outputs task fails with the following error: it's trying to validate a json that contains both the outputs and the inputs against the multiple points schema, which is weird. I can't go back far enough in time in the Django admin on GC to see whether Ward's jobs produced the same error or whether this is because I'm outputting the json wrong in my dummy algorithm.
```
django.core.exceptions.ValidationError: {'all': ["JSON does not fulfill schema:
[{'outputs': [{'type': 'Multiple points', 'points': [{'point': [122.19, -27.73, 1840.87]}, {'point': [88.87, 5.86, 1824.93]}], 'version': {'major': 1, 'minor': 0}}], 'inputs': [{'type': 'metaio_image', 'filename': 'f32fc983-087f-4a6b-891a-3c87d4caf745.mha'}], 'error_messages': []}] is not of type 'object'

Failed validating 'type' in schema[0]:
    {'additionalProperties': False,
     'properties': {'name': {'type': 'string'},
                    'points': {'items': {'allOf': [{'$ref': '#/definitions/point-object'}]},
                               'type': 'array'},
                    'type': {'enum': ['Multiple points']},
                    'version': {'$ref': '#/definitions/version-object'}},
     'required': ['version', 'type', 'points'],
     'type': 'object'}

On instance:
    [{'error_messages': [],
      'inputs': [{'filename': 'f32fc983-087f-4a6b-891a-3c87d4caf745.mha',
                  'type': 'metaio_image'}],
      'outputs': [{'points': [{'point': [122.19, -27.73, 1840.87]},
                              {'point': [88.87, 5.86, 1824.93]}],
                   'type': 'Multiple points',
                   'version': {'major': 1, 'minor': 0}}]}]"]}
```
I would double-check your algorithm outputs, as that doesn't look right at all. It looks like you're putting the development results.json in the nodule outputs.
Also you can rerun the job to get the up to date error messages.
Yes, I had made a mistake in writing the outputs. I fixed that and reran the job. I can reproduce the error, it's a schema validation error:
```
Traceback (most recent call last):
  File "/app/grandchallenge/components/backends/base.py", line 286, in _create_json_result
    civ = interface.create_instance(value=result)
  File "/app/grandchallenge/components/models.py", line 760, in create_instance
    civ.full_clean()
  File "/opt/poetry/.venv/lib/python3.8/site-packages/django/db/models/base.py", line 1251, in full_clean
    raise ValidationError(errors)
django.core.exceptions.ValidationError: {'all': ["JSON does not fulfill schema:
[{'type': 'Multiple points', 'points': [{'point': [122.19, -27.73, 1840.87]}, {'point': [88.87, 5.86, 1824.93]}], 'version': {'major': 1, 'minor': 0}}] is not of type 'object'

Failed validating 'type' in schema[0]:
    {'additionalProperties': False,
     'properties': {'name': {'type': 'string'},
                    'points': {'items': {'allOf': [{'$ref': '#/definitions/point-object'}]},
                               'type': 'array'},
                    'type': {'enum': ['Multiple points']},
                    'version': {'$ref': '#/definitions/version-object'}},
     'required': ['version', 'type', 'points'],
     'type': 'object'}

On instance:
    [{'points': [{'point': [122.19, -27.73, 1840.87]},
                 {'point': [88.87, 5.86, 1824.93]}],
      'type': 'Multiple points',
      'version': {'major': 1, 'minor': 0}}]"]}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/grandchallenge/components/tasks.py", line 825, in parse_job_outputs
    outputs = executor.get_outputs(
  File "/app/grandchallenge/components/backends/base.py", line 82, in get_outputs
    res = self._create_json_result(interface=interface)
  File "/app/grandchallenge/components/backends/base.py", line 296, in _create_json_result
    raise ComponentException(
grandchallenge.components.backends.exceptions.ComponentException: The output file 'nodule-locations.json' is not valid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/poetry/.venv/lib/python3.8/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/opt/poetry/.venv/lib/python3.8/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/grandchallenge/components/tasks.py", line 830, in parse_job_outputs
    raise PriorStepFailed("Could not parse outputs")
grandchallenge.components.exceptions.PriorStepFailed: Could not parse outputs
```
The json being validated is an array, not an object.
Ward Hendrix reported that his algorithm (Lung nodule detection in routine clinical CT), which uses the `nodule-locations` interface, keeps failing with the following error: "The output file nodule-locations.json is not valid". This is not a json decoding issue; the json he is outputting has the correct format. This is a ValidationError arising on CIV creation. I traced back the code, and I think the issue arises here:
For outputs of json type that are not stored in the database, we should create a file result, but instead we're trying to create a json result, which will fail because we try to set a value instead of a file.
So I think the solution is to check `interface.requires_file` and to move that check before the `is_json_kind` check:
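A sketch of that proposed reordering: `requires_file` and `is_json_kind` are the real interface properties mentioned above, but the routing helper, its return values, and the `SimpleNamespace` stand-ins for interfaces are hypothetical.

```python
from types import SimpleNamespace


def route_output(interface):
    """Hypothetical routing: check requires_file before is_json_kind, so json
    interfaces whose values live in files are handled as file results."""
    if interface.requires_file:
        return "file_result"
    if interface.is_json_kind:
        return "json_result"
    return "file_result"


# a json interface whose value is not stored in the db: the file check must win
nodule_locations = SimpleNamespace(requires_file=True, is_json_kind=True)

# a json interface stored in the db: still handled as a json result
score = SimpleNamespace(requires_file=False, is_json_kind=True)
```

With the checks in the original order, `nodule_locations` would be routed to the json-result path and fail when a value is set instead of a file.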