DARMA-tasking / LB-analysis-framework

Analysis framework for exploring, testing, and comparing load balancing strategies
Other
5 stars 1 forks source link

#537: Fix broken CI #538

Closed cwschilly closed 2 months ago

cwschilly commented 2 months ago

Fixes #537

cwschilly commented 2 months ago

The CI is fixed now but I'm still investigating why all tests passed on PR #528 . I've rerun the pipelines on that branch and they're failing now.

@tlamonthezie Do you have any idea what would have made the tests pass originally?

tlamonthezie commented 2 months ago

The CI is fixed now but I'm still investigating why all tests passed on PR #528 . I've rerun the pipelines on that branch and they're failing now.

@tlamonthezie Do you have any idea what would have made the tests pass originally?

This is probably because the schema changed in vt in LBDatafile_schema.py with the recent modification of the validate_ids constraint. (modification done after the current PR) In lbaf tests we tests some invalid schemas but the expected first error was no more thrown because new error was thrown first in the wrong schema datasets (because validate_ids fail and there are then 2 errors in some test schemas). Any update in the vt schema in fact might make lbaf schema tests fail even if it previously succeeded. In this particular case resolving the data files errors thrown by validate_ids should make ci passing again because the remaining error will be the expected one again.

tlamonthezie commented 2 months ago

Hello @cwschilly it seems that the bug is in the VT schema validation process. Schema is defined at https://github.com/DARMA-tasking/vt/blob/develop/scripts/LBDatafile_schema.py

The recently added condidion if field['migratable'] and 'seq_id' in field and 'collection_id' not in field: is definitely the error source at vt #2278) The added if statement is not testing if the optional migratable key is present although the schema defines it as optional. So if key is not present (in a communication from or to node) we get (and we will get) a failure of type KeyError('migratable') raised when validate_ids is called. To fix it we should fix VT schema by replacing if field['migratable'] by if field.get('migratable') is True in the condition (the get method will return None if key is not present and we won't get the key error)

LFAF unit tests have passed until the introduction of this condition in vt schema because it was passing with this schema version: https://github.com/DARMA-tasking/vt/blob/2342-change-schema-to-allow-for-id-or-seq_id/scripts/LBDatafile_schema.py

@cwschilly even if the current PR make tests passing again I suggest to fix the initial cause of the error and then you can open an issue in VT to make a quickfix. Tests pass now - because you added migratable: true everywhere it was missing - but if we have communications without migratable keys (non-migratable tasks) we might encouter that error again in the future unless we fix the schema validate_ids function.