Ekumen-OS / lambkin

Apache License 2.0

The benchmark cache is not invalidated when data changes #95

Open glpuga opened 2 months ago

glpuga commented 2 months ago

Bug description

After a failed and then resumed run, report generation ignored one of the five datasets and produced no data for it.

The cause was a stale benchmark cache. The output data for the last dataset was missing when the first run failed, and the cache was not updated when the resumed run added that data.

I worked around the issue by deleting the cache file, after confirming in the code that it would be regenerated.

How to reproduce

I guess anything that fills the cache and then changes the data will do.

Expected behavior

The cache should have some clear invalidation criteria.

The best one for this use case is probably the complete invalidation of the cache contents on each run of the command.
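
A minimal sketch of what "invalidate on each run" could look like, assuming the cache is a single file on disk. The function name `invalidate_cache` and the `cache_path` argument are hypothetical and not taken from the lambkin code; the point is only that the command would drop the file before any report code reads it.

```python
from pathlib import Path


def invalidate_cache(cache_path: Path) -> bool:
    """Delete the cache file if present (hypothetical helper).

    Returns True if a file was removed, False if there was nothing to remove.
    """
    try:
        cache_path.unlink()
        return True
    except FileNotFoundError:
        # No cache yet, so there is nothing stale to worry about.
        return False
```

Calling this once at the start of each run trades some recomputation time for the guarantee that a resumed run never sees results cached by the failed one.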

Actual behavior

The cache keeps state across runs of shepherd, so stale data is used instead of the current state.

Additional context

.

hidmic commented 1 week ago

Indeed. DataFrame caching is not paying attention to the data source because it doesn't know about it. It should at least pick up what the target iterations are for the underlying function and perform a timestamp check.
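
A rough sketch of such a timestamp check, under the assumption that both the cache and the per-iteration outputs are plain files. `cache_is_stale`, `cache_path` and `source_paths` are hypothetical names for illustration, not the existing caching API.

```python
from pathlib import Path
from typing import Iterable


def cache_is_stale(cache_path: Path, source_paths: Iterable[Path]) -> bool:
    """Return True if the cache is missing or older than any source file.

    Hypothetical helper: `source_paths` stands in for the output files of
    the target iterations that the cached function reads.
    """
    if not cache_path.exists():
        return True
    cache_mtime = cache_path.stat().st_mtime_ns
    # Any source modified (or created) after the cache makes it stale.
    return any(p.stat().st_mtime_ns > cache_mtime for p in source_paths)
```

For the case reported here, this only helps if the list of source files is recomputed on every run: a dataset whose output appears after the cache was written has a newer modification time and would then trigger regeneration.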