Model versioning and duplicate prevention

citysciencelab / urban-model-platform

This repository contains a prototype for an urban model platform. It is written in Python and implements the OGC API Processes Standard as an execution management service.

GNU General Public License v3.0


Model versioning and duplicate prevention #20

Open herzogrh opened 3 months ago

herzogrh commented 3 months ago

By logging the model version in execution requests and hashing the input parameters, duplicate execution requests should be detected. Whenever a second request with the same input parameters is sent, the job results should be mirrored and should point to the previous request.

herzogrh commented 3 months ago

Some models may be non-deterministic, so maybe it'd be a good idea to make it configurable in the providers.yaml that results are not returned from the cache but are always calculated instead.
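
A minimal sketch of how such a switch could be read once the providers.yaml entry for a process has been parsed into a dict. The key name `deterministic` and the function `may_reuse_results` are hypothetical and not part of the platform's current configuration. The default of `True` is also an assumption, chosen so that caching is opt-out as described above.

```python
def may_reuse_results(process_config: dict) -> bool:
    """Return True if cached results may be returned for this process.

    `process_config` stands for the already parsed providers.yaml entry of
    one process. The "deterministic" key is a hypothetical name; when it is
    missing, the model is treated as deterministic (assumed default).
    """
    return bool(process_config.get("deterministic", True))


# A model marked as non-deterministic is always recalculated.
print(may_reuse_results({"deterministic": False}))
print(may_reuse_results({}))
```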

hwbllmnn commented 1 month ago

The problem here is that the parameters are stored in a JSONB field in the database. Computing the hash on the database side apparently results in different hashes every time, presumably because there is no strict order when serializing JSON objects. I've tried a few things (for example, casting input strings to JSONB and then back to text in order to enforce some kind of standard serialization), but that didn't help.

An alternative could be to do the hashing in Python in a standardized manner, for example by sorting the keys first, but that would be more work than doing it on the database side.
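
The Python-side hashing could look roughly like the sketch below. It serializes the inputs with sorted keys and fixed separators, so that key order and whitespace no longer affect the hash, and includes the model version so that a new version never matches an old job. The function name `hash_inputs` and its parameters are hypothetical. SHA-256 is an assumed choice, not something decided in this issue.

```python
import hashlib
import json


def hash_inputs(process_id: str, version: str, inputs: dict) -> str:
    """Return a stable hex digest for one execution request (hypothetical helper)."""
    # sort_keys=True orders object keys at every nesting level, and the
    # compact separators remove any whitespace differences.
    canonical = json.dumps(
        {"process_id": process_id, "version": version, "inputs": inputs},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same parameters in a different key order give the same hash.
a = hash_inputs("model", "1.0", {"x": 1, "opts": {"p": 2, "q": 3}})
b = hash_inputs("model", "1.0", {"opts": {"q": 3, "p": 2}, "x": 1})
print(a == b)
```

Note that list order is preserved by this serialization, so `[1, 2]` and `[2, 1]` hash differently. That is probably what is wanted, since list order usually carries meaning in model inputs.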

herzogrh commented 1 month ago

Maybe it's even easier to compare the input parameters directly and not the hash? So if the parameters are the same and the model is configured as deterministic, then the previous job results will be returned.
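
A sketch of the direct comparison, assuming the stored jobs are available as Python objects. Python compares dicts recursively and independently of key order, so no canonical serialization is needed here. The `Job` class, its fields, and `find_reusable_job` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Job:
    # Hypothetical, simplified stand-in for a stored job record.
    job_id: str
    process_id: str
    version: str
    inputs: dict


def find_reusable_job(
    jobs: Iterable[Job],
    process_id: str,
    version: str,
    inputs: dict,
    deterministic: bool,
) -> Optional[Job]:
    """Return an earlier job with identical parameters, or None."""
    if not deterministic:
        # Non-deterministic models are always recalculated.
        return None
    for job in jobs:
        # Dict equality ignores key order and compares nested values.
        if (
            job.process_id == process_id
            and job.version == version
            and job.inputs == inputs
        ):
            return job
    return None


previous = [Job("job-1", "model", "1.0", {"a": 1, "b": {"c": 2}})]
match = find_reusable_job(previous, "model", "1.0", {"b": {"c": 2}, "a": 1}, True)
print(match.job_id if match else None)
```

In practice the comparison would more likely happen in the database query than in a Python loop. The loop only illustrates the matching rule: same process, same model version, same inputs, and a model configured as deterministic.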