Open herzogrh opened 3 months ago
Some models may be non-deterministic, so it might be a good idea to make this configurable in providers.yaml: for such models, results should never be returned from the cache and should always be recalculated.
The problem here is that the parameters are stored in a JSONB field in the database. Computing the hash on the database side apparently yields a different hash every time, presumably because JSON objects are serialized without a strict key order. I've tried a few things (for example, casting the input strings to JSONB and then back to text to enforce some kind of standard serialization), but that didn't help.
An alternative could be to do the hashing on the Python side in a standardized way, for example by sorting the keys first, but that would be more work than doing it in the database.
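For illustration, here is a minimal sketch of what hashing on the Python side could look like. The function name `params_hash` and the example parameters are made up for this comment and are not taken from the project. It relies only on the standard library: `json.dumps` with `sort_keys=True` sorts keys at every nesting level, so two dicts with the same content produce the same string regardless of insertion order.

```python
import hashlib
import json


def params_hash(params: dict) -> str:
    """Return a SHA-256 hex digest that does not depend on key order.

    Hypothetical helper, not part of the existing code base.
    """
    canonical = json.dumps(
        params,
        sort_keys=True,         # sort keys in every nested object
        separators=(",", ":"),  # drop optional whitespace
        ensure_ascii=True,      # same bytes regardless of locale
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order (made-up example parameters).
a = {"threshold": 0.5, "bands": ["B04", "B08"], "options": {"scale": 10, "crs": "EPSG:4326"}}
b = {"options": {"crs": "EPSG:4326", "scale": 10}, "bands": ["B04", "B08"], "threshold": 0.5}

print(params_hash(a) == params_hash(b))
```

Note that list order is still significant with this approach, so `["B04", "B08"]` and `["B08", "B04"]` hash differently. Whether that is desirable depends on what the parameters mean for a given model.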
Maybe it's even easier to compare the input parameters directly rather than their hash? If the parameters are the same and the model is configured as deterministic, the results of the previous job are returned.
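A rough sketch of the direct comparison, assuming the parameters are available as Python dicts once loaded from the database. The `Job` class, the `find_reusable_job` function and the `deterministic` flag are hypothetical names for illustration only. Python dict equality ignores key order, so no canonical serialization is needed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # Hypothetical stand-in for a stored job record.
    job_id: int
    model: str
    parameters: dict
    results: dict


def find_reusable_job(
    jobs: List[Job], model: str, parameters: dict, deterministic: bool
) -> Optional[Job]:
    """Return the most recent job with equal parameters, or None."""
    if not deterministic:
        # Non-deterministic models are always recalculated.
        return None
    for job in reversed(jobs):
        # Dict equality is independent of key order, also for nested dicts.
        if job.model == model and job.parameters == parameters:
            return job
    return None


jobs = [Job(1, "model-a", {"x": 1, "y": {"p": 2, "q": 3}}, {"value": 42})]

hit = find_reusable_job(jobs, "model-a", {"y": {"q": 3, "p": 2}, "x": 1}, deterministic=True)
miss = find_reusable_job(jobs, "model-a", {"y": {"q": 3, "p": 2}, "x": 1}, deterministic=False)
```

A linear scan like this is only meant to show the comparison itself; in practice the lookup would presumably happen in a database query.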
By logging the model version and a hash of the input parameters with each execution request, duplicate execution requests can be detected. Whenever a second request with the same input parameters is sent, its job results should mirror those of the previous request and point to it.
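To make the idea concrete, a small in-memory sketch of such a request log follows. Everything here (`RequestLog`, `record`, the `mirrors` field, the example model name and version) is an assumption for illustration and does not reflect the actual schema. A request is treated as a duplicate when model, model version and parameter hash all match an earlier request.

```python
import hashlib
import json
from typing import Optional


class RequestLog:
    """Hypothetical in-memory log of execution requests."""

    def __init__(self) -> None:
        self._first_seen = {}  # (model, version, digest) -> first request id
        self.entries = []      # one entry per logged request

    def record(self, request_id: int, model: str, version: str, params: dict) -> Optional[int]:
        """Log a request; return the id of the request it mirrors, if any."""
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        key = (model, version, digest)
        # setdefault stores this request as the original only if the key is new.
        original = self._first_seen.setdefault(key, request_id)
        mirrors = None if original == request_id else original
        self.entries.append(
            {"request_id": request_id, "params_hash": digest, "mirrors": mirrors}
        )
        return mirrors


log = RequestLog()
first = log.record(1, "model-a", "1.0", {"x": 1, "y": 2})
second = log.record(2, "model-a", "1.0", {"y": 2, "x": 1})   # same parameters
third = log.record(3, "model-a", "1.1", {"x": 1, "y": 2})    # new model version
```

Including the version in the key means that results are recalculated after a model update, even when the parameters are unchanged.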