Nicer model hashes; `model_hash_to_model` method

dilpath commented 5 months ago

Model hashes are now {MODEL_SUBSPACE_ID}-{MODEL_SUBSPACE_INDICES}-{PETAB_HASH}.

This makes possible a petab_select_problem.model_hash_to_model method, that converts a model hash into a model.

For parameters with > 10 options, alphabet characters will be used (similar to hex, but now 62 characters from [0-9][A-Z][a-z]. For parameters with > 62 options, the actual index delimited by . will be used. e.g. [estimate, 0, estimate, 0, 0] (values) -> [1,0,1,0,0] (indices) -> model_subspace_A-10100-petab_hash (hash) [1, 36, 0] (indices) -> model_subspace_A-1Z0-petab_hash (hash) [1, 63, 0] (indices) -> model_subspace_A-1.63.0-petab_hash (hash)

Also possible to just go with the last option with .s for simpler implementation and broad applicability. However, I expect most users will have parameters that can only be one of (0, estimate}, in which case all indices will be one of {0, 1}, so the more compact first representation is probably preferred.

The PEtab hash is computed from only the following information:

absolute location of the PEtab problem YAML file in the filesystem
nominal values of parameters in the model's PEtab problem
estimated parameters in the model's PEtab problem

dweindl commented 5 months ago

Also possible to just go with the last option with .s for simpler implementation and broad applicability. However, I expect most users will have parameters that can only be one of (0, estimate}, in which case all indices will be one of {0, 1}, so the more compact first representation is probably preferred.

No strong opinion. Let's see it in action and decide then.

dilpath commented 5 months ago

But wait, is it really model_subspace_A-1.63.0? I think it produces model_subspace_A.1-63-0.

:see_no_evil: Thanks! I doubt this case in the current implementation will ever be used, I've never heard of anyone using more than maybe 3 or so options for a parameter. Not sure whether PEtab Select would even handle this case well.

I am not sufficiently familiar with the matter to tell whether there can be any collisions or not.

It should be that every model subspace ID is unique in the context of a single PEtab Select problem, which also means that every model subspace ID exists in only one row of one model space file, of possibly many model space files. So, a model_subspace_id with its model subspace indices should correspond to one model in the full model space, at most.

It could be that multiple model subspace ID + indices combinations correspond to the same "model" (same mathematical model structure and estimated parameters), due to e.g. model subspace rows in the model space file that match in everything except their unique model subspace ID. This would mean that the "same" model that exists in different subspaces could be calibrated multiple times (once per duplicate) during model selection. But, this would always be under a unique hash, so no collisions at least.

No strong opinion. Let's see it in action and decide then.

Sounds good!

dweindl commented 5 months ago

It should be that every model subspace ID is unique in the context of a single PEtab Select problem

Thanks for the explanation. I think it would be good to stress in the docs that those hashes are context-dependent (which is usually not the case for hashes). I.e., after modifying a given PEtab select problem, a certain hash may identify a different model. This shouldn't really be a practical limitation, but users need to be aware.

PEtab-dev / petab_select

Nicer model hashes; `model_hash_to_model` method #86