axel7083 closed this 2 months ago
Why not send the hash of the model id?
Imported models do not have a hash. We would have to compute it on import; that's feasible, but it would require some work on the import side and could lead to issues: what if the user imports a model manually by editing the file? Do we compute the hash at startup?
@axel7083 hash of the id
I mean, are we interested in this information for user-imported models? It would be different for each imported path. I feel like having a fixed value for imported models (`<imported>`) would make filtering the telemetry easier: we wouldn't have tons of meaningless hashes and could simply filter this information out.
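A minimal sketch of the two options under discussion, assuming a hypothetical `ModelInfo` shape with an `id` and an `imported` flag (neither is taken from the actual codebase; `telemetryModelId` and `hashId` are illustrative names). Catalog models always report a SHA-256 of their id; imported models report either the fixed `<imported>` placeholder or their own hashed id.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical model shape, for illustration only.
interface ModelInfo {
  id: string;
  imported: boolean;
}

// Fixed value proposed for imported models.
const IMPORTED_PLACEHOLDER = '<imported>';

// Hex-encoded SHA-256 of the model id.
function hashId(id: string): string {
  return createHash('sha256').update(id).digest('hex');
}

// Value reported in telemetry for a model.
// distinctImported = false: every imported model collapses to one value that is easy to filter out.
// distinctImported = true: each imported model keeps its own hash, so patterns remain visible.
function telemetryModelId(model: ModelInfo, distinctImported = false): string {
  if (model.imported && !distinctImported) {
    return IMPORTED_PLACEHOLDER;
  }
  return hashId(model.id);
}
```

The placeholder trades away the ability to spot patterns among imported models for simpler filtering; the `distinctImported` switch only exists here to show both behaviors side by side.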
Yes, but having a distinct value may help detect patterns in the telemetry.
Can we store the hash in the model object so we only compute it once?
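One way to read this suggestion, sketched under assumptions: the model object gets an optional cache field (`idHash` here is hypothetical, not an existing property) that is filled on first access, so the SHA-256 is computed at most once per object.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical model shape; `idHash` is an assumed cache field for illustration.
interface ModelInfo {
  id: string;
  idHash?: string;
}

// Returns the SHA-256 of the model id, computing it only on the first call
// and storing the result on the model object for subsequent calls.
function getIdHash(model: ModelInfo): string {
  if (model.idHash === undefined) {
    model.idHash = createHash('sha256').update(model.id).digest('hex');
  }
  return model.idHash;
}
```

This assumes the id never changes after the hash is cached. If the cached field should not end up in a serialized catalog, a `WeakMap<ModelInfo, string>` keyed by the model object gives the same compute-once behavior without touching the object.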
Here I replicated what we do for the PullImage telemetry in Podman Desktop, which hashes the id. We have the sha256 for the models we put in the catalog, but not for the ones the user imported.
There are several ways to import a model: through the UI or by editing the user catalog. Therefore, computing the hash of the model object (not the id) would require deeper changes.
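For context on what "hashing the model object" could involve, here is a rough sketch that hashes a model file's content rather than its id. It assumes the model is a single file at a local path; `hashStream` and `hashModelFile` are hypothetical helpers, not existing functions. Whatever import path is used (UI or catalog editing), something like this would have to run at import time or at startup.

```typescript
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';
import { Readable } from 'node:stream';

// Hex SHA-256 of everything a readable stream emits.
function hashStream(stream: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    stream
      .on('data', (chunk: Buffer | string) => hash.update(chunk))
      .on('error', reject)
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Hash of a model file's content, streamed so that multi-gigabyte
// model files are never loaded into memory all at once.
function hashModelFile(path: string): Promise<string> {
  return hashStream(createReadStream(path));
}
```

Reading the whole file is what makes this costly for large models, which is one reason hashing the id is much cheaper than hashing the content.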
IMO we should not hash imported models and use them for telemetry, as we could easily deduce which models users are using, which seems like personal/private information?
It's personal/private information only if you're able to identify the user (e.g. if the name of the model is the name of the user).
If you can see that people are using models available on Hugging Face (through the hash), detecting that someone uses a given model doesn't tell you who that person is.
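To make the point about hashes concrete: SHA-256 is deterministic, so anyone with a list of public model ids can hash them and match telemetry values back to models. The sketch below shows that lookup; the ids are made-up placeholders and `resolveKnownModels` is a hypothetical helper. The hash reveals the model, not the user; hashes of private values (such as a local path) simply stay unresolved unless someone guesses the exact input.

```typescript
import { createHash } from 'node:crypto';

function sha256Hex(value: string): string {
  return createHash('sha256').update(value).digest('hex');
}

// Maps each hashed id seen in telemetry to the known public model id it came from,
// or to undefined when no known id hashes to that value.
function resolveKnownModels(
  telemetryHashes: string[],
  knownIds: string[],
): Map<string, string | undefined> {
  const lookup = new Map<string, string>();
  for (const id of knownIds) {
    lookup.set(sha256Hex(id), id);
  }
  const result = new Map<string, string | undefined>();
  for (const h of telemetryHashes) {
    result.set(h, lookup.get(h));
  }
  return result;
}
```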
Okay, yeah, I agree.
But hashing the model would need a deeper change here, which deserves its own issue (next sprint).
What does this PR do?
Prevent leakage of user information in the telemetry
Screenshot / video of UI
N/A
What issues does this PR fix or reference?
Fixes https://github.com/containers/podman-desktop-internal/issues/325
How to test this PR?