comet-ml / kangas

🦘 Explore multimedia datasets at scale
https://github.com/comet-ml/kangas/wiki
Apache License 2.0

Metadata collection #122

Closed. sanderfoobar closed this issue 1 year ago.

sanderfoobar commented 1 year ago

By default, Kangas sends metadata to Comet every time it starts up.

This is essentially metadata collection disguised as a version check. A genuine version check would, for example, use the GitHub API to fetch the latest tag.
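A minimal sketch of what a GitHub-API-based check could look like, using only the standard library. It assumes the public REST endpoint `GET /repos/comet-ml/kangas/tags` and that release tags look like `1.2.3` or `v1.2.3`; the function names are hypothetical, not part of Kangas. Such a request still reaches a remote server (GitHub) but carries no machine details beyond an ordinary HTTP request.

```python
import json
import re
import urllib.request
from typing import Iterable, List, Optional, Tuple

# Assumption: the public GitHub REST "list tags" endpoint for the repo.
TAGS_URL = "https://api.github.com/repos/comet-ml/kangas/tags?per_page=100"

# Assumption: tags are plain numeric versions, optionally prefixed with "v".
_TAG_RE = re.compile(r"[vV]?([0-9]+(?:\.[0-9]+)*)")


def parse_version(tag: str) -> Optional[Tuple[int, ...]]:
    """Turn 'v1.2.3' into (1, 2, 3); return None for tags we don't understand."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        return None
    parts = [int(p) for p in match.group(1).split(".")]
    # Drop trailing zeros so that '2.0' and '2.0.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def latest_version(tag_names: Iterable[str]) -> Optional[Tuple[int, ...]]:
    """Highest parseable version among the tag names, or None if there is none."""
    versions = [v for v in map(parse_version, tag_names) if v is not None]
    return max(versions) if versions else None


def fetch_tag_names(url: str = TAGS_URL, timeout: float = 5.0) -> List[str]:
    """Fetch tag names from GitHub; sends no local metadata in the request."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return [tag["name"] for tag in json.load(resp)]


def update_available(installed: str, tag_names: Iterable[str]) -> bool:
    """True only if a newer tagged version exists than the installed one."""
    current = parse_version(installed)
    latest = latest_version(tag_names)
    return current is not None and latest is not None and latest > current
```

In use, one would pass the installed version (for example from `importlib.metadata.version("kangas")`) together with `fetch_tag_names()` to `update_available`. Keeping the parsing separate from the network call makes the comparison logic testable offline.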

https://github.com/comet-ml/kangas/blob/df0c1a495032cc4f1c367c74fcb0ef6e5a2063be/backend/kangas/utils.py#L124-L149

dsblank commented 1 year ago

@sanderfoobar would you be willing to make a PR that optionally allows using the GitHub API for the version check? We could use the environment variable "KANGAS_VERSION_CHECK" with a value such as "GITHUB_API".
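One way the proposed switch might be read, sketched under assumptions: only the variable name `KANGAS_VERSION_CHECK` and the value `GITHUB_API` come from this thread; the `COMET` and `OFF` values, the enum, and the function name are hypothetical. The `default` parameter lets the caller choose between keeping the current behavior and making the check opt-in.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class VersionCheck(Enum):
    COMET = "COMET"            # hypothetical name for the existing behavior
    GITHUB_API = "GITHUB_API"  # value proposed in this thread
    OFF = "OFF"                # hypothetical value: no network call at all


def version_check_mode(
    env: Optional[Mapping[str, str]] = None,
    default: VersionCheck = VersionCheck.COMET,
) -> VersionCheck:
    """Read KANGAS_VERSION_CHECK (case-insensitive) and map it to a mode."""
    env = os.environ if env is None else env
    raw = env.get("KANGAS_VERSION_CHECK", "").strip().upper()
    if not raw:
        # Variable unset or empty: fall back to the caller-chosen default.
        return default
    try:
        return VersionCheck(raw)
    except ValueError:
        # Unrecognized value: choose the most conservative option.
        return VersionCheck.OFF
```

Passing `default=VersionCheck.OFF` would turn the check into an opt-in feature, while an explicit `GITHUB_API` selects the tag-based check without sending local metadata.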

BTW, this isn't meant to be concealed; it is meant to do two things: log the details and check for a version update. The logged details keep the user anonymous but let us know what is working and what isn't (e.g., usage). Feel free to make a PR for code or docs to make this clearer.

sanderfoobar commented 1 year ago

I would assume people don't like information about their computer being transferred somewhere by default, without opting in (e.g., Steam's yearly Hardware & Software Survey is opt-in). Currently, such a mechanism allows Comet to identify possible sales leads (e.g., an IP block owned by company X is using Kangas), which is probably one of the reasons it is there. Note that in the EU an IP address counts as personally identifiable information and is subject to privacy laws, so if Comet stores this metadata in a database, people will be able to file a GDPR request.

I created this issue for visibility, so that future users who stumble upon this behavior know about it.