cargo-bins / cargo-binstall

Binary installation for rust projects
GNU General Public License v3.0

Clearly disclose in the docs telemetry collection being enabled by default #1884

Closed by AlexTMjugador 2 months ago

AlexTMjugador commented 3 months ago

Recently, one of my CI workflows that uses cargo-binstall (and, by extension, quickinstall) began showing HTTP error warnings due to requests to https://warehouse-clerk-tmp.vercel.app returning a 402 status code. Curious about the cause of these requests, I investigated the topic a bit and came across issues like https://github.com/cargo-bins/cargo-binstall/issues/1822. To my surprise, I couldn't find any clear documentation in cargo-binstall or cargo-quickinstall indicating that this silent telemetry collection was occurring in the first place.

From an ethical standpoint, I think it's only fair to disclose that cargo-binstall collects telemetry by default. As noted in the linked issue, these additional HTTP requests can raise concerns, and privacy-conscious users may have valid reasons to opt out of such data collection.

Legally, while I'm not qualified to provide legal advice, I recognize that the line between personal and non-personal information can be very thin, and that the distinction can have significant implications, particularly for whether regulations like the EU GDPR apply to this telemetry collection. If, for instance, the GDPR applies because the collection endpoint logs IP addresses or other device data that could potentially identify an individual, then the telemetry would be subject to strict disclosure, consent, and data handling requirements, which may not currently be met.

On the other hand, and from a practical standpoint, this telemetry collection has a history of being unreliable at times (cf. https://github.com/cargo-bins/cargo-quickinstall/issues/164). Therefore, even if slightly more users explicitly disable telemetry as a consequence of such a notice, I think it's unlikely that those decisions would significantly reduce the usefulness of the collected data for analysis and decision-making.

In my view, Visual Studio Code could serve as a model for how to handle this, as it does a good job of disclosing how and why telemetry is collected on a dedicated documentation page. Alternatively, adding a dedicated section about telemetry to the project's README could also help bring such telemetry collection the attention it deserves. I believe that merely documenting telemetry collection through the --disable-telemetry CLI option is insufficient, as users interacting with cargo-binstall via e.g. taiki-e/install-action may never even discover any cargo-binstall CLI options.
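For readers who drive cargo-binstall from scripts or CI, here is a minimal sketch of opting out with the --disable-telemetry option mentioned above. The helper name `binstall_command` and the crate name `ripgrep` are illustrative assumptions, not part of cargo-binstall; the sketch only builds the command line and does not run it.

```python
import shlex


def binstall_command(crate: str, disable_telemetry: bool = True) -> list[str]:
    """Build the argument list for a cargo-binstall invocation.

    `binstall_command` is a hypothetical helper for illustration only.
    """
    argv = ["cargo", "binstall"]
    if disable_telemetry:
        # --disable-telemetry is the CLI option named in this issue.
        argv.append("--disable-telemetry")
    argv.append(crate)
    return argv


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch has no side effects.
    print(shlex.join(binstall_command("ripgrep")))
```

Passing the resulting list to something like `subprocess.run` would perform the install. Wrappers such as taiki-e/install-action may not expose this option directly, which is the discoverability concern raised above.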

NobodyXu commented 3 months ago

Thank you!

I agree that we absolutely would want such documentation for cargo-binstall.

Both in the README, and probably in the --help.

AlexTMjugador commented 3 months ago

It's great to hear that!

I wouldn't mind helping out by submitting a PR to describe this telemetry collection in the documentation, but I don't have a clear picture of how exactly it works and/or is meant to work, so I don't think it'd be really useful. Please feel free to go over more details in this issue, in a draft, or in a final documentation change, whichever is more comfortable for you 😄

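For what it's worth, a rough but hopefully helpful list of points I'd expect such a documentation note to cover would be:

What data is collected (in my view, this should not only include obvious data sent by cargo-quickinstall on the HTTP request body, but also IP addresses or any other data that other services or parties may collect).

Where the telemetry is sent to.

Who will use the telemetry data.

For how long the data will be stored.

Whether the data may be transferred to other parties in the future.

How to opt-out.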

NobodyXu commented 3 months ago

I would welcome contributions/PRs!

What data is collected (in my view, this should not only include obvious data sent by cargo-quickinstall on the HTTP request body, but also IP addresses or any other data that other services or parties may collect).

As far as I know, only the crate to be installed, its version, and the targets available on the local machine are collected.

We then use this data to decide which versions to build. For each target cargo-quickinstall supports, we maintain a list of popular crates and select from it for building.
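As a purely hypothetical sketch of the selection described above (this is not cargo-quickinstall's actual code), reports of crate, version, and available targets could be tallied per target and the most requested entries picked for building. The `Report` type, the `popular_crates` function, and the `top_n` cutoff are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Report(NamedTuple):
    """One hypothetical telemetry report: the fields named in the comment above."""
    crate: str
    version: str
    targets: tuple[str, ...]


def popular_crates(reports, top_n=2):
    """Return, for each target, the top_n most requested (crate, version) pairs."""
    counts = defaultdict(Counter)
    for report in reports:
        # A single report counts once for every target available locally.
        for target in report.targets:
            counts[target][(report.crate, report.version)] += 1
    return {
        target: [pair for pair, _ in counter.most_common(top_n)]
        for target, counter in counts.items()
    }


if __name__ == "__main__":
    sample = [
        Report("ripgrep", "14.1.0", ("x86_64-unknown-linux-gnu", "x86_64-unknown-linux-musl")),
        Report("ripgrep", "14.1.0", ("x86_64-unknown-linux-gnu",)),
        Report("bat", "0.24.0", ("x86_64-unknown-linux-gnu",)),
    ]
    print(popular_crates(sample, top_n=1))
```

The sketch uses only the three fields listed in the comment; anything else the endpoint or its hosting provider may log (such as IP addresses) is outside what it models.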

Where the telemetry is sent to.

https://warehouse-clerk-tmp.vercel.app/api/crate. It's not very actively maintained and was only kept in a working state.

There was a plan to rewrite it, but we haven't had much time to work on it.

cc @alsuren, who implemented the current statistics collection, so they probably know more about this than I do.

Who will use the telemetry data.

The URL for accessing it is public (https://warehouse-clerk-tmp.vercel.app/api/stats), but it seems to be down for now.

For how long the data will be stored.

I don't know about this; you'd have to ask @alsuren. However, the effect of this data (which binaries are built) is visible in the GitHub Releases.

If the data access is public, then I would essentially say it is permanently saved.

Whether the data may be transferred to other parties in the future.

We don't have plans to do that, but if it is publicly available, then others can already access it.

How to opt-out.

AlexTMjugador commented 3 months ago

Awesome, thanks a bunch for the detailed answers! I've gone ahead and opened a PR to add the discussed disclaimers to the appropriate sections.

Depending on alsuren's answers to some points, we might or might not want to tweak some of the wording in the disclaimers. Either way, I believe some disclosure is better than none 👍

AlexTMjugador commented 2 months ago

I'm closing this issue as I think the now-merged PR is good enough to resolve it. Thanks to everyone involved! 🎉