Why couldn't an open science company sell customer data?

dcdanko commented 8 years ago

I appreciate the idea behind this manifesto and I am planning to adopt this (or something very close) for my company.

Why does the open science manifesto insist that customer data is not for sale? If a company were open about the fact that it sells customer data, how would that violate open access principles?

> Users should be considered as contributors to the common goods, not as products.

But surely they can be both! If selling customer data enables a company to decrease its costs, why shouldn't it do so? In some cases, selling user data could result in better science.

Let's take TailorDev itself as an example. Presumably it would be possible to track the IP addresses of people who use artich.io. From these IP addresses it would be possible to build a database of roughly where scientists work. Most scientific work would be relatively easy to locate because the addresses would match universities, big companies, etc. However, there might be some scientific work happening in unexpected locations.
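To make the thought experiment concrete, here is a minimal sketch of that kind of grouping. It is purely illustrative: the network ranges are reserved documentation ranges (RFC 5737), the institution names and the `classify`/`summarize` helpers are hypothetical, and nothing here describes what TailorDev actually does.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table mapping network ranges to institutions.
# The ranges are reserved documentation ranges, not real organisations.
KNOWN_NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example University",
    ipaddress.ip_network("198.51.100.0/24"): "Example Biotech Inc.",
}


def classify(ip: str) -> str:
    """Return the institution whose network contains the address, else 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for network, institution in KNOWN_NETWORKS.items():
        if addr in network:
            return institution
    return "unknown"


def summarize(ips):
    """Count visits per institution; 'unknown' collects the unexpected locations."""
    return Counter(classify(ip) for ip in ips)


visits = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.42"]
summary = summarize(visits)
print(summary)
```

The addresses that fall into `unknown` are the "unexpected locations" in the argument: they match no established institution in the table.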

Enter companies that provide wet lab work for biology as a service. If TailorDev's data were able to pinpoint a group of scientists working in an unestablished area, another company might decide it was worthwhile to build a small sequencing center there, which would make science easier for that group.

If TailorDev refused to sell user data (to be clear, TailorDev does not sell user data), the sequencing company would never know to build a new sequencing center, and TailorDev would have given up a valuable revenue stream. What's the benefit of this? Assuming TailorDev were up front about its use of customer data, selling it would not reduce the dignity of its customers or cheapen its product, and it might actually make life easier for scientists.

Saying that selling user data is equivalent to treating users like a product is a bit unfair to my mind. Selling user data is equivalent to treating user data like a product; why shouldn't we treat user data like a product?

I am absolutely open to hearing why I'm wrong!

jmaupetit commented 8 years ago

Hi David!

Thank you for initiating this discussion. We received similar feedback through other channels, which means that this part should be rewritten. From my point of view, it will be a bit tricky/technical, but, to clarify what we initially meant, I think we should better define what "users’ data or behaviors" means.

In your example, we can consider IP addresses to be metadata related to user profiles, not users' data. What we must avoid is to directly exploit research results from our users (datasets, raw data, figures, articles, etc.). By default, data should be (zero-knowledge) encrypted to ensure our users' privacy until they decide to open their results to the community.

Please remember that this manifesto is still a draft. For now, nobody is right or wrong: we want a fruitful discussion around each point raised in it to make it a community-driven initiative.

dcdanko commented 8 years ago

Hello!

> What we must avoid is to directly exploit research results from our users (datasets, raw data, figures, articles, etc.).

That makes perfect sense. I'd also throw in that open-science companies should avoid publishing (in any format) any analysis that isn't based entirely on open-source, publicly accessible data.

> By default, data should be (zero-knowledge) encrypted to ensure our users' privacy until they decide to open their results to the community.

Ah, that might be tricky... There would be huge technical hurdles to overcome in order to use homomorphic encryption or zero-knowledge for any kind of data analysis.

More directly: nothing else in the manifesto suggests that open-science companies should be working towards a trustless environment. Requiring it in the manifesto seems like a bit much, maybe?

> Please remember that this manifesto is still a draft. For now, nobody is right or wrong: we want a fruitful discussion around each point raised in it to make it a community-driven initiative.

I hope I didn't come off badly! I think it's great that you're looking for community involvement.

willdurand commented 8 years ago

> Ah, that might be tricky... There would be huge technical hurdles to overcome in order to use homomorphic encryption or zero-knowledge for any kind of data analysis.

Exact! What Julien meant (if I understood correctly) is that we really care about privacy, and we want to rely on technology to build "secure" and "safe" apps and we want to be transparent about it. (Unlike what Apple recently did with Differential Privacy, for instance.)

But technical details should not be part of a manifesto anyway, so we should clearly rephrase this section. I'd say that what we want to express is:

- using users' data of any kind without consent is prohibited
- respect the user's decision to share or not to share data
- ask users first if the company wants to exploit their "public" data
- be open/transparent about the stack and/or make it clear where content is stored, etc.

As for the last point, French researchers often have to keep their data in Europe or even in France.
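A rough sketch of how these points could translate into checks, just to show they are simple to state. Everything here is hypothetical (the `Dataset` record, its fields, the purpose and region names); it is an illustration under those assumptions, not how any existing product works.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dataset:
    owner: str
    storage_region: str              # disclosed to the user (transparency point)
    is_public: bool = False          # the user's own decision to share or not
    consented_uses: Set[str] = field(default_factory=set)


def may_use(dataset: Dataset, purpose: str) -> bool:
    """Use is allowed only for purposes the owner explicitly agreed to.

    Being public is deliberately not enough: the company still has to ask.
    """
    return purpose in dataset.consented_uses


def stored_in_allowed_region(dataset: Dataset, allowed_regions: Set[str]) -> bool:
    """Check the storage location against what the owner is permitted to use."""
    return dataset.storage_region in allowed_regions


results = Dataset(owner="alice", storage_region="fr", is_public=True)
print(may_use(results, "aggregate-statistics"))   # False: public, but nobody asked
results.consented_uses.add("aggregate-statistics")
print(may_use(results, "aggregate-statistics"))   # True: explicit consent recorded
print(stored_in_allowed_region(results, {"fr", "eu"}))
```

The point of the design is that `is_public` never feeds into `may_use`: sharing results with the community and allowing the company to exploit them are separate decisions.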

willdurand commented 8 years ago

@dcdanko I just saw your PR and the reference to it in this thread (I was commenting at the same time). Did you really want to open a PR on your own fork? Because we really love your changes :-)

dcdanko commented 8 years ago

> Exact! What Julien meant (if I understood correctly) is that we really care about privacy, and we want to rely on technology to build "secure" and "safe" apps and we want to be transparent about it.

Right, so it all comes down to the same principle. We just need to be open with users about how we're using their data.

I fixed my pull request and added a brief line about letting users know where their data is being stored.

TailorDev / manifesto

Why couldn't an open science company sell customer data? #9