SIESTA-eu / wp15

work package 15, use case 2

Can scrambled data be descrambled? #20

Closed arnodelorme closed 2 weeks ago

arnodelorme commented 2 weeks ago

Naive question: can scrambled data be descrambled using a file that contains the keys needed to descramble it?

As a user, I imagine I would want to scramble my data, have the pipeline process it in the cloud, then download and descramble. Or maybe I am not getting it.

robertoostenveld commented 2 weeks ago

That depends on the scrambling method used. Different methods are to be implemented by @marcelzwiers. One of the proposed scrambling methods is to replace the data with random noise, retaining the file structure, format and size but none of the actual data. That would not allow for descrambling, although linkage to the original data would still be possible.
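A minimal sketch of the noise-replacement idea, for illustration only and not the method implemented in this repository. The helper name `scramble_with_noise` and the `seed` parameter are hypothetical, and the data is assumed to be a flat array of floating-point samples:

```python
import random
from array import array


def scramble_with_noise(samples, seed=None):
    """Return noise with the same typecode and length as `samples`.

    Hypothetical helper: only the length and element type of `samples`
    are used; none of its values are read.
    """
    if samples.typecode not in ("f", "d"):
        raise TypeError("this sketch only handles floating-point arrays")
    rng = random.Random(seed)
    noise = (rng.gauss(0.0, 1.0) for _ in range(len(samples)))
    return array(samples.typecode, noise)


original = array("d", [0.12, 0.57, 0.33, 0.91])
scrambled = scramble_with_noise(original, seed=0)

# Structure is retained: same element type, same length, same size in bytes.
print(scrambled.typecode == original.typecode)
print(len(scrambled) == len(original))
print(len(scrambled.tobytes()) == len(original.tobytes()))
```

Because the output never depends on the input values, there is no key that could recover the original samples from the scrambled ones.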

Cyril will hire a student(?) to work in SIESTA on the quantification of the information that is left after scrambling and the identifiability of the data following scrambling. That would be a (scientific) deliverable.

robertoostenveld commented 2 weeks ago

And as a SIESTA end-user you don't scramble. You get access to scrambled data (not to the sensitive data), implement your pipeline on that, and then request the platform operator or data rights holder to execute that pipeline on the original sensitive data. The group-level output of that is evaluated for being non-identifying and is then shared with you.

However, as SIESTA developers we do have to scramble ourselves, as there is no SIESTA platform yet that does it for us.

robertoostenveld commented 2 weeks ago

For a more general perspective you may want to review https://github.com/SIESTA-eu/wp15/blob/main/README.md

arnodelorme commented 2 weeks ago

That makes more sense. What if the data rights holder is also the data user? Could there be a key that the data rights holder provides to themselves or to the data user to descramble the data? If it is BIDS and anonymized, the data user should be able to descramble it, no?

robertoostenveld commented 2 weeks ago

If the data rights holder is also the user, they have no need for SIESTA: they can do the computations on their own computers, and the sensitive content of the data does not have to be protected.

If the scrambled data can be descrambled, it is by definition not anonymous.

If the data is anonymous, it is not personal data any more and GDPR does not apply, hence the data is not sensitive, and SIESTA is not needed.

We are not aiming to build a platform for everyone and all sorts of computations, only a platform for sensitive data that cannot be shared and processed by others otherwise.

arnodelorme commented 2 weeks ago

OK, I get it. Thank you for clarifying. That makes sense.