SlapDrone / warp-lane


Notes on considering both technical and legal means of protecting training data #8

Open wdbm opened 3 years ago

wdbm commented 3 years ago

Training data could come from a standard public source, such as the Million Song Dataset (MSD), or from something more basic. That more basic form might be inspired by the technology Melodyne has developed, which can separate out notes, tracks, instruments and so on. I would lean towards this approach of using public data.

Alternatively, training data could be built from data transferred to and from users of the proposed system (users here could mean owners of sound effects equipment as well as others). In that case, it would be ethical to protect that data, it would be good to ensure that users are informed of this, and better still if users can verify it.

Legal means of protection

Mathpix provides a template both for how to protect user data legally and for how to generate income from the system. Images of mathematical equations (screencaps of LaTeX, scribbles and so on) are sent through the system and the user receives a transcription of what was sent. When using the system, the user can tick a checkbox to grant permission for the images to be used to train the system. That is entirely trust-based (as opposed to verify-based). However, Mathpix does provide something of a legal protection: https://mathpix.com/privacy
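To make the opt-in idea concrete, here is a minimal sketch of consent-gated ingestion. It is not Mathpix's implementation; `Submission`, `TrainingStore` and `allow_training` are hypothetical names invented for illustration. The only point is that a payload reaches the training set when, and only when, the user has ticked the box.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """One user upload (hypothetical structure, for illustration only)."""
    user_id: str
    payload: bytes
    allow_training: bool = False  # opt-in: unchecked by default


@dataclass
class TrainingStore:
    """Holds only the payloads that users agreed to share."""
    samples: list = field(default_factory=list)

    def ingest(self, submission: Submission) -> bool:
        """Keep the payload for training only if the user opted in."""
        if submission.allow_training:
            self.samples.append(submission.payload)
            return True
        return False


store = TrainingStore()
store.ingest(Submission("alice", b"riff-1"))                    # not kept
store.ingest(Submission("bob", b"riff-2", allow_training=True))  # kept
```

Note that this is still trust-based: the user has no way to check that the server actually honours the flag.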

Technical means of protection

Numerai is a hedge fund that pays others for predicting (by whatever means) extrapolated values from training data that has obscured features. The means by which the fund obscures the features might be worth considering. I do not understand it, but I recall the claim that it is a "structure-preserving" way of obscuring features: the features are obscured in a way that makes them hard to interpret but still allows statistical analysis for predictive purposes. So perhaps users of the proposed Warp-Lane system could have their sound data obscured in this way in their client (or in verifiable JavaScript on a website). The obscured data would be sent through the system, both for training and for returning a modified sound, and the user would then un-obscure the returned sound client-side.
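As a toy illustration of what "structure-preserving" could mean, the sketch below obscures a frame of audio samples with a secret random rotation (an orthogonal matrix). This is an assumption chosen for illustration, not Numerai's actual method, which I do not know. A rotation scrambles the individual values but preserves distances between frames, so distance-based statistics still work, and the client can undo it with the transpose. All function names and the seed-as-key idea are hypothetical.

```python
import math
import random


def make_secret_rotation(dim, seed):
    """Build a random orthogonal matrix (orthonormal rows) via Gram-Schmidt.

    The seed stands in for a client-side secret key (hypothetical).
    """
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows already chosen.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the very unlikely degenerate draw
            rows.append([a / norm for a in v])
    return rows


def obscure(frame, rotation):
    """Client side, before upload: y = Q x."""
    return [sum(q * x for q, x in zip(row, frame)) for row in rotation]


def unobscure(frame, rotation):
    """Client side, after download: x = Q^T y."""
    dim = len(frame)
    return [sum(rotation[i][j] * frame[i] for i in range(dim)) for j in range(dim)]


def distance(a, b):
    """Euclidean distance between two frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


Q = make_secret_rotation(4, seed=1234)
a = [0.1, -0.4, 0.25, 0.9]
b = [0.0, 0.3, -0.2, 0.5]
# The distance between frames is the same before and after obscuring.
same = abs(distance(a, b) - distance(obscure(a, Q), obscure(b, Q))) < 1e-9
```

Two caveats apply. A plain rotation is not a cryptographic guarantee, since someone who sees enough matching clear and obscured frames could recover the matrix. And the returned sound only un-obscures correctly if the server-side processing behaves sensibly on rotated data, which would need to be checked for whatever effect the system applies.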

There could easily be other approaches to this. The important emphasis is on this being verify-based as opposed to trust-based.
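One small example of what "verify-based" could look like in practice: if the client code that does the obscuring is published and audited, a user (or a browser extension acting for them) could check that the code actually served matches the audited version by comparing SHA-256 digests. This is only a sketch under that assumption; `digest_of` and `client_code_is_trusted` are hypothetical helpers, and how the audited digest gets published is left open.

```python
import hashlib
import hmac


def digest_of(code: bytes) -> str:
    """SHA-256 digest of the client code, as a hex string."""
    return hashlib.sha256(code).hexdigest()


def client_code_is_trusted(served_code: bytes, published_digest: str) -> bool:
    """True only if the served code matches the publicly audited digest."""
    return hmac.compare_digest(digest_of(served_code), published_digest)


audited = b"function obscure(frame, key) { /* audited client code */ }"
published = digest_of(audited)

ok = client_code_is_trusted(audited, published)                       # unchanged code
tampered = client_code_is_trusted(audited + b" // extra", published)  # modified code
```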

Check out the Numerai advert from October. It is one of the stranger advertisements: https://www.youtube.com/watch?v=GWeC2PK4yXQ

And a slightly more grounded video from when they first started: https://www.youtube.com/watch?v=dhJnt0N497c