Open JimMadge opened 4 months ago
Would we need to provide synthetic data, or simply dummy data? The latter is a far smaller ask.
We'd only be aiming for people to be able to test that their code runs - it's not necessary for the data to have comparable statistical qualities to the original.
Good point. I think either synthetic or dummy data would benefit researchers.
Both should give a good indication of whether the code will run. Synthetic data would have the extra advantage of producing more representative and interpretable results.
I don't think we have the capacity to invent the synthetic/dummy data tools ourselves.
However, we could think about whether we can,
Development outside the TRE would be enhanced with access to synthetic data that mimics the structure of sensitive data.
Such synthetic data could be used to validate code without the need for code ingress. It would also help debug code as there would be no need to find a method for egress of error messages from the TRE.
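To make the dummy-data option concrete, here is a rough sketch of what "data that mimics the structure" could look like with only the Python standard library. It assumes the data owner can share a schema (column names and types) without sharing any values. The `SCHEMA` contents, the type labels, and the function names are all made up for illustration and are not taken from any existing TRE tooling. The values are random, so this only tests that code runs; it says nothing about statistical properties.

```python
import csv
import io
import random
import string

# Hypothetical schema: column names and type labels only, no real values.
SCHEMA = {
    "patient_id": "int",
    "admission_date": "date",
    "ward": "str",
    "length_of_stay_days": "float",
}


def dummy_value(kind, rng):
    """Return one random value of the requested type."""
    if kind == "int":
        return rng.randint(0, 9999)
    if kind == "float":
        return round(rng.uniform(0, 100), 2)
    if kind == "date":
        # Days are capped at 28 so every generated date is valid.
        return (
            f"{rng.randint(2000, 2023):04d}-"
            f"{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
        )
    if kind == "str":
        return "".join(rng.choices(string.ascii_lowercase, k=8))
    raise ValueError(f"unsupported column type: {kind}")


def make_dummy_rows(schema, n_rows, seed=0):
    """Build n_rows dicts matching the schema; seeded for reproducibility."""
    rng = random.Random(seed)
    return [
        {name: dummy_value(kind, rng) for name, kind in schema.items()}
        for _ in range(n_rows)
    ]


def to_csv(rows, schema):
    """Serialise rows to CSV text with the schema's column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(schema))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(make_dummy_rows(SCHEMA, 5), SCHEMA))
```

A researcher could point their pipeline at the resulting CSV outside the TRE and see error messages directly. Proper synthetic data would need a dedicated tool on top of something like this.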
What could we do in the way of,