This is a common problem for any data analysis involving personal information. Approach:
Build a dataset that mirrors the structure of the real one (organization, and filenames if possible) but does not contain the actual problematic data. The file content may be tracked but not made available through the annex, or the dataset may have no relationship to the actual data at all, i.e. mock data or simulated data.
Provide dataset publicly to aid development of analysis implementations
Clearly describe how this mock dataset differs from the inaccessible real dataset
External users are instructed to create a new dataset (to hold their code) that has the mock dataset as a subdataset
External users submit their dataset. The mock subdataset is then replaced with the real dataset (the actual version used is tracked), the code is executed after it has been reviewed, and the results are captured in the submitted dataset.
Results are pushed back to the external users (or deposited in an accessible place for them to pull) -- the real data stays local and is never made available to them
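The first step above (a dataset with the same organization and filenames but none of the sensitive content) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a DataLad feature: the function name `build_mock_tree`, the placeholder content, and the choice to skip hidden entries such as `.git` are all hypothetical. In practice the placeholder would likely be replaced by simulated data in the correct file format.

```python
from pathlib import Path


def build_mock_tree(real_root: Path, mock_root: Path,
                    placeholder: bytes = b"MOCK\n") -> list[str]:
    """Mirror the layout and filenames of real_root under mock_root.

    File content is never read from real_root; every mirrored file
    receives the same placeholder bytes (an assumption of this sketch).
    Returns the relative paths of the files that were created.
    """
    created = []
    for src in sorted(real_root.rglob("*")):
        rel = src.relative_to(real_root)
        # Skip hidden entries (e.g. .git, .datalad): the mock is meant to
        # be a fresh dataset without the history of the real one.
        if any(part.startswith(".") for part in rel.parts):
            continue
        dst = mock_root / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_bytes(placeholder)
            created.append(rel.as_posix())
    return created


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        real = Path(tmp) / "real"
        (real / "sub-01").mkdir(parents=True)
        (real / "sub-01" / "scan.dat").write_bytes(b"sensitive content")
        print(build_mock_tree(real, Path(tmp) / "mock"))
```

The resulting directory could then be turned into a dataset and published, with the differences from the real data documented alongside it.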