Add a data simulator - Githubissues

SIESTA-eu / wp15

work package 15, use case 2


Add a data simulator #22

Open marcelzwiers opened 1 week ago

marcelzwiers commented 1 week ago

So far, we have scramblers that produce output data that is still directly based on the original input data. It would be useful to have output data that is generated from the input data, in a more indirect way, e.g. by simulation.

robertoostenveld commented 1 week ago

Could you make a list of the simulators that you are thinking of, and prioritize them for implementation? I would say that the simplest is to make data that is all zero. Also simple is to make random noise, but then you might already have to think about the distribution mean and standard deviation in relation to the original data file, as for example uint8 NIfTI files only contain integers from 0 to 255. I guess you want to keep the file format (also in detail) identical, right?
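To make the two simplest options concrete, here is a minimal sketch, assuming the image data has already been loaded into a NumPy array. The function name `simulate_like` and its `mode` argument are hypothetical and not part of the wp15 code. The noise is drawn with the mean and standard deviation of the original data, then rounded and clipped so that it still fits an integer type such as uint8.

```python
import numpy as np

def simulate_like(data, mode="zeros", seed=None):
    """Return simulated data with the same shape and dtype as `data`.

    Hypothetical helper: mode "zeros" gives all-zero data, mode "noise"
    gives Gaussian noise with the mean and std of the original data.
    """
    if mode == "zeros":
        return np.zeros_like(data)
    if mode != "noise":
        raise ValueError(f"unknown mode: {mode!r}")

    rng = np.random.default_rng(seed)
    noise = rng.normal(data.mean(), data.std(), size=data.shape)

    # Integer types (e.g. uint8) have a limited range: round and clip
    # before casting, so that values cannot wrap around.
    if np.issubdtype(data.dtype, np.integer):
        limits = np.iinfo(data.dtype)
        noise = np.clip(np.rint(noise), limits.min, limits.max)
    return noise.astype(data.dtype)
```

Keeping the shape and dtype identical is only one part of keeping the file format identical; header fields would still have to be copied from the original file.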

marcelzwiers commented 1 week ago

I have been looking at fmrisim, as one of the more recent and open simulators:

https://github.com/brainiak/brainiak/blob/master/brainiak/utils/fmrisim.py
https://brainiak.org/examples/fmrisim_multivariate_example.html
https://peerj.com/articles/8564/

However, my initial idea was very similar to the STANCE method: https://github.com/jasohill/STANCE

robertoostenveld commented 1 week ago

And what are your thoughts about simulating the other data types? Like a tsv and an EEG simulator?

marcelzwiers commented 1 week ago

Nothing concrete yet, but I would make the simulator modular, so that simulators for other data types could be added later.
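One possible way to get that modularity is a small registry that maps a data type to a simulator function. This is only a sketch: the names `SIMULATORS`, `register` and `simulate` are hypothetical, and the "anat" simulator is an all-zero placeholder rather than a real anatomical simulator.

```python
from typing import Callable, Dict, List

# Hypothetical registry: data type name -> simulator function
SIMULATORS: Dict[str, Callable[[List[float]], List[float]]] = {}

def register(datatype: str):
    """Decorator that adds a simulator function to the registry."""
    def decorator(func):
        SIMULATORS[datatype] = func
        return func
    return decorator

@register("anat")
def simulate_anat(values: List[float]) -> List[float]:
    # Placeholder: all-zero data of the same length as the input
    return [0.0 for _ in values]

def simulate(datatype: str, values: List[float]) -> List[float]:
    """Look up the simulator for `datatype` and apply it."""
    if datatype not in SIMULATORS:
        raise KeyError(f"no simulator registered for {datatype!r}")
    return SIMULATORS[datatype](values)
```

A tsv or EEG simulator could then be added later by writing one more function with `@register("tsv")` or `@register("eeg")`, without touching the dispatch code.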

robertoostenveld commented 1 week ago

Can you make a list in which you prioritize the different simulators? The anatomical MRI simulator would be useful for our use case 2.2, but for 2.1, 2.3 and 2.4 we would need other simulators.

marcelzwiers commented 1 week ago

A preliminary and incomplete list would be:

syntax: simulator input output type
or: scrambler input output action sim
or: scrambler input output sim type

Type:

fmri
anat
tsv
eeg/meg (last in the list only because it's not my expertise)
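The first syntax variant (`simulator input output type`) could be sketched with `argparse` as follows. This is an illustration under assumptions, not the project's command-line interface: the function name `build_parser` is hypothetical, and splitting "eeg/meg" into two separate choices is a guess.

```python
import argparse

# Types from the preliminary list; treating eeg and meg as two
# separate choices is an assumption.
TYPES = ["fmri", "anat", "tsv", "eeg", "meg"]

def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: simulator input output type"""
    parser = argparse.ArgumentParser(
        prog="simulator",
        description="Generate simulated data based on an input dataset.",
    )
    parser.add_argument("input", help="path to the original dataset")
    parser.add_argument("output", help="path for the simulated dataset")
    parser.add_argument("type", choices=TYPES, help="kind of data to simulate")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"simulating {args.type} data: {args.input} -> {args.output}")
```

The two scrambler variants would instead add the simulation as an action of the existing scrambler command, which keeps a single entry point at the cost of a longer argument list.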

robertoostenveld commented 1 week ago

We don't have an fMRI pipeline yet (although I hope @NathalieVAYSSIERE will implement one), so I suggest that goes to the bottom of the priorities. We do have TSV, EEG and MEG data in the example pipelines that need to be shuffled/randomized.
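For the TSV case, shuffling could look roughly like the sketch below, which permutes the values within each column independently while keeping the header row. The function name `shuffle_tsv_columns` is hypothetical and this is not the existing wp15 scrambler. It assumes the first row is a header and that all rows have the same number of fields.

```python
import csv
import io
import random

def shuffle_tsv_columns(text, seed=None):
    """Shuffle the values of every column independently.

    Assumes tab-separated text whose first row is a header.
    """
    rng = random.Random(seed)
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]

    # Transpose rows into columns, shuffle each column, transpose back
    columns = [list(column) for column in zip(*body)]
    for column in columns:
        rng.shuffle(column)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(zip(*columns))
    return out.getvalue()
```

Shuffling each column separately keeps the per-column distributions but breaks the relations between columns within a row; shuffling whole rows would be the alternative if those relations should be kept.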