bojobo closed this issue 1 month ago
Creating the images takes quite a bit of compute. The current simulated dataset was created on a compute cluster with more than 100 CPUs. I calculated that it would have taken about 3 months to simulate these images on my own PC. Regarding the randomness, creating the simulated images already involves quite a bit of randomness. However, for reproducibility a static dataset is preferable. That said, having a more diverse simulated dataset would definitely be interesting! Maybe we could even cluster the images into classes (types of objects being observed) and see which classes perform better or worse than others. We could then generate more training samples for those classes.
Oh damn, that's quite some time. Does it take that long even with multiprocessing? I'll see if there are some optimisations possible.
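To make the multiprocessing question concrete, here is a minimal sketch of fanning simulation runs out across CPU cores with Python's `multiprocessing.Pool`. Note that `simulate_image` is a hypothetical, trivially cheap stand-in for the real (far more expensive) per-image simulation in the repo, not its actual API:

```python
from multiprocessing import Pool
import random


def simulate_image(seed: int) -> tuple[int, float]:
    # Hypothetical stand-in for one expensive simulation run.
    # Seeding per image keeps each result reproducible.
    rng = random.Random(seed)
    return seed, rng.random()


def simulate_batch(seeds, workers: int = 4):
    # Each worker picks up the next seed as soon as it finishes
    # its previous image, keeping all cores busy.
    with Pool(processes=workers) as pool:
        return pool.map(simulate_image, seeds)


if __name__ == "__main__":
    results = simulate_batch(range(8))
    print(len(results))  # 8 simulated images
```

Since the real workload is CPU-bound, processes (not threads) are the right tool here; the speedup is roughly linear in the number of cores until I/O or memory becomes the bottleneck.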
I'm not quite sure about the classes. Super-resolution should work regardless of the type of object observed, i.e. all classes should be represented with the same number of images. Otherwise the model could tend to "hallucinate" objects which aren't there.
Oh, and I'll probably also move this issue to the xmm-epicpn-simulator repo (same as SamSweere/xmm-epicpn-simulator#27).
Use https://github.com/SamSweere/xmm-epicpn-simulator for creating the simulated dataset
Use xmm-epicpn-simulator. If possible, we could add some randomness to the creation process. We could even check if we can generate images on the fly.
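On-the-fly generation and reproducibility are not mutually exclusive: a seeded random stream gives fresh samples every epoch while remaining exactly repeatable across runs. A rough sketch of the idea, where `generate_sample` is a hypothetical placeholder for a real simulator call:

```python
import random


def generate_sample(rng: random.Random) -> list[float]:
    # Hypothetical placeholder: a tiny "image" of 4 random pixel values,
    # standing in for a full simulated observation.
    return [rng.random() for _ in range(4)]


def sample_stream(seed: int):
    # One seeded RNG drives the whole stream, so the same seed
    # always reproduces the same sequence of generated samples.
    rng = random.Random(seed)
    while True:
        yield generate_sample(rng)


stream = sample_stream(seed=42)
batch = [next(stream) for _ in range(2)]
```

In a training loop this generator could feed batches directly; logging the seed is then enough to regenerate the exact "dataset" a given model was trained on, without storing the images.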