DataResponsibly / DataSynthesizer

MIT License

Speeding up DataSynthesizer #30

Closed mahmoudibrahim98 closed 3 years ago

mahmoudibrahim98 commented 3 years ago

Description

Hello, I'm trying to synthesize datasets with a large number of attributes (more than 50). However, the synthesis process takes too long (multiple hours). Is there a way to speed it up? Can it run on a GPU?

haoyueping commented 3 years ago

Hi, are you familiar with GPU acceleration for Python? If so, it would be great if you could help adapt the code to run on a GPU. I would be glad to collaborate.

I have little expertise in this area. Numba seems to be the go-to solution, but the set of operations it can accelerate seems limited. At least, simply adding a @jit decorator does not work for DataSynthesizer.
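To illustrate the limitation: Numba's nopython mode compiles loops over NumPy arrays but does not understand pandas objects, so decorating a pandas-heavy function typically fails or gives no speedup. The sketch below is not DataSynthesizer code. It assumes, purely for illustration, that the hot spot is a pairwise statistic such as mutual information, and that the columns have already been converted to integer codes (for example with `pandas.factorize`). The function name `mutual_information` and its parameters are hypothetical. This runs on the CPU, not the GPU.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without Numba.
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


@njit
def mutual_information(x, y, n_x, n_y):
    """Mutual information (in nats) between two integer-coded columns.

    Hypothetical helper: x and y hold codes in [0, n_x) and [0, n_y).
    """
    # Build the contingency table with a plain loop, which Numba compiles well.
    counts = np.zeros((n_x, n_y))
    for i in range(x.shape[0]):
        counts[x[i], y[i]] += 1.0

    n = x.shape[0]
    px = counts.sum(axis=1) / n  # marginal distribution of x
    py = counts.sum(axis=0) / n  # marginal distribution of y

    mi = 0.0
    for a in range(n_x):
        for b in range(n_y):
            if counts[a, b] > 0.0:
                pxy = counts[a, b] / n
                mi += pxy * np.log(pxy / (px[a] * py[b]))
    return mi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 4, size=10_000)
    y = (x + rng.integers(0, 2, size=10_000)) % 4
    print(mutual_information(x, y, 4, 4))
```

Whether this kind of rewrite helps depends on where the time actually goes, so profiling a run first would be the safer starting point.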

What do you think?

mahmoudibrahim98 commented 3 years ago

Hello, unfortunately I have no experience with GPU acceleration.