gianba / SeizureDetection

Parallelize the feature extraction #13

Open scheuchzer opened 10 years ago

scheuchzer commented 10 years ago

Can we use Amazon servers to speed up the feature extraction step?

zadsas commented 10 years ago

If we think that this is infeasible or too difficult for now, we could also speed up the feature extraction by using the GPU via CUDA. See http://www.r-tutor.com/content/download: "The rpud is a open source R package for performing statistical computation using NVIDIA CUDA GPU."

gianba commented 10 years ago

I suspect it is easier to simply distribute the workload across several machines than to move to parallel processing on the GPU. Going the GPU route would heavily influence the internals of our processing pipeline, and I assume that we could no longer make use of the nice tuning functionality provided by e1071. Still, this issue only makes sense if someone is interested in investigating the benefits and drawbacks of such an approach. Otherwise it's probably more rewarding to invest some time in refactoring the processing pipeline (pre-process and store ICA/PCA, etc.).

zadsas commented 10 years ago

I totally agree with you, @gianba. I just thought I'd mention it, but you are right, this is low priority.

scheuchzer commented 10 years ago

I'll have a look at http://cran.r-project.org/web/packages/doRedis/index.html. It uses a Redis database as the work manager. There's also a hosted Redis provider, http://redislabs.com, that offers a free database with 25MB. That should be enough to store a few file or directory names. It should be possible to add additional cluster members easily, as long as they have the same sample data available locally.
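To illustrate the idea, here is a minimal sketch of the work-queue pattern in Python. It is not doRedis: an in-process `queue.Queue` stands in for the Redis database, threads stand in for cluster members, and `extract_features` is a hypothetical placeholder for the real extraction step. The point is only that file names go through the queue while each worker reads the data from its own local copy.

```python
import queue
import threading


def extract_features(file_name):
    # Hypothetical placeholder: a real worker would open its local copy of
    # the file and compute features. Here we just return the name length.
    return len(file_name)


def worker(tasks, results, lock):
    # Pull file names until the queue is empty. Only names travel through
    # the queue; the sample data is assumed to be available locally.
    while True:
        try:
            name = tasks.get_nowait()
        except queue.Empty:
            return
        value = extract_features(name)
        with lock:
            results[name] = value


def run(file_names, n_workers=3):
    # Fill the queue (the stand-in for Redis) with the work items.
    tasks = queue.Queue()
    for name in file_names:
        tasks.put(name)

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Adding another worker in this sketch only means raising `n_workers`; with a shared external queue the equivalent would be starting one more process on another machine that has the same sample files.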

scheuchzer commented 10 years ago

There are two simple demos. One detects all the workers and prints their names (showing deferred variable resolution). The other 'calculates' some matrices, where the faked calculation time and the result size can be specified.
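For readers without the demos at hand, the following is a rough Python analogue of what they exercise, not the demo code itself. It assumes a thread pool in place of the Redis-backed workers; `fake_matrix_task` and `run_demo` are hypothetical names. Each task sleeps for a configurable time, builds a square matrix of a configurable size, and reports which worker ran it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fake_matrix_task(task_id, seconds, size):
    # Fake the calculation time, then build a size x size matrix whose
    # entries are all equal to the task id.
    time.sleep(seconds)
    matrix = [[task_id] * size for _ in range(size)]
    # Report the worker name alongside the result.
    return threading.current_thread().name, matrix


def run_demo(n_tasks=4, seconds=0.01, size=3, n_workers=2):
    # Threads named "worker_N" stand in for the cluster members.
    with ThreadPoolExecutor(
        max_workers=n_workers, thread_name_prefix="worker"
    ) as pool:
        futures = [
            pool.submit(fake_matrix_task, i, seconds, size)
            for i in range(n_tasks)
        ]
        # Collect results in submission order.
        return [f.result() for f in futures]
```

Varying `seconds` and `size` gives a feel for how task duration and result size affect the overhead of distributing the work.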