PathologyDataScience / HistomicsML2

A tool for training machine-learning models with whole-slide imaging datasets

Speed up feature extraction #40

Open cooperlab opened 5 years ago

cooperlab commented 5 years ago

Investigate how to speed up the feature extraction pipeline. Examine GPU starvation, and optimize the reading and pre-processing of patches from WSI files.
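One common way to reduce GPU starvation is to read and pre-process patches in background threads while the model works on earlier ones. The sketch below illustrates that idea only; it is not taken from the HistomicsML2 code. `prefetch`, `load_fn`, `workers`, and `depth` are hypothetical names, and `load_fn` stands in for whatever actually reads a patch from a WSI file. Threads help mainly when the read is I/O-bound or the reading library releases the GIL, which is an assumption here.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def prefetch(
    keys: Iterable[K],
    load_fn: Callable[[K], V],
    workers: int = 4,
    depth: int = 8,
) -> Iterator[V]:
    """Yield load_fn(key) in input order while loading ahead in threads.

    Up to `depth` loads are in flight at once, so the consumer (for
    example, a GPU forward pass) rarely waits on disk reads.
    """
    if workers < 1 or depth < 1:
        raise ValueError("workers and depth must be at least 1")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        for key in keys:
            # Queue the next load, then hand back the oldest result
            # once the look-ahead window is full.
            pending.append(pool.submit(load_fn, key))
            if len(pending) >= depth:
                yield pending.popleft().result()
        # Drain whatever is still in flight.
        while pending:
            yield pending.popleft().result()
```

Results come back in the same order as `keys`, so the output can be paired with patch coordinates without extra bookkeeping.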

Acmenwangtuo commented 4 years ago

When I run this process I use the GPU, so why is it still very slow, with only one CPU core running?

Reasat commented 3 years ago

In the feature generation step,

docker run --runtime=nvidia -it --rm --name extractfeatures -v "$PWD":/"${PWD##*/}" cancerdatascience/hml_dataset_gpu:1.0 python scripts/FeatureExtraction.py --projectName "${PWD##*/}"

the FeatureExtraction.py script extracts features from one patch at a time. A significant speedup can be attained by loading the images in batches. I have patched the file so that it loads batches of images:

https://gist.github.com/Reasat/76b53d6be24bceff4525e7ab92ca9ffd

Corresponding XML file: https://gist.github.com/Reasat/f674ca6616edfc94039bc8c449b14844 @cooperlab
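The batching idea described above can be sketched roughly as follows. This is not the code from the gist, just a minimal illustration: `batched`, `extract_features`, and `model_fn` are hypothetical names, and `model_fn` is assumed to accept a list of patches and return one feature per patch (in a real pipeline it would wrap a single batched forward pass on the GPU).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

P = TypeVar("P")  # a patch (e.g. an image array)
F = TypeVar("F")  # a feature vector


def batched(items: Iterable[P], batch_size: int) -> Iterator[List[P]]:
    """Split an iterable into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def extract_features(
    patches: Iterable[P],
    model_fn: Callable[[List[P]], List[F]],
    batch_size: int = 32,
) -> List[F]:
    """Call model_fn once per batch instead of once per patch."""
    features: List[F] = []
    for batch in batched(patches, batch_size):
        features.extend(model_fn(batch))
    return features
```

With 10 patches and `batch_size=4`, `model_fn` is called 3 times rather than 10; the saving comes from amortizing per-call overhead across each batch.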

slee172 commented 3 years ago

@Reasat Thank you for the updates. We are working on a new version of the pipeline, so I'll revisit this after that is released.

cooperlab commented 3 years ago

@Reasat thanks for looking at this. We acknowledge that the pipeline needs to be improved. We are working to introduce improvements that will speed up the three intensive processes: superpixel segmentation, boundary tracing, and feature extraction. We are done with the segmentation and tracing and should be done with the extraction portion soon. These improvements will allow feature extraction on multi-GPU systems.