apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Object Detector training is unnecessarily slow when source images are large #1021

Open dhgokul opened 6 years ago

dhgokul commented 6 years ago

@srikris Is there any way to speed up Turi Create training? As of now we use a deep learning GPU machine with CUDA 9.0 for training; it took ~3 to 4 hours for 1000 images with the default iterations assigned by the object detection API.

cianiandreadev commented 6 years ago

You can reduce the number of iterations, but of course this will reduce the model's accuracy.

Example: `tc.object_detector.create(train_data, max_iterations=50)`

srikris commented 6 years ago

I wouldn't really recommend reducing the iterations unless you are certain that, for your application, the model stops learning after a while. The defaults were chosen carefully based on experiments we ran on many datasets. On CUDA, the object detector is I/O bound, and it's an issue we are looking to fix.

dhgokul commented 6 years ago

@srikris Thanks. Is there any way to speed up the training process with a multi-GPU machine?

srikris commented 6 years ago

We should be automatically using all the GPUs you have on your machine.

dhgokul commented 6 years ago

How much time would it take to train a minimal dataset (25 images with CSV data)? Is there any performance difference between Turi Create 4.3 and the latest beta version, 5.0.3?

As of now we use a single GPU: `tc.config.set_num_gpus(1)`
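For readers following along, GPU selection in this thread is controlled through `tc.config.set_num_gpus`. A minimal sketch of the values discussed (assuming Turi Create 5.x; this is a config fragment, not a full training script):

```python
import turicreate as tc

# Use all available GPUs (the value -1 means "all"):
tc.config.set_num_gpus(-1)

# Restrict training to a single GPU:
tc.config.set_num_gpus(1)

# Force CPU-only training:
tc.config.set_num_gpus(0)
```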

znation commented 6 years ago

@dhgokul The biggest difference between 4.x and 5.0 is that on 5.0, GPUs can be used on macOS. Prior to 5.0, GPU support was only available on Linux. 5.0 is now released (out of beta) so please try it out, thanks!

philimanjaro commented 5 years ago

System info:

- CUDA 9.0, mxnet-cu90==1.1.0
- NVIDIA driver version: 384.130
- Ubuntu 16.04 LTS
- Turi Create 5 (latest version)
- 5× GTX 1080 Ti GPUs
- Intel 7980XE 18-core CPU
- 64 GB RAM
- 1 TB SATA3 SSD

When training my object detector, I can see the memory usage of each GPU fill up (see attached screenshot), but the amount of power these cards are drawing is very low. I have rarely seen one of the GPUs exceed 50 W, and most of the time the fans do not even spin up (indicating the cards aren't being worked very hard). Given the performance of this hardware, I wouldn't expect an input bottleneck.

I set the config in my script to: `tc.config.set_num_gpus(-1)`

When I begin training, I see the message: "Using GPUs to create model (GeForce GTX 1080Ti, GeForce GTX 1080Ti, GeForce GTX 1080Ti, GeForce GTX 1080Ti, GeForce GTX 1080Ti)"

Is it normal behavior to only fill up the GPUs' available memory, or should the cards actually be using their compute muscle to push through the object detector training?

Snippet: `model = tc.object_detector.create(train_data, feature='image', annotations='annotations', batch_size=512, max_iterations=10000)`

Screenshot is the resulting output of `nvidia-smi -l`:

img_0071

The screenshot below shows the current status of my training. The SFrame is 135 MB. There are 225 images averaging 600 KB each (a few are around 1 MB, but none larger). Batch size is 512, and it takes an average of 177 seconds per iteration.
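For scale, the figures quoted above imply a very long run if training continues to the configured 10,000 iterations. Back-of-the-envelope arithmetic only:

```python
# Rough arithmetic from the figures quoted above:
# 177 seconds per iteration, 10,000 max iterations.
seconds_per_iter = 177
max_iterations = 10_000

total_seconds = seconds_per_iter * max_iterations
total_hours = total_seconds / 3600  # ~492 hours

print(f"~{total_hours:.0f} hours (~{total_hours / 24:.1f} days)")
```

That is roughly 20 days of wall-clock time, which underlines why the thread focuses on removing the bottleneck rather than waiting it out.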

img_0075

This final screenshot below is from the System Monitor, showing RAM usage, swap used, and CPU usage. A few cores occasionally hit 90%, but all the others seem barely used (which I'd imagine is expected, given that I chose to use my GPUs; from a bottlenecking standpoint, it doesn't appear that the CPU is the issue): img_0073

nickjong commented 5 years ago

I suspect the issue here is that we're bottlenecking in the data preparation/marshalling stage of the pipeline. We can't sample/augment/reshape the data fast enough to keep up with all the GPUs here. We're currently working on a new C++ implementation of the relevant code, to avoid bottlenecks inherent in the Python implementation. This should be a first step towards improving performance overall.

philimanjaro commented 5 years ago

As I've been experimenting quite a bit to get the performance in order, I've been able to get Turi Create to utilize much, much more of my GPUs' power; not all of it, but a much more sizeable chunk.

To do this, I resized the data images that were originally taken with my iPhone X and iPhone XS. Previously I had only compressed them, not resized them. Once I compressed them gently and resized them to 416x416 (without maintaining the aspect ratio) using Preview on macOS, each file was under 200 KB. Training on the resized images in Turi went much more quickly, and for the first time ever I saw my 5 GPUs (GTX 1080 Tis) each draw roughly 100 W, with utilization sitting between 60% and 100%. I increased my batch size to about 560, which pushed the GPUs to their limit without causing memory errors.
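One caveat with resizing without preserving the aspect ratio: any bounding-box annotations have to be scaled per axis to match. RectLabel handled this automatically in my case, but the transform itself is simple. A minimal pure-Python sketch, assuming Turi Create's standard annotation dictionaries (`{'label': ..., 'coordinates': {'x', 'y', 'width', 'height'}}`, with `x`/`y` as box centers); the function name is hypothetical:

```python
def scale_annotations(annotations, src_w, src_h, dst_w=416, dst_h=416):
    """Scale Turi Create-style bounding boxes from (src_w, src_h) to (dst_w, dst_h)."""
    sx = dst_w / src_w  # horizontal scale factor
    sy = dst_h / src_h  # vertical scale factor
    scaled = []
    for ann in annotations:
        c = ann["coordinates"]
        scaled.append({
            "label": ann["label"],
            "coordinates": {
                "x": c["x"] * sx,  # box center and size scale with each axis
                "y": c["y"] * sy,
                "width": c["width"] * sx,
                "height": c["height"] * sy,
            },
        })
    return scaled

# Example: a 3024x4032 iPhone photo resized to 416x416
boxes = [{"label": "dog",
          "coordinates": {"x": 1512, "y": 2016, "width": 600, "height": 800}}]
print(scale_annotations(boxes, 3024, 4032))
```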

I figured that if the bottleneck was in the Python script's performance (even with my 18-core CPU, due to lack of script optimization), then resizing the images ahead of training would reduce it. That seems to be the case.

Most importantly, the model's performance is spectacular when used in my app in a real-world setting; resizing the source material to 416x416 without maintaining the aspect ratio, plus gentle compression (using jpegoptim), didn't seem to hurt anything as far as I can tell. Both my evaluation in Turi and real-world usage accurately detect objects with an 88%+ confidence score.

I used RectLabel to annotate my data and was concerned that I would have to spend many, many hours redoing previous work after resizing the images; but after opening the resized images in RectLabel, the bounding boxes adjusted appropriately without my having to redo that part of the work. I'm not entirely sure how that worked out in my favor, but I'm happy it did!

I know that there are many more optimizations that can take place on the Turi Create side, such as the C++ implementation mentioned by @nickjong (I'm definitely excited for this!), but this small tweak made a big difference for me personally. I'm not a machine learning expert (quite a novice, actually), so I'm not sure whether this was a bad idea, but the final model is performing very well, and that's all that really matters to me.

If you end up taking the route I did to increase your performance, please remember to back up your original source data before resizing and compressing your images, in case it doesn't work out for you.

nickjong commented 5 years ago

We're still moving forward with a revamped Object Detector implementation (more slowly than I'd like), but I wanted to re-title this issue and narrow its focus so that your idea doesn't get lost in the shuffle. When given large images in the SFrame, we should definitely do some resizing upfront. I suspect the main reason we don't do that already is that the data augmentation pipeline does some random cropping (within a constrained range of scales and aspect ratios), so preserving the original geometry can make a difference in theory. (For example, if we take a 416x416 image and randomly crop it, we'd have to scale the crop back up to 416x416 for input into the network.) But given the constraints on the random cropping, etc., there should still be a maximum size for the stored images.
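The upfront cap described here could be as simple as bounding the longer edge while preserving geometry, so the random-crop augmentation still has room to work. A hypothetical sketch (the `max_edge` value is illustrative, not a value Turi Create actually uses):

```python
def capped_size(width, height, max_edge=1024):
    """Return (w, h) scaled down so the longer edge is at most max_edge,
    preserving the aspect ratio; images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return round(width * scale), round(height * scale)

# Example: a 4032x3024 iPhone photo would be stored as 1024x768
print(capped_size(4032, 3024))
```

Resizing to this cap once, at SFrame-ingest time, would keep the augmentation pipeline's crop geometry intact while avoiding repeated decoding of full-resolution photos on every iteration.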

elishaterada commented 5 years ago

@philimanjaro thank you so much for your detailed information on how you solved this. I also annotated images at their original size and managed to crash my Mac while running Turi training on 300 images. So I resized the images and moved from annotating in VIA to RectLabel (which directly supports Turi Create export!).

Although the annotation rectangles showed up in exactly the same place after the image resize, the underlying rectangle coordinates were still in the original image's dimensions. So I basically had to change something in each photo (the label, the position of the rectangle, etc.) for RectLabel to save the new coordinates.

Hopefully, this is a useful resource for the future reader of this thread: https://rectlabel.com/help#xml_to_csv_turicreate https://twitter.com/rectlabel/status/1111113689589927939

waddahAldrobi commented 5 years ago

@elishaterada

I had the same problem before with RectLabel. All you had to do was click Save; the XML files would then be updated.

Cheers

ryouchinsa commented 1 year ago

RectLabel is an offline image annotation tool for object detection and segmentation. Although it is not an open source program, RectLabel lets you resize images together with their annotations, and you can export/import the Turi Create CSV format.