apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License
11.19k stars · 1.14k forks

Can't get activity classification to work with MPS #919

Closed prashpal closed 4 years ago

prashpal commented 6 years ago

On macOS 10.14, activity classification works fine on the CPU. I tried to force it to use MPS by making use_mps() return True in _mps_utils.py, but the test case crashes.

Image classification seems to be using MPS, but I am not sure why activity classification is not.

Has the activity classification been verified to work with MPS? If so, could you share the steps to get it working?

srikris commented 6 years ago

Sorry you are having this issue.

It should work by default. What Mac hardware are you using?

prashpal commented 6 years ago

I am using a 2016 13 inch MacBook Pro with Intel Iris Graphics 550.

Below are the steps I followed:

  1. Install the macOS 10.14 public beta
  2. Get the Turi Create source code: https://github.com/apple/turicreate
  3. Make use_mps() return True in _mps_utils.py
  4. Build Turi Create
  5. Run the unit tests for activity classification: ~/turicreate/scripts/run_python_test.sh debug

If I skip step 3, the unit tests run training on the CPU and pass. But with MPS forced, the test crashes.

nickjong commented 6 years ago

Unfortunately, the GPU acceleration for activity classification (and object detection) requires a discrete GPU, not the Intel Iris chipset. Image classification uses a different framework (via CoreML) to leverage GPU resources.

We should probably update our documentation to clarify the requirements, especially since they differ across toolkits.

prashpal commented 6 years ago

Thanks for the clarification. Are there plans to enable activity classification with Intel graphics since MPS can support it?

nickjong commented 6 years ago

Some testing with our current MPS implementation using Intel graphics did not reveal performance improvements over our MXNet (CPU only) implementation. We do plan to do some more work on activity classification, so we can certainly revisit this question after we've iterated on the implementation some.

igiloh commented 6 years ago

For future reference - _mps_utils.use_mps() is an internal function that checks the user config plus the relevant hardware availability; it is not a user-facing API.
The API-level way to enforce GPU usage would be:

tc.config.set_num_gpus(1)

prashpal commented 6 years ago

OK, thank you @nickjong and @igiloh. Please keep me updated when support for Intel graphics is available. Even if we do not see a perf improvement with Intel graphics, it would be good to have the option of using it.

igiloh commented 6 years ago

Hi @prashpal,

If you're building TC from source, you can try modifying has_fast_mps_support() in _mps_utils.py to always return True. If you're on macOS 10.14+, it will use the Intel GPU.
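For reference, the modification described would look something like this minimal sketch (a hypothetical patch for illustration, not the shipped code):

```python
# Hypothetical patch to has_fast_mps_support() in _mps_utils.py: the stock
# version rejects integrated GPUs, so returning True unconditionally routes
# work through MPS on macOS 10.14+, including Intel graphics.
def has_fast_mps_support():
    return True
```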

prashpal commented 6 years ago

Yes, I am making use_mps() return True in _mps_utils.py to force the use of Intel graphics. use_mps() seems to check two things: has_fast_mps_support() and _tc_config.get_num_gpus() != 0.
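Based on that reading, the check could be sketched like this (the function bodies are stubs assumed for illustration; only the combined condition comes from this thread):

```python
def has_fast_mps_support():
    # Stub for the real hardware check in _mps_utils.py (illustrative only).
    return True

def get_num_gpus():
    # Stub for _tc_config.get_num_gpus(); setting it to 0 disables GPU use.
    return 1

def use_mps():
    # The combined condition described above: capable hardware, and GPUs
    # not disabled via the user config.
    return has_fast_mps_support() and get_num_gpus() != 0
```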

With the above change, the test does seem to run on Intel graphics, but the validation test crashes. So I wanted to check whether the tests have been verified to work with Intel graphics.

Is my understanding correct?

  1. Image classification Training - can use CPU or discrete GPU (via MPS or MXNet)
  2. Activity classification Training - can use CPU or discrete GPU (via MPS or MXNet)
  3. Image classification Inference - can use CPU or discrete GPU (via CoreML or MPS or MXNet) or Intel GPU (via CoreML)
  4. Activity classification Inference - can use CPU or discrete GPU (via MPS or MXNet)

nickjong commented 6 years ago

I have not verified any tests for the Intel graphics MPS code path, since this code path is not currently supported.

Image classification has two phases: feature extraction using a neural network and logistic regression based on the extracted features. The logistic regression currently always runs on CPU. The feature extraction is the same for both training and inference, and always uses CoreML, which should use GPU or CPU, as available.
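As a rough illustration of that two-phase structure, here is a minimal numpy sketch in which the frozen feature extractor is stubbed by a fixed random projection (all data, shapes, and names here are illustrative, not Turi Create's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase 1: feature extraction with a frozen network. In Turi Create this is
# a CNN run via CoreML (GPU or CPU, as available); here it is stubbed with a
# fixed random projection plus a tanh nonlinearity.
W_frozen = rng.standard_normal((8, 4))

def extract_features(x):
    return np.tanh(x @ W_frozen)

# Phase 2: logistic regression on the extracted features, trained here with
# plain gradient descent (per the thread, this phase always runs on CPU).
def train_logistic_regression(feats, labels, lr=0.5, steps=500):
    w = np.zeros(feats.shape[1])
    b = 0.0
    n = len(labels)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        w -= lr * feats.T @ (p - labels) / n
        b -= lr * (p - labels).mean()
    return w, b

# Toy "images": two well-separated clusters of 8-dimensional vectors.
x = np.vstack([rng.standard_normal((50, 8)) - 2.0,
               rng.standard_normal((50, 8)) + 2.0])
y = np.array([0] * 50 + [1] * 50)

feats = extract_features(x)
w, b = train_logistic_regression(feats, y)
pred = (feats @ w + b > 0).astype(int)
accuracy = (pred == y).mean()
```

One motivation for this split is that only the small logistic-regression phase is trained per task, so running it on the CPU is cheap, while the heavy feature extraction is delegated to CoreML.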

Activity classification training and inference both use either MPS (on Macs with AMD GPUs) or MXNet (using GPU or CPU, as available).
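That routing could be summarized with a small sketch (the predicate names and exact condition are assumptions based only on this thread, not Turi Create's actual dispatch code):

```python
def pick_activity_classifier_backend(has_amd_gpu, num_gpus):
    # Per the thread: activity classification uses MPS only on Macs with a
    # supported discrete (AMD) GPU; otherwise it falls back to MXNet, which
    # itself runs on GPU or CPU as available.
    if has_amd_gpu and num_gpus != 0:
        return "mps"
    return "mxnet"
```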

prashpal commented 6 years ago

Thanks for clarifying @nickjong

nickjong commented 5 years ago

We should probably just go ahead and use the Intel GPU anyway, since this is less confusing. We need to verify that this works end-to-end, though.

prashpal commented 5 years ago

Ok, thanks for the update.

prashpal commented 5 years ago

Hi @nickjong , I wanted to check if we have any updates on this.

nickjong commented 5 years ago

Sorry, nothing concrete to report yet, although activity classification is something we're actively investigating now.

nickjong commented 5 years ago

We currently expect/hope to support Skylake Intel GPUs and later, in June.

vade commented 4 years ago

Hi

I'm curious whether there is any public documentation on CoreML's device-selection heuristic. With the addition of macOS 10.15's CoreML preferredMetalDevice API on MLModelConfiguration, I imagined it would be possible to force the MTLDevice an MLModel / Vision request runs on.

In my testing with integrated, discrete, and eGPUs, it appears only the eGPU consistently runs the CoreML model. My CoreML model is a pipeline model consisting of a MobileNet classifier with multiple outputs (multi-head classifiers attached to a custom feature extractor).

I'm curious to understand the device-selection preference for a few reasons:

a) I'd like to ensure my MLModel is fed CIImages backed by textures local to the device inference will be run on, to limit PCI transfers and keep things local

b) My model is actually fed frames of video, and WWDC '19 / macOS 10.15 introduce VideoToolbox and AVFoundation APIs to help force particular video encoders and decoders onto specific GPUs.

In theory, if all works well, I should be able to specify the same MTLDevice for video decode, preprocessing, CoreML/Vision inference, and subsequent encoding, keeping all IOSurface-backed pixel buffers and textures resident on the same GPU.

Apple has a Pro Apps WWDC video suggesting this is the path forward for fast-path multi-GPU support and Afterburner decoder support.

Does CoreML ACTUALLY allow suggested device placement to work?

I am running a retina MacBook Pro 2018 with Vega 20 GPU, and trying various methods to get the Vega 20 to light up.

I can only occasionally get the Vega 20 to 'light up', but I can consistently have CoreML run on the eGPU (Radeon 580).

I can inspect the CoreML model and see that its MLModelConfiguration has a preferred device set to the Vega 20, but Instruments, Xcode, and Activity Monitor all report no GPU usage on the Vega 20, and in fact sometimes no GPU usage at all (not even on the integrated GPU).

Any insight would be most helpful.

Apologies if this is not the best repository to post my query to.

Thanks in advance.

TobyRoseman commented 4 years ago

@vade - this isn't the right place to ask this question. I suggest reporting the issue here: https://developer.apple.com/bug-reporting/

vade commented 4 years ago

I hear you, @TobyRoseman - however, having these conversations in the open, rather than behind closed feedback, is helpful to other developers who have similar questions, and it leaves a breadcrumb trail to answers. I'm sure you understand!

But yes, I've asked there and on Stack Overflow as well. Appreciate the response!