cj-mills / christianjmills

My personal blog
https://christianjmills.com/

posts/arc-a770-testing/part-3/ #40

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Christian Mills - Testing Intel’s Arc A770 GPU for Deep Learning Pt. 3

This post covers my findings from training style transfer models and running Stable Diffusion with the 🤗 Diffusers library on the Arc A770 with Intel’s PyTorch extension.

https://christianjmills.com/posts/arc-a770-testing/part-3/

ColonelPhantom commented 1 year ago

It looks like Intel finally released a PyTorch 2.x version of IPEX a few weeks ago: https://github.com/intel/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu. It even looks like it natively supports Windows now: https://intel.github.io/intel-extension-for-pytorch/xpu/2.0.110+xpu/tutorials/installation.html

Are you still planning to re-run these tests?

cj-mills commented 1 year ago

@ColonelPhantom Yep, I plan to try the new version on Ubuntu and Windows once I wrap up my current tutorial.

parthvzala commented 5 months ago

How is it now? I'd like to know before buying. Thanks.

cj-mills commented 4 months ago

@parthvzala,

The most recent 2.1.20+xpu release of Intel's PyTorch extension partially works, depending on what you want to use it for.

There was a definite drop in quality with the PyTorch 2.0+ versions of the extension, and I'm still not sure what the source of the issues is.

Activating the IPEX_XPU_ONEDNN_LAYOUT environment variable now causes model accuracy to fail to improve during training.
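For anyone who wants to check this on their own setup, here is a minimal sketch of running the same training script with and without the variable. It assumes that `1` is the enabling value and that the variable needs to be in place before the extension is imported, which is why the script runs in a child process. The helper names and the script path are hypothetical.

```python
import os
import subprocess
import sys

LAYOUT_VAR = "IPEX_XPU_ONEDNN_LAYOUT"


def build_env(enable_layout: bool) -> dict:
    """Copy the current environment with the layout variable set or removed."""
    env = dict(os.environ)
    if enable_layout:
        env[LAYOUT_VAR] = "1"  # assumption: "1" is the enabling value
    else:
        env.pop(LAYOUT_VAR, None)  # leave it unset rather than guess a "disabled" value
    return env


def run_training(script: str, enable_layout: bool) -> int:
    """Run a (hypothetical) training script in a child process; return its exit code."""
    result = subprocess.run([sys.executable, script], env=build_env(enable_layout))
    return result.returncode
```

Calling `run_training("train.py", True)` and then `run_training("train.py", False)` and comparing the logged accuracy from each run would show whether the variable is what stalls training.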

Setting models to evaluation mode (e.g., with model.eval()) causes the model to produce completely different (and useless) results than when the model is in training mode (e.g., with model.train()), at least for the image classification, YOLOX, and Mask R-CNN models that I tested.
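One way to narrow this down might be to compare evaluation-mode outputs from the same weights on the CPU and on the GPU. The sketch below is not the code I used for testing. The function name and the toy model are hypothetical, and it assumes the `"xpu"` device string becomes available once `intel_extension_for_pytorch` is imported.

```python
import copy

import torch
import torch.nn as nn


def eval_gap_vs_cpu(model: nn.Module, batch: torch.Tensor, device: str) -> float:
    """Largest absolute difference between eval-mode outputs on CPU and on `device`."""
    # Work on copies so the original model's device and mode are left untouched.
    cpu_model = copy.deepcopy(model).to("cpu").eval()
    dev_model = copy.deepcopy(model).to(device).eval()
    with torch.no_grad():
        cpu_out = cpu_model(batch.to("cpu"))
        dev_out = dev_model(batch.to(device)).to("cpu")
    return (cpu_out - dev_out).abs().max().item()


# Tiny stand-in model (hypothetical). BatchNorm is included because it is a
# layer that behaves differently in training and evaluation mode.
toy_model = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4), nn.ReLU())
toy_batch = torch.randn(16, 8)

# CPU vs CPU is only a sanity check; pass "xpu" instead to test the Arc GPU.
cpu_gap = eval_gap_vs_cpu(toy_model, toy_batch, "cpu")
```

A gap near zero on the GPU would point away from the evaluation-mode kernels themselves, while a large gap would suggest the problem is specific to the device.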

The evaluation mode issues aside, the image classification and YOLOX object detection models did improve to a usable point during training. However, while the Mask R-CNN model did improve during the training process, it failed to reach a usable accuracy.

Stable Diffusion inference with Hugging Face Diffusers still works, with the A770 able to produce 1024x1024 images with Stable Diffusion XL without issue.

Unfortunately, I don't have time to try running any LLMs, as I need to take the Arc GPU back out of my system today.

vishnumadhu365 commented 4 months ago

@cj-mills Hi Christian

It would be great if you could report some of the issues you faced on the Arc A770 on the Intel Extension for PyTorch GitHub repo. That space is actively monitored by Intel engineers (like me).

Appreciate the detailed technical blog and looking forward to more interesting tutorials from you!

cj-mills commented 4 months ago

@vishnumadhu365

A reader opened a related issue in February, but I can make another one with more details when I have time to reinstall the Arc card in my desktop.