arch4edu / arch4edu

Arch Linux Repository for Education
https://arch4edu.org

Package Request: python-pytorch-all #222

Open · iamhumanipromise opened this issue 1 year ago

iamhumanipromise commented 1 year ago

Case for package:

1. Many Haswell+ laptops have a dedicated NVIDIA GPU in addition to their Intel Gen8+ iGPU. These laptops also have Thunderbolt, and to save money many users encountered at universities (and one client in California) pair them with a Radeon eGPU. (This also makes it "easier" to use the same eGPU on a Mac for dev work.)

2. This multi-architecture approach would also be a "laptop-sized" example of a mixed-computing environment: the same mix of architectures found in an on-prem data center or a mixed cloud environment. It is a microcosm of real-world, high-performance computing usage that students otherwise do not have access to until graduate programs. (Even then, only for a limited time.)

Package Description:

Tensors and Dynamic neural networks in Python with strong GPU acceleration (with TensorRT, CUDA, ROCm, OneAPI-DNN (MKL-DNN), ZenDNN and AVX2 CPU optimizations)

With this package, most CPUs and GPUs made since 2012/2013 would be able to support student and researcher AI usage.

Real World Example Machine #1

Real World Example Machine #2

Each one of these could benefit from a bundled package such as the one mentioned.

Machine 1 = 5,624 cores: 28 VPU cores, 6 CPU cores, 368 Tensor cores, 46 RT cores, and the remainder GPU/CUDA cores.

Machine 2 = 9,440 cores: 8 CPU cores, 32 RT cores, 512 XMX cores, and the rest GPU cores.

(Note: A Movidius VPU SHAVE core is a 128-bit VLIW vector processor that can perform parallel computations on image and video data.)

petronny commented 1 year ago

> Tensors and Dynamic neural networks in Python with strong GPU acceleration (with TensorRT, CUDA, ROCm, OneAPI-DNN (MKL-DNN), ZenDNN and AVX2 CPU optimizations)

Great idea, but I'm not sure it's possible to compile PyTorch with both CUDA and ROCm enabled.

  • With CUDA 9 or 11: all NVIDIA sm_2x and sm_3x cards (Fermi, Kepler). Notable cards include the 6GB Quadro 6000, Quadro 7000, and 2x 6GB Quadro Plex 7000; the 2x 12GB Tesla K80, 12GB Tesla K40, and the 6GB cards below them; also the 6GB Titan and Titan Black and the 2x 6GB Titan Z.
  • With CUDA 11.6: all sm_4x cards (Maxwell: GTX 980, etc.).
  • With CUDA 12: all sm_5x cards and above.
  • ROCm would support all Polaris GFX804+ GCN4 cards, such as the Sapphire 16GB RX 570 or the 2x 16GB Radeon Pro Duo, plus Vega and above.

Old GPUs may only work with their corresponding old CUDA/ROCm libraries. And even if these different CUDA/ROCm libraries co-exist on the system, I don't think the compiler can use all of them: where the code has #include <cuda.h>, the compiler will only use the first cuda.h it finds. Besides, old GPUs may not be officially supported by the latest PyTorch either.
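For what it's worth, you can check from Python which backend a given build carries. A minimal sketch (attribute names as in recent PyTorch releases; a single build exposes at most one of CUDA or ROCm):

import torch

# A single PyTorch build is compiled against at most one accelerator
# backend; these version attributes reveal which one.
cuda_ver = torch.version.cuda                   # e.g. "12.1", or None
hip_ver = getattr(torch.version, "hip", None)   # set only in ROCm builds

if hip_ver is not None:
    print("ROCm/HIP build:", hip_ver)
elif cuda_ver is not None:
    print("CUDA build:", cuda_ver)
else:
    print("CPU-only build")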

And I don't think having different versions of the same libraries co-exist on the system fits the philosophy of Arch Linux. It sounds more like Gentoo.

Back to your situations: I think they are ideal cases for virtual environments, Anaconda, or even Docker. Just create an environment for each kind of computing resource on your machine, install the right version of PyTorch in each, and then somehow provide a unified interface to access them. For example, the users can run their code like:

environment=cuda # Or rocm, cuda8, cpu_only or any environment
source /path/to/$environment/bin/activate
python train.py
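The same device-agnostic train.py can then run unchanged in every environment, since ROCm builds of PyTorch expose HIP devices through the same torch.cuda API. A minimal sketch:

import torch

# Picks the GPU in both the cuda and rocm environments above,
# and falls back to CPU in the cpu_only environment.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 2).to(device)
x = torch.randn(4, 8, device=device)
print(model(x).shape, "on", device)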
iamhumanipromise commented 1 year ago

Thank you for the advice regarding how to proceed!

I’m also thinking of using SHARK instead, to use the iGPU, dGPU, and CPU together rather than containerized apart.

The SHARK approach may be a little more difficult given the lack of documentation (but it should be a cleaner solution! Will have to request a package for that if it works!! ;-)

iamhumanipromise commented 1 year ago

This being said: is it possible to have a package for PyTorch and TensorFlow that is “opt-ROCm” or “opt-CUDA”, where the “opt” part includes Intel CPUs & GPUs & Myriad VPUs?

Aka the behavior OpenVINO has today (yet another one without a package!)
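For reference, a rough sketch of the OpenVINO behavior being described, using its Python runtime API (the openvino.runtime API of the OpenVINO 2022 era; the model path here is a placeholder):

from openvino.runtime import Core

core = Core()
# Device plugins present on the machine, e.g. ['CPU', 'GPU', 'MYRIAD']
print(core.available_devices)

# "AUTO" lets the runtime pick among whichever plugins are available.
model = core.read_model("model.xml")   # placeholder IR model path
compiled = core.compile_model(model, "AUTO")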

iamhumanipromise commented 1 year ago

Summary so far: these are two HSA machines which need local virtual environments and a new backend to schedule/coordinate/orchestrate/facilitate the distribution of neural workloads. (Or another clever solution)

I have tried various existing techniques to expose virtualized environments to each other, to no avail, and am again limited by the lack of a "neural fabric" interface that would let multiple virtual environments distribute and coordinate the processing of jobs simultaneously. I also do not yet have the Python skills to create this.
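Purely as an illustration of the missing piece: a hypothetical launcher could fan the same script out to each per-backend environment and collect results. The interpreter paths and the --shard flag below are assumptions, not an existing tool:

import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-backend virtual environments (paths are assumptions).
ENVS = {
    "cuda": "/path/to/cuda/bin/python",
    "rocm": "/path/to/rocm/bin/python",
    "cpu":  "/path/to/cpu_only/bin/python",
}

def run(name, python):
    # Each environment's interpreter brings its own PyTorch build.
    result = subprocess.run([python, "train.py", "--shard", name])
    return name, result.returncode

with ThreadPoolExecutor() as pool:
    for name, code in pool.map(lambda item: run(*item), ENVS.items()):
        print(name, "exited with", code)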

I have opened an issue on the Easy Diffusion project's GitHub.

I have enrolled in an MIT 12-week crash course in AI, but I believe all that will do is give me a better overview before diving deeper. It will be a while before I am able to develop the strategy, though the dev environment is lying in wait!