Making Orion work with other models

aravindhank11 commented 5 months ago

I am trying out Orion with various configurations based on the given example. The example at https://github.com/eth-easl/orion/tree/main/artifact_evaluation/example works well. However, the same setup for mobilenet_v2 does not seem to work.

Environment

Docker container which has access to a V100 GPU

Config used:

[
    {
        "arch": "mobilenet_v2",
        "kernel_file": "/root/orion/benchmarking/model_kernels/mobilenetv2_32_fwd",
        "num_kernels": 732,
        "num_iters": 100,
        "args": {
            "model_name": "mobilenet_v2",
            "batchsize": 32,
            "rps": 30,
            "uniform": true,
            "dummy_data": true,
            "train": false
        }
    }
]

Error state:

# LD_PRELOAD="/root/orion/src/cuda_capture/libinttemp.so" python launch_jobs.py mobilenetv2-32-config.json 1 1 1
[{'arch': 'mobilenet_v2', 'kernel_file': '/root/orion/benchmarking/model_kernels/mobilenetv2_32_fwd', 'num_kernels': 732, 'num_iters': 100, 'args': {'model_name': 'mobilenet_v2', 'batchsize': 32, 'rps': 30, 'uniform': True, 'dummy_data': True, 'train': False}}]
1
Init CUDA streams once!
Init stream pools!
Flags is 1, Priority is 0
Get low priority stream!
1.12.0a0+git67ece03
mobilenet_v2 32 0 [<threading.Barrier object at 0x7f09eef7c190>] 0
[135]
[0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 
0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333, 0.03333333333333333]
size is 100
REEF IS False, SEQUENTIAL IS False
[False]
['mobilenet_v2'] [b'/root/orion/benchmarking/model_kernels/mobilenetv2_32_fwd'] [135]
KERNEL_INFO_FILE IS /root/orion/benchmarking/model_kernels/mobilenetv2_32_fwd
----------- SIZE: 732
size is 0
Num clients is 1
before starting, profile is True
-------------- thread id:   135
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
at init
Enter loop!
Start epoch:  0
Client 0, submit!, batch_idx is 0

It makes no progress after this point. Am I configuring things wrong? I have made no changes to the kernel info file at /root/orion/benchmarking/model_kernels/mobilenetv2_32_fwd

Any help or pointers would be greatly appreciated :)
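
Editorial note: the list of 0.0333... values and the "size is 100" line in the log above are consistent with one inter-arrival interval of 1/rps seconds per iteration (rps = 30, num_iters = 100, uniform = true). The sketch below illustrates that reading only. The function name `arrival_intervals` is hypothetical, and the exponential draw used for the non-uniform case is an assumption that this thread does not confirm.

```python
import random


def arrival_intervals(rps, num_iters, uniform, seed=None):
    """Return one sleep interval (in seconds) per request.

    Hypothetical helper, not taken from the Orion codebase.
    """
    if uniform:
        # Fixed spacing: every request is 1/rps seconds after the previous one.
        return [1.0 / rps] * num_iters
    # Assumption: non-uniform arrivals are drawn from an exponential
    # distribution with mean 1/rps (a Poisson arrival process).
    rng = random.Random(seed)
    return [rng.expovariate(rps) for _ in range(num_iters)]


# Values from the mobilenet_v2 config in this comment.
intervals = arrival_intervals(rps=30, num_iters=100, uniform=True)
print(len(intervals), intervals[0])
```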

fotstrt commented 5 months ago

Hello! Thanks for your interest in Orion!

Could you please tell me which PyTorch and CUDA versions you are using? The PyTorch/CUDA version affects the number of kernels you have, and their profiles.

Could you also please try the '/root/orion/benchmarking/model_kernels/mobilenetv2_4_fwd' example? You can use something like:

{
    "arch": "mobilenet_v2",
    "kernel_file": "/root/orion/benchmarking/model_kernels/mobilenetv2_4_fwd",
    "num_kernels": 152,
    "num_iters": 12000,
    "args": {
        "model_name": "mobilenet_v2",
        "batchsize": 4,
        "rps": 40,
        "uniform": false,
        "dummy_data": true,
        "train": false
    }
}

aravindhank11 commented 5 months ago

Thank you for the quick turnaround, @fotstrt

I am using the provided docker container. So my versions are:

>>> import torch
>>> print(torch.__version__)
1.12.0a0+git67ece03

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

Further, I can confirm that using the following config works:

[
    {
        "arch": "mobilenet_v2",
        "kernel_file": "/root/orion/benchmarking/model_kernels/mobilenetv2_4_fwd",
        "num_kernels": 152,
        "num_iters": 100,
        "args": {
            "model_name": "mobilenet_v2",
            "batchsize": 4,
            "rps": 30,
            "uniform": true,
            "dummy_data": true,
            "train": false
        }
    }
]

Is there any reason why /root/orion/benchmarking/model_kernels/mobilenetv2_32_fwd did not work?

fotstrt commented 5 months ago

Hi! Thanks for checking!

Yes, unfortunately, the specific configuration file might have been misplaced since we don't use it anywhere (and forgot to remove it during cleanup). I will try to do a cleanup and remove/replace files accordingly asap. Also, if you want to profile your own models, please find instructions here: https://github.com/eth-easl/orion/blob/main/PROFILE.md.

Thank you for bringing this to my attention, and apologies for the inconvenience!

aravindhank11 commented 5 months ago

Thank you @fotstrt. I did try the steps to profile my model, but I suspect the torch and CUDA versions I profiled with were different from the ones in the Docker container. So I am repeating the steps now.

But in the process, I am observing that the torch in the container is not compiled with numpy support:

>>> preprocess(Image.open(image_path))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/vision/torchvision/transforms/transforms.py", line 94, in __call__
    img = t(img)
  File "/vision/torchvision/transforms/transforms.py", line 134, in __call__
    return F.to_tensor(pic)
  File "/vision/torchvision/transforms/functional.py", line 164, in to_tensor
    img = torch.from_numpy(np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True))
RuntimeError: PyTorch was compiled without NumPy support

I was using a preprocessing function to create batches for the inference workload. Since NumPy was not available, I examined the Orion codebase, which seems to perform inference not in batches but one image after another, as in: https://github.com/eth-easl/orion/blob/main/benchmarking/benchmark_suite/train_imagenet.py#L180-L192.

Is this intentional or am I understanding it wrong?

fotstrt commented 5 months ago

In our experimental setup, we are trying to simplify things to see what is really happening in the GPU, and examine all policies under tight cases where there is not a lot of preprocessing. However, we still use batches, but we prepare the torch tensors like here: https://github.com/eth-easl/orion/blob/main/benchmarking/benchmark_suite/train_imagenet.py#L36 (note that batch size is the first dimension of the tensor).

So we do it in batches, but the tensors are pre-made to avoid preprocessing times that might influence the performance and our conclusions. Does it make sense?

aravindhank11 commented 5 months ago

Cool, this is perfect :) Thank you for the patient responses!

If I were to write my own inference function with code as simple as the one at https://pytorch.org/hub/pytorch_vision_mobilenet_v2/, would I have to consider anything else to integrate with Orion (apart from profiling and creating a config file)?

As I see at https://github.com/eth-easl/orion/blob/main/benchmarking/benchmark_suite/train_imagenet.py#L79-L83, there seem to be other parameters, such as local_rank, barriers, client_barrier, tid, and input_file, which do not come from the config file args as in https://github.com/eth-easl/orion/blob/main/artifact_evaluation/example/config.json.

Is there any guide on how to use and configure these variables?

fotstrt commented 5 months ago

I recommend having a look at this file: https://github.com/eth-easl/orion/blob/main/benchmarking/launch_jobs.py. You can use it as a test with e.g. 1 model, or with more to test interference. It will basically spawn a thread for each model/script. You can find how the arguments are passed here: https://github.com/eth-easl/orion/blob/main/benchmarking/launch_jobs.py#L81

I hope that helps!
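
Editorial note: the "one thread per model/script" pattern described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in launch_jobs.py: `run_client` and `launch` are hypothetical names, the client body is a stub, and the real script also passes arguments such as local_rank, barriers, client_barrier, tid, and input_file. The use of `threading.Barrier` follows the log earlier in this thread, which prints a `threading.Barrier` object.

```python
import threading


def run_client(tid, config, start_barrier, results):
    """Stub client: a real client would run the model named in config."""
    # Wait until every client thread is ready, so all jobs start together.
    start_barrier.wait()
    args = config["args"]
    results[tid] = (config["arch"], args["batchsize"])


def launch(config_list):
    """Spawn one thread per config entry and collect one result per thread."""
    if not config_list:
        return []
    start_barrier = threading.Barrier(len(config_list))
    results = [None] * len(config_list)
    threads = [
        threading.Thread(target=run_client, args=(tid, cfg, start_barrier, results))
        for tid, cfg in enumerate(config_list)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Single-model test, using the mobilenet_v2 settings that worked in this thread.
example = [{"arch": "mobilenet_v2", "args": {"batchsize": 4}}]
print(launch(example))
```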

aravindhank11 commented 5 months ago

Thank you for all the help! I got a toy example up and running :) Much appreciated and amazing work on this!! I enjoyed reading the paper and trying it out!

If I may, I have another question: Is the first job in the config_list always regarded as high priority, with the others assumed to be best-effort tasks?

fotstrt commented 5 months ago

Hey, sorry for the late reply!

No, actually the last job in the config_list is high-priority, and all others are best-effort, e.g. here: https://github.com/eth-easl/orion/blob/main/artifact_evaluation/fig7/config_files/bert_mnet.json the MobileNet inference job is the high-priority one. Also, our current version in this repo works with 2 clients.

(We have implemented and tested Orion with more clients but haven't merged that work yet. We hope to do so soon!)
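
Editorial note: the ordering rule stated above (the last entry in config_list is high-priority, every other entry is best-effort) can be expressed as a small helper. This is a sketch for illustration only. `assign_roles` is a hypothetical name and is not part of Orion, and the two-entry list below only mirrors the BERT plus MobileNet pairing mentioned in the comment rather than the contents of bert_mnet.json.

```python
def assign_roles(config_list):
    """Label each config entry by position: last is high-priority."""
    last = len(config_list) - 1
    return [
        (cfg["arch"], "high-priority" if i == last else "best-effort")
        for i, cfg in enumerate(config_list)
    ]


# Hypothetical two-client list: best-effort job first, high-priority job last.
config_list = [
    {"arch": "bert"},
    {"arch": "mobilenet_v2"},
]
for arch, role in assign_roles(config_list):
    print(arch, role)
```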

aravindhank11 commented 5 months ago

Thank you, that makes sense. I have been using Orion since yesterday with 3 clients. It seems to be working, though I am unsure whether it is working as expected. Can you please let me know if there are changes to the shared library?

I wrote my own wrapper like PyScheduler to suit my needs.

fotstrt commented 5 months ago

Yes, I will update you! I would recommend testing Orion with 2 clients first, and checking the scripts we have under https://github.com/eth-easl/orion/tree/main/artifact_evaluation (e.g. run https://github.com/eth-easl/orion/blob/main/artifact_evaluation/fig7/run_orion.py, which collocates a high-priority inference job with a best-effort training job) to see the expected behavior of the system.

aravindhank11 commented 5 months ago

Thank you! I shall close the issue. Thank you for all the help though!

Do you mind if I create a new issue to track using Orion with 2+ clients? That way you can mark it closed once it is done, and I can start using it.

fotstrt commented 5 months ago

Of course! Also please let me know if there are any problems with the 2-client setups!

Thanks again for your interest in Orion!

aravindhank11 commented 5 months ago

One last question, and sorry for asking so many: from the configs you pointed out, there seems to be usage of additional_kernel_file. Could you please let me know how to build this and how it is used?

fotstrt commented 5 months ago

The specific file is used when there is a training job. We observed that the kernels in the 1st iteration differ from those in the remaining iterations, so we needed to profile and generate 1 extra file. All files are included under config_files. If you are not interested in training, you can have a look at the files here: https://github.com/eth-easl/orion/tree/main/artifact_evaluation/fig10

aravindhank11 commented 5 months ago

Superb. Yes, I am interested in inference workloads. Closing this issue. Thank you again!

eth-easl / orion

Making Orion work with other models #24