Closed wohlbier closed 2 years ago
Quick heads-up.
While doing some testing we discovered a bug in the dockerfile that prevents the container from running the mini-era application. The application should run under the base conda environment, instead of the hpvm one: $conda activate base. The Dockerfile has been patched accordingly, and only affects the last layer, so no need to regenerate the entire image.
Regarding APPROXHPVM_DIR path. This is part of the mini-era directory, which is in the docker image for legacy reasons, and not intended to be run or compiled. In Step 2 of the previous email, we will produce a clean image with only the necessary files. For now, this version of the docker image runs the scheduler version of mini-era.
cd scheduler-library/examples/mini-era && ./test-scheduler-S-P3V0F0N0 -t traces/tt00.new -o -G config_files/base_me_p2.config
Ok, so I should not cd to mini-era and issue ‘make’. Those are the builds that are failing, but it sounds like they are not expected to run.
I'll inspect the output of the scheduler mini era to try to get a handle on the yolo model.
I did notice that the example dumps core at the end.
Accelerator Usage Statistics:
Per-Accelerator allocation/usage statistics:
Acc_Type 0 CPU-Acc : Accel 0 Allocated 16088 times
Acc_Type 0 CPU-Acc : Accel 1 Allocated 3867 times
Acc_Type 0 CPU-Acc : Accel 2 Allocated 45 times
Per-Accelerator-Type allocation/usage statistics:
Acc_Type 0 CPU-Acc Allocated 20000 times
Acc_Type 1 FFT-HW-Acc Allocated 0 times
Acc_Type 2 VIT-HW-Acc Allocated 0 times
Acc_Type 3 CV-HW-Acc Allocated 0 times
Per-Meta-Block Accelerator allocation/usage statistics:
Per-MB Acc_Type 0 CPU-Acc : Accel 0 Allocated 3024 times for MB29
Per-MB Acc_Type 0 CPU-Acc : Accel 0 Allocated 3064 times for MB30
Per-MB Acc_Type 0 CPU-Acc : Accel 0 Allocated 10000 times for MB31
Per-MB Acc_Type 0 CPU-Acc : Accel 1 Allocated 1948 times for MB29
Per-MB Acc_Type 0 CPU-Acc : Accel 1 Allocated 1919 times for MB30
Per-MB Acc_Type 0 CPU-Acc : Accel 2 Allocated 28 times for MB29
Per-MB Acc_Type 0 CPU-Acc : Accel 2 Allocated 17 times for MB30
Done.
In the cleanup-state routine...
Doing accelerator type closeout for 4 accelerators
Segmentation fault (core dumped)
Thanks for bringing this up.
The segmentation fault has been patched in commit ceb11ff of the scheduler library.
You will need to regenerate the image, at least from the layer where the scheduler library is cloned. We also made some updates to the last layer of the Dockerfile and how it should run. The README has been updated accordingly.
To run the docker the following line should be used now. (Changed the path of yolo-data)
docker run -uespuser -v “$(pwd)”/yolo-data:/home/espuser/mini-era/yolo-data:ro --rm -it <name>:<tag> /bin/bash
In
setup_paths.sh
on thedocker-hpvm-yolo
branchAPPROXHPVM_DIR
is set to a path does not exist in the build. This line in particular:https://github.com/IBM/mini-era/blob/docker-hpvm-yolo/setup_paths.sh