rakataprime closed this 1 year ago
@chainzero @anilmurty, I have updated the SDL and container. This simpler approach should work. It's not as well suited to multiple users, but it should be fine for benchmarking tomorrow. I have tested it locally but still need to test it on testnet.
@rakataprime please let me know once you're done testing it so I can merge.
Just dropping an update here for anyone looking at this: we have gone through multiple iterations of testing, making tweaks along the way. At this point we have this working on providers running CUDA 11.7, but not consistently. We have not been able to get it working on providers running CUDA 12. We also still need to verify that the test scripts execute on the GPU (not the CPU), and to work on formatting the results or clearly calling out which results we want users to capture.
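For the "is it actually running on the GPU" check, a minimal sketch could look like the following. It assumes the benchmark scripts use PyTorch (the torchbench image name suggests this, but it is an assumption), and the helper names `cuda_report` and `matmul_device` are hypothetical, not part of this PR. `torch.version.cuda` reports the CUDA version the installed PyTorch was built against, which helps when comparing the CUDA 11.7 and 12.0 images; if that build needs a newer driver than the provider has, `torch.cuda.is_available()` typically returns `False` and work silently falls back to the CPU.

```python
"""Hypothetical sanity check that a PyTorch benchmark really runs on the GPU."""

try:
    import torch
except ImportError:  # without torch nothing can run on the GPU
    torch = None


def cuda_report() -> dict:
    """Summarize what PyTorch can see: its version, CUDA build, and visible GPUs."""
    if torch is None:
        return {"torch": None, "cuda_build": None, "gpus": []}
    gpus = []
    if torch.cuda.is_available():
        gpus = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        "gpus": gpus,
    }


def matmul_device(n: int = 1024) -> str:
    """Run one matrix multiply and return the device type the result lives on."""
    if torch is None:
        return "unavailable"
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    c = a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the kernel so any CUDA error surfaces here
    return c.device.type


if __name__ == "__main__":
    print(cuda_report())
    print("matmul ran on:", matmul_device())
```

Running this inside the container on a provider and getting "cpu" back would mean the benchmark numbers are CPU numbers, regardless of what GPU the lease requested.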
Things left to do before we can merge this:
docker.io/thumperai/torchbench:v0.0.11-cuda-11.7-dev
and docker.io/thumperai/torchbench:v0.0.11-cuda-12.0-dev
Akash_Gpu_Benchmark_Notebook.ipynb
with the following info:
a. Tell the user to use the ">>" button in the Jupyter notebook menu (it restarts the kernel and executes all cells sequentially, whereas the play button has to be run for each cell)
b. Tell the user to use "print, save as PDF" if "save and export as" results in an error
c. Akash_Gpu_Benchmark_Notebook.ipynb from the left menu by double clicking it and then follow the instructions documented in the "Instructions" section there)
d. Any other developer help you may want to add
@anilmurty I have updated the README and completed the above items.
Please update the 12.0 version of the SDL to fix the "vendor" key formatting error, and add a few more models to the SDL (I sent the SDL in Discord). The README and ipynb help text look great.
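The exact "vendor" formatting error isn't spelled out here. For reference, Akash's SDL nests GPU models under the vendor name (`vendor:` → `nvidia:` → a list of `- model: ...` entries). Below is a minimal, hypothetical Python check on the parsed `gpu` block that flags shapes that don't match. The function name and error messages are illustrative only, and treating a vendor with no model list as "any model" is an assumption here.

```python
"""Hypothetical sanity check for the GPU block of an Akash SDL, after YAML parsing."""


def gpu_vendor_errors(gpu: dict) -> list:
    """Return problems with gpu.attributes.vendor (an empty list means it looks OK).

    Expected shape (assumed from the Akash SDL docs):
        {"units": 1, "attributes": {"vendor": {"nvidia": [{"model": "a100"}]}}}
    """
    attributes = gpu.get("attributes")
    if not isinstance(attributes, dict):
        return ["gpu.attributes must be a mapping"]
    vendor = attributes.get("vendor")
    if not isinstance(vendor, dict) or not vendor:
        # e.g. `vendor: nvidia` written as a plain string instead of a mapping
        return ["gpu.attributes.vendor must map a vendor name to a list of models"]
    errors = []
    for name, models in vendor.items():
        if models is None:
            continue  # no model list: assumed here to mean any model from this vendor
        if not isinstance(models, list):
            errors.append(f"vendor.{name} must be a list of model entries")
            continue
        for i, entry in enumerate(models):
            # each entry must be a mapping like {"model": "a100"}, not a bare string
            if not isinstance(entry, dict) or not isinstance(entry.get("model"), str):
                errors.append(f"vendor.{name}[{i}] must look like {{'model': '<name>'}}")
    return errors


if __name__ == "__main__":
    good = {"units": 1, "attributes": {"vendor": {"nvidia": [{"model": "a100"}, {"model": "rtx4090"}]}}}
    bad = {"units": 1, "attributes": {"vendor": "nvidia"}}
    print(gpu_vendor_errors(good))  # []
    print(gpu_vendor_errors(bad))
```

Adding more models then just means appending further `{"model": ...}` entries under the same vendor key.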
@anilmurty I have updated the SDLs
Thanks so much @rakataprime - merging this in now
Updates to a Docker container with only a root Python and Jupyter environment and a default password