Fix and update docker container

rakataprime commented 1 year ago

Updates the Docker container to one with only the root Python and Jupyter environment and a default password.

rakataprime commented 1 year ago

@chainzero @anilmurty, I have updated the SDL and the container. This simpler approach should work. It's not ideal for running with multiple users, but it should be fine for benchmarking tomorrow. I have tested locally but still need to test on testnet.

arno01 commented 1 year ago

@rakataprime please let me know once you've finished testing it so I can merge.

anilmurty commented 1 year ago

Just dropping an update here for anyone looking at this: we have gone through multiple iterations of testing with this, making tweaks along the way. At this point we have it working on providers running CUDA 11.7, but not consistently. We have not been able to get it working for providers on CUDA 12. We also still need to verify that the test scripts are executed on the GPU (not the CPU), and to work on formatting the results or clearly calling out which results we want users to capture.
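One way to check the GPU-vs-CPU point from inside the notebook is sketched below. This is only an illustration, not the benchmark's actual test script: the function name `gpu_report` is hypothetical, and it assumes PyTorch is installed in the container (it falls back gracefully if it is not).

```python
def gpu_report():
    """Report whether a small PyTorch workload actually ran on the GPU.

    Hypothetical helper for illustration; not part of the benchmark scripts.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can be said about the device.
        return {"torch": None, "device": "unknown"}

    report = {
        "torch": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        "device": "cpu",
    }
    if torch.cuda.is_available():
        # Run a small matrix multiply on the GPU and record where the result lives.
        x = torch.rand(1024, 1024, device="cuda")
        y = x @ x
        torch.cuda.synchronize()  # wait for the kernel to finish
        report["device"] = y.device.type
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(gpu_report())
```

If `device` comes back as `cpu` on a GPU provider, the workload is not reaching the GPU, and `cuda_build` shows which CUDA version the installed PyTorch expects.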

anilmurty commented 1 year ago

Things left to do before we can merge this:

1. Update the images to docker.io/thumperai/torchbench:v0.0.11-cuda-11.7-dev and docker.io/thumperai/torchbench:v0.0.11-cuda-12.0-dev
2. Update the Instructions section in Akash_Gpu_Benchmark_Notebook.ipynb with the following info:
   a. Tell the user to use the ">>" button in the Jupyter notebook menu (it restarts the kernel and executes all cells sequentially, whereas the play button has to be run per cell)
   b. Tell the user to use "print, save as PDF" if "save and export as" results in an error
3. Add a README.md file that tells the user:
   a. which SDL to use for which models
   b. what to do on the default page (enter the notebook password, which is in the SDL)
   c. what to do when the notebook opens (open Akash_Gpu_Benchmark_Notebook.ipynb from the left menu by double-clicking it, then follow the instructions in the "Instructions" section there)
   d. any other developer help you may want to add
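Since the two image tags above differ only in their CUDA version, a README could include something like the following sketch for picking the right one. The `IMAGES` table and `image_for_cuda` are hypothetical names for illustration, and matching on the exact major.minor version is an assumption rather than a documented compatibility rule.

```python
# Image tags taken from the to-do list above, keyed by CUDA major.minor version.
IMAGES = {
    "11.7": "docker.io/thumperai/torchbench:v0.0.11-cuda-11.7-dev",
    "12.0": "docker.io/thumperai/torchbench:v0.0.11-cuda-12.0-dev",
}


def image_for_cuda(version: str) -> str:
    """Return the image tag for a provider's CUDA version string.

    Hypothetical helper: only the major.minor part is compared, so
    "12.0.1" maps to the 12.0 image. Unknown versions raise ValueError.
    """
    major_minor = ".".join(version.strip().split(".")[:2])
    try:
        return IMAGES[major_minor]
    except KeyError:
        raise ValueError(f"no torchbench image listed for CUDA {version!r}") from None


if __name__ == "__main__":
    print(image_for_cuda("11.7"))
```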

rakataprime commented 1 year ago

@anilmurty I have updated the README and completed the above items.

anilmurty commented 1 year ago

Please update the CUDA 12.0 SDL to fix the "vendor" key formatting error, and add a few more SDL models (I sent the SDL in Discord). The README and ipynb help text look great.
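The exact formatting error is not shown in this thread. As an illustration only, the sketch below checks a parsed GPU resource block against the layout Akash SDLs commonly use, where `attributes.vendor` maps a vendor name to a list of `model` entries. The function name `vendor_errors` is hypothetical, and treating an empty vendor entry as "any model" is an assumption.

```python
def vendor_errors(gpu: dict) -> list:
    """Return a list of problems found in a parsed SDL `gpu` block.

    Assumed shape (hypothetical check, not an official validator):
        {"units": 1, "attributes": {"vendor": {"nvidia": [{"model": "a100"}]}}}
    """
    vendor = gpu.get("attributes", {}).get("vendor")
    if not isinstance(vendor, dict):
        return ["attributes.vendor must map a vendor name to a list of models"]

    errors = []
    for name, models in vendor.items():
        if models is None:
            # Assumption: a vendor with no model list means "any model".
            continue
        if not isinstance(models, list):
            errors.append(f"vendor.{name} must be a list of model entries")
            continue
        for entry in models:
            if not isinstance(entry, dict) or "model" not in entry:
                errors.append(f"vendor.{name} entries must look like {{'model': ...}}")
    return errors


if __name__ == "__main__":
    good = {"units": 1, "attributes": {"vendor": {"nvidia": [{"model": "a100"}]}}}
    print(vendor_errors(good))
```

The dictionary passed in would typically come from loading the SDL with a YAML parser and picking out the `gpu` section of a compute profile.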

rakataprime commented 1 year ago

@anilmurty I have updated the SDLs

anilmurty commented 1 year ago

Thanks so much @rakataprime - merging this in now

akash-network / awesome-akash

Fix and update docker container #415