Closed Syakyr closed 8 months ago
@Syakyr, does this mean that we'll have to refactor the docs to include relevant code chunks for each problem?
Doc refactoring should be minimal. The code replacing PyTorch as the default would be dummy classes that accept any data and "predict" True regardless. This shouldn't affect the Docker build, MLflow integration, FastAPI deployment, etc. There would be a separate patch that adds back the PyTorch example specifically for AIAP's DSP.
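A minimal sketch of what such a dummy model could look like (the class and method names are hypothetical, not from the codebase):

```python
class DummyModel:
    """Stand-in for the PyTorch model: accepts any input and always
    "predicts" True, so the rest of the pipeline (training scripts,
    MLflow logging, FastAPI serving) can run without heavy ML
    dependencies."""

    def fit(self, X, y=None):
        # No actual training; return self to mimic the scikit-learn style.
        return self

    def predict(self, X):
        # Always predict True, one prediction per input row.
        return [True for _ in X]


model = DummyModel().fit([[0.1, 0.2], [0.3, 0.4]])
print(model.predict([[9.9, 9.9]]))  # [True]
```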
sflr; from what I'm reading, there are two options:

1. Keep the `src/` folder with dummy models, etc. This will be the default generated repo.
2. Exclude `conda.yaml`, `docker/`, and `src/` by default unless specified otherwise during repo initialisation.

I'm leaning towards option 2 because it addresses the user's pain point of starting the repo from scratch. wdyt?
I was thinking more towards option 1, with the model(s) being written (or copied from somewhere else) by the user. `conda.yaml` would just be enough to run the dummy model with no PyTorch, etc. We could do some touch-ups to the Dockerfile to reduce build times and improve script organisation within `src/`, but I think that would be better as a separate issue/pull request. For this issue, it would suffice to remove the PyTorch example and replace the files with the example if the `repo_name`, `description`, etc. match the prompt given to trigger the use of the example.
Maybe using the `pre_gen_project` hook instead of `post_gen_project` would be better, so that the replaced files can also propagate `author_name`, `registry_path`, etc., since those are the inputs that differ across users. I'm not inclined to add an extra prompt on whether the user wants to use the example, although, while writing this, it could still be done if we don't show the example prompt in the README/guide when none is chosen. We'll discuss this again tomorrow.
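As a rough illustration, a `hooks/pre_gen_project.py` could look like the sketch below. The trigger value and the file-swap step are hypothetical; cookiecutter renders `{{ cookiecutter.* }}` expressions inside hook scripts before running them, so the user's prompt answers are available to the hook:

```python
# hooks/pre_gen_project.py -- a minimal sketch, not the repo's actual hook.

# Hypothetical value a user would enter at the prompts to opt into the example.
EXAMPLE_TRIGGER = "pytorch-example"


def should_use_example(repo_name: str, description: str) -> bool:
    """Return True when the prompt answers match the example trigger."""
    return repo_name == EXAMPLE_TRIGGER or EXAMPLE_TRIGGER in description.lower()


def main():
    # In an actual hook, cookiecutter renders these before execution, e.g.:
    #   repo_name = "{{ cookiecutter.repo_name }}"
    repo_name = "pytorch-example"  # placeholder for the rendered value
    description = "Demo project"   # placeholder for the rendered value
    if should_use_example(repo_name, description):
        # Swap the dummy sources for the example sources here, before
        # generation, so author_name, registry_path, etc. still propagate
        # through the normal templating pass.
        print("Replacing dummy sources with the example problem")


if __name__ == "__main__":
    main()
```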
With regard to option 2, I think that's more viable when we move towards transforming the codebase into a CLI-based tool similar to git, as opposed to initialising it with Cookiecutter. So for now, I'd go with option 1.
Putting this here before filing it somewhere more relevant:
Testing out ROCm with an RX 7900 XT yielded some results; we would need to change the following in the `*-gpu.Dockerfile`:

```dockerfile
FROM nvcr.io/nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
```

to

```dockerfile
FROM rocm/dev-ubuntu-20.04
```

and

```dockerfile
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

to

```dockerfile
ENV HIP_VISIBLE_DEVICES 0
ENV LD_LIBRARY_PATH /opt/rocm/lib
```

and add

```dockerfile
RUN bash -c "source activate ${CONDA_ENV_NAME} && pip3 install --force torch==2.2.1+rocm5.7 torchvision==0.17.1+rocm5.7 --index-url https://download.pytorch.org/whl/rocm5.7"
```

at the end.

Also, add

```
--device=/dev/kfd --device=/dev/dri --group-add video
```

to the `docker run` command.
Query Brief
The PyTorch code would be pulled out of the codebase, to be downloaded separately and to replace the template generation output. There should not be any errors if the prompts that are not to be personalised (Docker registry name, author name, project name, etc.) are filled in correctly. This is so that the template becomes package-agnostic, which should reduce confusion when following the guide.
Tasks
`post_gen_project` hook to include source code for example problem(s)