Closed Syakyr closed 8 months ago
@Syakyr, does this mean that we'll have to refactor the docs to include relevant code chunks for each problem?
Doc refactoring should be minimal. The code replacing PyTorch as the default would be dummy classes that accept any data and "predict" True regardless. This shouldn't affect the Docker build, MLflow integration, FastAPI deployment, etc. There would be a separate patch that adds back the PyTorch example specifically for AIAP's DSP.
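A minimal sketch of what such a dummy model could look like (the class and method names are hypothetical, not from the codebase):

```python
class DummyModel:
    """Stand-in for the PyTorch model: accepts any input and always
    "predicts" True, so the rest of the pipeline (training scripts,
    MLflow logging, FastAPI serving) can run without heavy ML
    dependencies."""

    def fit(self, X, y=None):
        # No actual training; return self to mimic the scikit-learn style.
        return self

    def predict(self, X):
        # Always predict True, one prediction per input row.
        return [True for _ in X]


model = DummyModel().fit([[0.1, 0.2], [0.3, 0.4]])
print(model.predict([[9.9, 9.9]]))  # [True]
```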
sflr; from what I'm reading, there are two options:

1. Keep the `src/` folder with dummy models, etc. This will be the default generated repo.
2. Exclude `conda.yaml`, `docker/`, and `src/` by default unless specified otherwise during repo initialisation.

I'm leaning towards option 2 because it addresses the user's pain point of starting the repo from scratch. wdyt?
I was thinking more towards option 1, with the model(s) being written (or copied from somewhere else) by the user. `conda.yaml` would just be enough to run the dummy model with no PyTorch, etc. We could do some touch-ups to the Dockerfile to reduce build times and improve script organisation within `src/`, but I think that would be better as a separate issue/pull request. For this issue, it would suffice to remove the PyTorch example and replace the files with the example if the `repo_name`, `description`, etc. match the prompt given to trigger the use of the example.
Maybe using the `pre_gen_project` hook instead of `post_gen_project` would be better, so that the replaced files can also propagate `author_name`, `registry_path`, etc., since those are the inputs that differ across users. I'm not inclined to add an extra prompt on whether the user wants to use the example, although, while writing this, it could still be done if we don't show the example prompt in the README/guide when none is chosen. We'll discuss this again tomorrow.
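As a rough illustration, a `hooks/pre_gen_project.py` could look like the sketch below. The trigger value and the file-swap step are hypothetical; cookiecutter renders `{{ cookiecutter.* }}` expressions inside hook scripts before running them, so the user's prompt answers are available to the hook:

```python
# hooks/pre_gen_project.py -- a minimal sketch, not the repo's actual hook.

# Hypothetical value a user would enter at the prompts to opt into the example.
EXAMPLE_TRIGGER = "pytorch-example"


def should_use_example(repo_name: str, description: str) -> bool:
    """Return True when the prompt answers match the example trigger."""
    return repo_name == EXAMPLE_TRIGGER or EXAMPLE_TRIGGER in description.lower()


def main():
    # In an actual hook, cookiecutter renders these before execution, e.g.:
    #   repo_name = "{{ cookiecutter.repo_name }}"
    repo_name = "pytorch-example"  # placeholder for the rendered value
    description = "Demo project"   # placeholder for the rendered value
    if should_use_example(repo_name, description):
        # Swap the dummy sources for the example sources here, before
        # generation, so author_name, registry_path, etc. still propagate
        # through the normal templating pass.
        print("Replacing dummy sources with the example problem")


if __name__ == "__main__":
    main()
```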
With regard to option 2, I think that's more viable when we move towards transforming the codebase into a CLI-based tool similar to git, as opposed to initialising it with Cookiecutter. So for now, I'd go with option 1.
Putting this here before filing it somewhere more relevant:
Testing out ROCm with an RX 7900 XT yielded some results; we would need to change the following in the `*-gpu.Dockerfile`:

```dockerfile
FROM nvcr.io/nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
```

to

```dockerfile
FROM rocm/dev-ubuntu-20.04
```

and

```dockerfile
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

to

```dockerfile
ENV HIP_VISIBLE_DEVICES 0
ENV LD_LIBRARY_PATH /opt/rocm/lib
```

and add

```dockerfile
RUN bash -c "source activate ${CONDA_ENV_NAME} && pip3 install --force torch==2.2.1+rocm5.7 torchvision==0.17.1+rocm5.7 --index-url https://download.pytorch.org/whl/rocm5.7"
```

at the end.

Also, add

```
--device=/dev/kfd --device=/dev/dri --group-add video
```

to the `docker run` command.
Query Brief
The PyTorch code would be pulled out of the codebase, to be downloaded separately and to replace the template generation output. There should not be any errors if the prompts that are not to be personalised (Docker registry name, author name, project name, etc.) are filled in correctly. This is so that the template becomes package-agnostic, which should reduce confusion when following the guide.
Tasks
`post_gen_project` hook to include source code for example problem(s)