Open jlamypoirier opened 3 days ago
Hi @jlamypoirier,
Thanks for going through the tutorial. I can see you are putting a lot of thought into improving it, but I have to push back on some of the proposed changes.
First, merging the Docker and local environment tabs into one doesn't feel right. These are distinct use cases, and combining them introduces unnecessary complexity and confusion for users. The Docker and local environment guides are already extremely simple. By merging these tabs, we will do both user groups a disservice. We should instead directly support an interactive Toolkit workflow by adding a fifth tab for it.
Second, changing the folder structure to combine inputs and outputs isn't ideal. Keeping inputs and outputs separate is a good practice, and it's how our workflows are designed. Adopting this structure early in the tutorial teaches users the right habits. Consolidating them offers no meaningful benefit and creates unnecessary churn in the documentation.
I also noticed you're suggesting running all commands within Docker. That's problematic. If a user creates folders or files in the Docker container without mounting volumes, those changes are lost when the container shuts down. Moreover, working entirely within Docker restricts users to text editors inside the container, whereas the current guide allows them to use any tools they're comfortable with outside Docker. This change doesn't add value compared to the current setup.
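The persistence concern above can be illustrated with a bind mount. This is a minimal sketch, not the tutorial's actual command: the host directory, the container path `/app/workdir`, and the image name are all placeholders chosen for illustration. The command is printed rather than executed, so the snippet runs even where Docker is not installed.

```shell
# Hypothetical host directory (a temp dir by default); not a path from the tutorial.
HOST_DIR="${HOST_DIR:-$(mktemp -d)}"
# Hypothetical mount point inside the container.
CONTAINER_DIR="/app/workdir"
# Placeholder image name; substitute the image the tutorial actually uses.
IMAGE="${IMAGE:-example/image:latest}"

# Files created on the host side of a bind mount outlive the container.
mkdir -p "$HOST_DIR"

# -v HOST:CONTAINER maps the host directory into the container.
MOUNT_ARG="$HOST_DIR:$CONTAINER_DIR"

# Print the command instead of running it.
echo "docker run --rm -it -v $MOUNT_ARG $IMAGE bash"
```

Anything written under the mount point from inside the container lands in the host directory, so it survives shutdown and stays editable with host-side tools.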
You might not have realized that the tabs are interlocked. When a user selects a tab (e.g., Docker), all other sections automatically switch to that tab as well, making the guide cohesive for each use case. If we have different tabs for each section of the guide then this behaviour is broken, and that makes the guide unnecessarily clunky. The current tab design works and is consistent. Let's keep Docker, local environment, Slurm, Kubernetes, and (new) Toolkit as separate, clearly-defined tabs throughout.
That said, I do like some of your changes. The refinements to the local environment installation instructions are helpful, and surfacing the option to use a truncated dataset earlier in the guide is a good idea. But that doesn't require a separate tab. A simple note before the config YAML preparation section to indicate that a different dataset path can be used for quicker results is enough.
To sum up:
Let's discuss this in person.
To summarize, here are some of the issues that break the tutorial and/or make it more complicated than needed. I tried my best to fix them. I'm not fully committed to my proposed solutions, but these issues need to be addressed one way or another.
Concerning the environment tabs:
Concerning the directory structures: we need to simplify things.
mkdir ~/inputs ~/results
didn't work in the toolkit job)~/...
or `~/mnt/...`, which complicates things
Concerning the trial run vs full run:
✨ Description
Some changes to make the tutorial easier to run (WIP). The goal of the quick start should be to allow running something as fast as possible, and I'm making sure it's the case.
fast_llm_tutorial
instead, and mount to /app/fast_llm_tutorial so paths are the same in every environment (not totally sure about Kubernetes).

Also removing markdownlint, at least for now, because it's too annoying. It complains about existing files and doesn't auto-fix errors like the other pre-commit things.
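A rough sketch of the single-folder idea described above, assuming the host folder lives under the current directory (the description does not say where it should go). `HOST_PARENT` and `example.txt` are hypothetical names used only for illustration.

```shell
# Assumption: the tutorial folder is created under the current directory.
HOST_PARENT="${HOST_PARENT:-$PWD}"
HOST_ROOT="$HOST_PARENT/fast_llm_tutorial"
# Mount point named in the description.
CONTAINER_ROOT="/app/fast_llm_tutorial"

# One folder holds everything, replacing separate input and output folders.
mkdir -p "$HOST_ROOT"

# A path relative to the tutorial folder resolves the same way on both sides.
REL_PATH="example.txt"   # hypothetical file name
HOST_PATH="$HOST_ROOT/$REL_PATH"
CONTAINER_PATH="$CONTAINER_ROOT/$REL_PATH"

echo "host:      $HOST_PATH"
echo "container: $CONTAINER_PATH"
```

When the working directory is the tutorial folder itself, the same relative paths work inside and outside the container, which is the point of mounting it at a fixed location.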
🔍 Type of change
Select all that apply: