ServiceNow / Fast-LLM

Accelerating your LLM training to full speed
https://servicenow.github.io/Fast-LLM/

Improve quickstart guide #49

Open jlamypoirier opened 3 days ago

jlamypoirier commented 3 days ago

✨ Description

Some changes to make the tutorial easier to run (WIP). The goal of the quick start should be to let users run something as quickly as possible, and I'm making sure that is the case.

Also removing markdownlint, at least for now, because it's too annoying. It complains about existing files and doesn't auto-fix errors like the other pre-commit hooks do.

🔍 Type of change

Select all that apply:

tscholak commented 2 days ago

Hi @jlamypoirier,

Thanks for going through the tutorial. I can see you are putting a lot of thought into improving it, but I have to push back on some of the proposed changes.

First, merging the Docker and local environment tabs into one doesn't feel right. These are distinct use cases, and combining them introduces unnecessary complexity and confusion for users. The Docker and local environment guides are already extremely simple. By merging these tabs, we will do both user groups a disservice. We should instead directly support an interactive Toolkit workflow by adding a fifth tab for it.

Second, changing the folder structure to combine inputs and outputs isn't ideal. Keeping inputs and outputs separate is a good practice, and it's how our workflows are designed. Adopting this structure early in the tutorial teaches users the right habits. Consolidating them offers no meaningful benefit and creates unnecessary churn in the documentation.
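
For illustration only, the separation described above could be sketched in Python as follows. The directory names `inputs` and `outputs` and the helper `make_tutorial_layout` are hypothetical and are not taken from the tutorial; this is a minimal sketch of the idea, not the layout the guide actually prescribes.

```python
import tempfile
from pathlib import Path


def make_tutorial_layout(root: Path) -> dict[str, Path]:
    """Create separate input and output directories under `root`.

    The names below are assumptions made for this sketch.
    """
    layout = {
        "inputs": root / "inputs",    # hypothetical: datasets and configs go here
        "outputs": root / "outputs",  # hypothetical: checkpoints and logs go here
    }
    for path in layout.values():
        path.mkdir(parents=True, exist_ok=True)
    return layout


# Build the layout in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    layout = make_tutorial_layout(Path(tmp) / "tutorial")
    created = sorted(p.name for p in (Path(tmp) / "tutorial").iterdir())
    print(created)
```

Keeping the two trees apart means an experiment's outputs can be deleted or archived without touching the prepared inputs.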

I also noticed you're suggesting running all commands within Docker. That's problematic. If a user creates folders or files in the Docker container without mounting volumes, those changes are lost when the container shuts down. Moreover, working entirely within Docker restricts users to text editors inside the container, whereas the current guide allows them to use any tools they're comfortable with outside Docker. This change doesn't add value compared to the current setup.

You might not have realized that the tabs are interlocked. When a user selects a tab (e.g., Docker), all other sections automatically switch to that tab as well, making the guide cohesive for each use case. If we have different tabs for each section of the guide then this behaviour is broken, and that makes the guide unnecessarily clunky. The current tab design works and is consistent. Let's keep Docker, local environment, Slurm, Kubernetes, and (new) Toolkit as separate, clearly-defined tabs throughout.

That said, I do like some of your changes. The refinements to the local environment installation instructions are helpful, and surfacing the option to use a truncated dataset earlier in the guide is a good idea. But that doesn't require a separate tab. A simple note before the config YAML preparation section to indicate that a different dataset path can be used for quicker results is enough.

To sum up:

jlamypoirier commented 2 days ago

Let's discuss this in person.

To summarize, here are some of the issues that break the tutorial and/or make it more complicated than needed. I tried my best to fix them. I'm not fully committed to my proposed solutions, but these issues need to be addressed one way or another.

Concerning the environment tabs:

Concerning the directory structures: we need to simplify things.

Concerning the trial run vs full run: