NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs
Apache License 2.0
589 stars 79 forks

GitHub workflows improvements #259

Open sarahyurick opened 1 month ago

sarahyurick commented 1 month ago

There are a couple of GitHub Actions I want to add to NeMo Curator:

After the first one is merged, I can open separate PRs to add the others.

sarahyurick commented 1 month ago

Note from Oliver about 2: "This file didn’t work because using secrets.GITHUB_TOKEN is not permitted to trigger other workflows. GH disallows that to prevent spam. You would need to use your personal access token (PAT). However, since we’re in a public environment, you would need to limit that to a protected branch (via environments). It’s still doable (from branch main, react to all issue change events), but a bit more tricky. As long as the PAT is not exposed to all branches, this is fine."

sarahyurick commented 1 month ago

Additional tasks outside of GitHub:

sarahyurick commented 1 month ago

NeMo files of interest:

sarahyurick commented 1 month ago

Suggestion from @praateekmahajan: parametrizing the GPU tests with commonly used clients.

Depending on how many GPUs the node has, we could even try a multi-GPU setup.
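
A rough sketch of how the parameter list for such tests might be built. The function name `client_matrix`, the config keys, and the "single-gpu" / "multi-gpu" labels are all hypothetical; the thread does not specify which clients count as "commonly used", so this only shows how the matrix could depend on the GPU count of the node.

```python
from typing import Dict, List, Union

ClientConfig = Dict[str, Union[str, int]]


def client_matrix(n_gpus: int) -> List[ClientConfig]:
    """Return the (hypothetical) client configurations to test on a node.

    n_gpus is the number of GPUs visible on the test node. With no GPUs the
    list is empty, so a parametrized GPU test would simply not run.
    """
    if n_gpus < 0:
        raise ValueError("n_gpus must be non-negative")

    configs: List[ClientConfig] = []
    if n_gpus >= 1:
        # One worker pinned to a single GPU.
        configs.append({"id": "single-gpu", "n_workers": 1})
    if n_gpus >= 2:
        # One worker per GPU; only meaningful when the node has several.
        configs.append({"id": "multi-gpu", "n_workers": n_gpus})
    return configs


if __name__ == "__main__":
    for cfg in client_matrix(4):
        print(cfg["id"], cfg["n_workers"])
```

The resulting list could then be handed to `pytest.mark.parametrize`, using each entry's `id` as the test ID, with a fixture that starts the matching client for each configuration.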

sarahyurick commented 1 week ago

Regarding the gpuCI label: