TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis
https://stumpy.readthedocs.io/en/latest/

Add GPU Runners for Github Actions #1005

Closed seanlaw closed 1 month ago

seanlaw commented 4 months ago

At the SciPy 2024 conference, I learned that free GPU runners are available via Quansight's/MetroStar's "Open GPU Server". We may consider using this in the future.

joehiggi1758 commented 2 months ago

@seanlaw hey Sean - hope you had a wonderful weekend!

Scouting for my next contribution and this one looks interesting! What would next steps look like?

seanlaw commented 2 months ago

@joehiggi1758 This one is less about code contribution and more about "what are the concrete steps for getting access to a GPU so that we can execute our GPU unit tests?"

If you'd like to help us answer this question and come up with a plan then that would be super helpful. Frankly, it's not entirely clear what is being offered above and whether or not it is even useful for what we need. It could very well be a "nope, it's not quite what we need" and we move on. So this is a fact finding mission.

joehiggi1758 commented 2 months ago

@seanlaw ahh okay got it - so more of an open-ended task at this point, to formalize a plan!

I'm happy to help here and will get us a plan/framework put together!

seanlaw commented 2 months ago

Awesome! Thank you for your willingness to take on this ill-formed/ambiguous task

joehiggi1758 commented 2 months ago

@seanlaw hope you're having a wonderful evening! Here's a high level plan/write up on the above! Please let me know if this is not the direction you hoped for and I'm happy to pivot!

High-Level Overview: Quansight and MetroStar's "Open GPU Server" is an initiative designed to provide GPU resources for continuous integration purposes, particularly for the conda-forge community.

Regarding hardware available...

  1. Read and agree, if agreeable, to the terms and conditions listed at `open-gpu-server/TOS.md`
  2. Open a PR to add STUMPY to `open-gpu-server/access/conda-forge-users.json`
  3. Configure the GitHub Actions workflow to use GPU runners
  4. Specify the appropriate runner labels in the GitHub Actions workflow configuration file, i.e.:

     ```yaml
     name: CI with GPU

     on: [push, pull_request]

     jobs:
       build:
         runs-on: gpu_large  # Specification of GPU runner
         # ...
     ```

  5. Open a STUMPY pull request
  6. Merge the GitHub Actions workflow to main
seanlaw commented 2 months ago

Thanks @joehiggi1758! Did anything in the TOS catch your eye that might be problematic?

From a hardware standpoint, I think `gpu_tiny` or, at most, `gpu_medium` should be sufficient for our needs. We're primarily interested in testing the GPU code. I'm thinking that we add a `.github/workflows/gpu.yml` workflow that performs the GPU unit tests only (via `./test.sh gpu`)
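A minimal sketch of what that workflow could look like, assuming the Open GPU Server exposes a `gpu_tiny` runner label and reusing the existing `./test.sh gpu` entry point (the checkout/setup steps and dependency install command are illustrative assumptions, not confirmed configuration):

```yaml
# .github/workflows/gpu.yml -- sketch only; runner label and setup steps are assumptions
name: GPU Tests

on: [push, pull_request]

jobs:
  gpu-tests:
    runs-on: gpu_tiny  # assumed Open GPU Server runner label
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: python -m pip install -e .  # exact extras/deps would need verifying
      - name: Run GPU unit tests only
        run: ./test.sh gpu
```

Keeping this separate from the main CI workflow means the (scarcer) GPU runners are only occupied by the GPU test subset rather than the full suite.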

joehiggi1758 commented 2 months ago

@seanlaw of course - happy to help!

The two sections that caught my eye were...

The first statement reads to me as, "we may capture some data about your use of our GPUs, store it, and anonymize it," and the second reads to me as, "we reserve the right to work with third-party vendors, whose terms and risks you accept by using our GPUs."

I believe both of these statements are fine, and relatively standard for our purposes, but I'm not sure whether you play by different rules about exposing data or agreeing to terms with external entities, given that you're under the TDAmeritrade/Schwab umbrella.

Also - that makes sense on the hardware front. Should we open an issue to write that `.yml`?

seanlaw commented 2 months ago

@joehiggi1758 I agree. Aside from GitHub (password) secrets, everything else is open information since we are fully open source. I think we are okay to move forward.

As you wrote above, I think the first step is to "agree" to `open-gpu-server/TOS.md` and add STUMPY to the access list. Would you mind doing this first (feel free to tag me in that PR/issue)? After we get the green light, we can come back to the GitHub workflow. How does that sound?

joehiggi1758 commented 2 months ago

@seanlaw of course - I'm on it!

joehiggi1758 commented 2 months ago

@seanlaw we've been merged into main for access to Quansight's GPUs!

Want me to open an issue for a GPU workflow?

seanlaw commented 2 months ago

> Want me to open an issue for a GPU workflow?

Open an issue or a PR? Are there examples where others have done this successfully? Is there an intermediate step that might allow us to test things out (i.e., test out our access)?

My concern is that we'll need to make a bunch of PRs here in this repo in order to test (rather than a single PR or maybe a couple) and I'd like to avoid that if possible.

joehiggi1758 commented 2 months ago

@seanlaw hey Sean - hope you had a wonderful weekend!

As a plan of attack: first, to assist with access testing, I have requested to be added to Quansight's open GPU server here; second, I will test access locally and let you know what I find out! Does that work?

seanlaw commented 2 months ago

> Does that work?

Sounds good!

jaimergp commented 2 months ago

FWIW, that access list is only for conda-forge repositories, not general usage. So far we haven't offered access to the resources outside conda-forge.

seanlaw commented 2 months ago

@jaimergp Can you further explain what that means and what we can/can't do? STUMPY has a conda-forge feedstock, but that only gets triggered when we bump the latest PyPI version. What we'd like to do is run our GPU unit tests as each new PR/commit comes in.

jaimergp commented 2 months ago

Ah, sorry, I didn't see any mentions of conda-forge in this ticket so I incorrectly assumed you were trying to add the server CI directly here, not in your stumpy feedstock. Apologies.

When you modify the recipe in the feedstock, add the necessary tests, but there's no need to test the whole suite.
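For illustration, the tests added to a feedstock recipe would typically live in the recipe's `test` section; a sketch, assuming standard `meta.yaml` conventions (the exact imports and commands for STUMPY's feedstock may differ):

```yaml
# meta.yaml test section -- illustrative sketch only, not the actual feedstock recipe
test:
  imports:
    - stumpy          # smoke-test that the package imports
  commands:
    - python -c "import stumpy; print(stumpy.__version__)"
```

This is the "no need to test the whole suite" point: feedstock tests verify the package installs and imports correctly, not the full unit-test coverage.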

seanlaw commented 2 months ago

@jaimergp No need to apologize at all, and we appreciate your help! From your description, it sounds like updating the recipe in the conda feedstock would mean that the underlying package, STUMPY, has already been released to PyPI? Our current process is:

  1. Change and commit code to the STUMPY codebase
  2. Repeat Step 1 until a new version is ready to be released
  3. Release latest version to PyPI
  4. Update conda-forge feedstock to pick up latest version from PyPI and release the latest version

However, we would like to run our GPU tests in Step 1 as new changes/commits (to our GPU code) occur and NOT after a new version is released to PyPI (by that time, it is too late to catch any GPU bugs/errors).

Maybe I'm misunderstanding the point of accessing this GPU resource? What is the primary use case?

jaimergp commented 2 months ago

Exactly, the GPU resources are only available during (4). The primary use case is redistribution QA: making sure we have compiled things in the right way and asserting they will install and work correctly on end users' machines.

For day-to-day development, I'm afraid our server is insufficient to meet the general demand. You may look into https://docs.gha-runners.nvidia.com/ or the other solutions discussed at https://github.com/zarr-developers/zarr-python/issues/2041.

seanlaw commented 2 months ago

Thanks for confirming @jaimergp and for sharing alternative options! We will need to investigate if this is worth it. We don't have any funding so "free" is what we are looking for.

seanlaw commented 1 month ago

Closing this for now and may revisit in the future.