Lilypad-Tech / lilypad

Run AI workloads easily in a decentralized GPU network. https://www.youtube.com/watch?v=yQnB2Yxia4Y
https://lilypad.tech
Apache License 2.0

feat: Add solver rate limiter #419

Closed. bgins closed this 2 weeks ago

bgins commented 2 weeks ago

Summary

This pull request makes the following change:

Add a rate limiter to the solver that limits the number of requests from an IP address per route.

Task/Issue reference

Closes: #417

Test plan

Start the chain and solver nodes. The other parts of the stack are not needed for testing.

Copy this script into a push-limits.go file:

package main

import (
    "fmt"
    "io"
    "net/http"
    "os"
    "sync"
    "time"
)

func main() {
    configs := map[string]struct {
        path         string
        initialDelay int // in milliseconds
    }{
        "resource_offers": {path: "/api/v1/resource_offers", initialDelay: 0},
        "job_offers":      {path: "/api/v1/job_offers", initialDelay: 1000},
        "deals":           {path: "/api/v1/deals", initialDelay: 2000},
    }

    var wg sync.WaitGroup
    var mu sync.Mutex

    // Send off callers to run concurrently
    for _, config := range configs {
        wg.Add(1)

        go func() {
            defer wg.Done()
            makeCalls(config.path, config.initialDelay, &mu)
        }()
    }

    wg.Wait()
}

func makeCalls(path string, initialDelay int, mu *sync.Mutex) {
    // Wait a bit to stagger the callers
    time.Sleep(time.Duration(initialDelay) * time.Millisecond)

    // Make 10 requests (ranging over an integer requires Go 1.22 or later)
    for i := range 10 {
        requestURL := fmt.Sprintf("http://localhost:%d%s", 8080, path)
        res, err := http.Get(requestURL)

        if err != nil {
            fmt.Printf("get request failed on %s: %s\n", path, err)
            os.Exit(1)
        }

        // Ensure full result printed in order
        mu.Lock()
        printResult(path, res, i)
        mu.Unlock()

        // Close the body so the underlying connection can be reused
        res.Body.Close()

        // Wait before making next call
        time.Sleep(300 * time.Millisecond)
    }
}

func printResult(path string, res *http.Response, count int) {
    fmt.Printf("path: %v\n", path)
    fmt.Printf("status code: %d\n", res.StatusCode)
    fmt.Printf("count: %d\n", count+1)

    if res.StatusCode == 429 {
        resBody, err := io.ReadAll(res.Body)
        if err != nil {
            fmt.Printf("could not read response body: %s\n", err)
            os.Exit(1)
        }

        fmt.Printf("error body: %s\n", resBody)
    } else {
        fmt.Println()
    }
}

This script makes 10 requests to each of our GET resource offers, job offers, and deals endpoints. The makeCalls invocations for each endpoint are staggered so that the results for each endpoint are easier to follow in the output.

Run the script with go run push-limits.go.

The expected output is successful calls to get resource offers at first:

path: /api/v1/resource_offers
status code: 200
count: 1

We default to five requests allowed over 10 seconds. Once an endpoint has reached its limit, it should report 429s:

path: /api/v1/resource_offers
status code: 429
count: 6
error body: Too Many Requests

The outputs for the job offer and deals endpoints will be interleaved, but should demonstrate that the rate limit is per endpoint and not global.
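
To illustrate what this output demonstrates, here is a minimal sketch of a limiter keyed by client IP and route. It uses a simple fixed window built from the standard library rather than httprate, and the names fixedWindowLimiter, demoHandler, and statuses are hypothetical, so treat it as an illustration and not the solver's implementation.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// fixedWindowLimiter counts requests per key within a fixed window.
// It is a simplified stand-in for httprate, not its algorithm.
type fixedWindowLimiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	start  time.Time
	counts map[string]int
}

func newLimiter(limit int, window time.Duration) *fixedWindowLimiter {
	return &fixedWindowLimiter{
		limit:  limit,
		window: window,
		start:  time.Now(),
		counts: map[string]int{},
	}
}

// allow reports whether the request identified by key is within the limit.
func (l *fixedWindowLimiter) allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	// Start a fresh window once the current one has elapsed
	if time.Since(l.start) >= l.window {
		l.start = time.Now()
		l.counts = map[string]int{}
	}
	l.counts[key]++
	return l.counts[key] <= l.limit
}

// middleware keys each request by client IP and URL path, so limits
// apply per endpoint and not globally.
func (l *fixedWindowLimiter) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		if !l.allow(ip + " " + r.URL.Path) {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoHandler wraps a trivial handler with the default of five requests per 10 seconds.
func demoHandler() http.Handler {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return newLimiter(5, 10*time.Second).middleware(ok)
}

// statuses sends n GET requests to path through h and collects the status codes.
func statuses(h http.Handler, path string, n int) []int {
	codes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		rec := httptest.NewRecorder()
		h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
		codes = append(codes, rec.Code)
	}
	return codes
}

func main() {
	h := demoHandler()
	fmt.Println(statuses(h, "/api/v1/resource_offers", 6)) // [200 200 200 200 200 429]
	fmt.Println(statuses(h, "/api/v1/job_offers", 1))      // [200]
}
```

The sixth call to the resource offers route is rejected while the first call to the job offers route still succeeds, which is the per-endpoint behavior the script is meant to show.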

Test the rate limiting configuration by starting the solver with CLI options or environment variables:

./stack solver --server-rate-request-limit 1 --server-rate-window-length 5
SERVER_RATE_REQUEST_LIMIT=1 SERVER_RATE_WINDOW_LENGTH=5 ./stack solver

Run the push-limits.go script again and the limits should be enforced much sooner.
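
For reference, a minimal sketch of how these two settings could be read with their defaults. The rateLimitConfig type and the envInt and loadRateLimitConfig helpers are hypothetical, the window length is assumed to be in seconds, and the solver's actual option parsing may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// rateLimitConfig is a hypothetical holder for the two settings.
type rateLimitConfig struct {
	RequestLimit int
	WindowLength time.Duration
}

// envInt reads a positive integer through lookup, falling back to def
// when the variable is unset or not a valid positive integer.
func envInt(lookup func(string) (string, bool), name string, def int) int {
	v, ok := lookup(name)
	if !ok {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// loadRateLimitConfig applies the defaults of five requests over 10 seconds.
// Assumption: SERVER_RATE_WINDOW_LENGTH is expressed in seconds.
func loadRateLimitConfig(lookup func(string) (string, bool)) rateLimitConfig {
	return rateLimitConfig{
		RequestLimit: envInt(lookup, "SERVER_RATE_REQUEST_LIMIT", 5),
		WindowLength: time.Duration(envInt(lookup, "SERVER_RATE_WINDOW_LENGTH", 10)) * time.Second,
	}
}

func main() {
	cfg := loadRateLimitConfig(os.LookupEnv)
	fmt.Printf("limit=%d window=%s\n", cfg.RequestLimit, cfg.WindowLength)
}
```

Passing the lookup function in, instead of calling os.LookupEnv directly, keeps the defaults easy to check without touching the process environment.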

Details

We considered rate limiting by wallet address, but have decided to use IP address for a first pass. Some endpoints do not require our X-Lilypad-User header, so wallet address is not sufficient. We may want to revisit this idea in the future.
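
If we do revisit this, one possible shape is a key function that prefers the wallet address and falls back to the IP address when the header is absent. This is only a sketch of the idea: limitKey and keyFor are hypothetical names, and nothing like this is part of this pull request.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// limitKey returns the wallet address from X-Lilypad-User when present,
// and otherwise the client IP taken from the request's remote address.
func limitKey(r *http.Request) string {
	if user := r.Header.Get("X-Lilypad-User"); user != "" {
		return "user:" + user
	}
	ip, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		ip = r.RemoteAddr
	}
	return "ip:" + ip
}

// keyFor builds a test request, optionally with the user header, and
// returns the key that limitKey derives from it.
func keyFor(user string) string {
	r := httptest.NewRequest(http.MethodGet, "/api/v1/deals", nil)
	if user != "" {
		r.Header.Set("X-Lilypad-User", user)
	}
	return limitKey(r)
}

func main() {
	fmt.Println(keyFor("0xabc")) // user:0xabc
	fmt.Println(keyFor(""))      // ip:192.0.2.1 (the address httptest assigns)
}
```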

In addition to httprate, we also considered tollbooth. tollbooth is more widely used, but httprate has a better algorithm, a sliding window counter based on work at Cloudflare. httprate also supports Redis for tracking counts across multiple server instances.
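
As a rough illustration of the sliding window counter approach described by Cloudflare, which we understand httprate to follow, the current rate can be estimated by weighting the previous window's count by how much of it still overlaps the trailing window. The function name slidingWindowEstimate and the sample numbers are ours, and this sketch is not httprate's code.

```go
package main

import (
	"fmt"
	"time"
)

// slidingWindowEstimate approximates the number of requests in the trailing
// window. prevCount is the total for the previous fixed window, currCount is
// the count so far in the current one, and elapsed is the time since the
// current window began.
func slidingWindowEstimate(prevCount, currCount int, elapsed, window time.Duration) float64 {
	// Fraction of the previous window that still falls inside the trailing window
	weight := float64(window-elapsed) / float64(window)
	return float64(prevCount)*weight + float64(currCount)
}

func main() {
	// 5 requests in the previous 10s window, 1 so far in the current
	// window, and 2.5s into the current window: 5*0.75 + 1 = 4.75
	est := slidingWindowEstimate(5, 1, 2500*time.Millisecond, 10*time.Second)
	fmt.Printf("estimated requests in trailing window: %.2f\n", est)
}
```

Compared with a plain fixed window, this smooths out the burst that would otherwise be allowed at a window boundary while still storing only two counters per key.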

The SERVER_RATE_REQUEST_LIMIT and SERVER_RATE_WINDOW_LENGTH environment variables have been added to all Doppler solver environments with the default values. We can update these values and restart the solver to tune them.

bgins commented 2 weeks ago

Would it be possible to make push-limits a test so we can run them on PRs to make sure we don't regress? (if it's a big lift, we can add a task on the backlog)

Yeah, great idea! Adapted it into a test here: https://github.com/Lilypad-Tech/lilypad/pull/419/commits/0e8a7f3e3d0d98e04a2faf01044c6801838c8b4a