fluxninja / aperture

Rate limiting, caching, and request prioritization for modern workloads
https://docs.fluxninja.com
Apache License 2.0


🥷 FluxNinja Aperture

Aperture is a distributed load management platform designed for rate limiting, caching, and prioritizing requests in cloud applications. Built upon a foundation of distributed counters, observability, and a global control plane, it provides a comprehensive suite of load management capabilities. These capabilities enhance the reliability and performance of cloud applications, while also optimizing cost and resource utilization.

Figure: Unified Load Management

Integrating Aperture into your application through the SDKs is a three-step process: define labels, wrap your workload, and configure a policy.

Example

```typescript
// Tailor policies to get deeper insights into your workload with labels that
// capture business context.
const labels = {
  // You can rate limit each user individually.
  user: "jack",
  // And have different rate limits for different tiers of users.
  tier: "premium",
  // You can also provide the tokens for each request.
  // Tokens are flexible: LLM AI tokens in a prompt, complexity of a request,
  // number of sub-actions, etc.
  tokens: "200",
  // When peak load exceeds external quotas or infrastructure capacity,
  // requests can be throttled and prioritized.
  // HIGH is assumed to be a constant defined elsewhere in your code.
  priority: HIGH,
  // Get deep insights into your workload. You can slice and dice performance
  // metrics by any label.
  workload: "/chat",
};
```
Example

```typescript
// Wrap your workload with startFlow and end calls, passing in the
// labels you defined earlier.
const flow = await apertureClient.startFlow("your_workload", {
  labels: labels,
  // Lookup result cache key to retrieve a cached result.
  resultCacheKey: queryParams,
});

// If rate or quota limit is not exceeded, the workload is executed.
if (flow.shouldRun()) {
  // Return a cached result or execute the workload.
  const cachedResult = flow.resultCache();
  const result = await yourWorkload(cachedResult);
  flow.setResultCache({
    value: result,
    ttl: { seconds: 86400, nanos: 0 },
  });
}

// End the flow once the workload has finished.
flow.end();
```
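To make the result-cache step more concrete, here is a minimal sketch of the look-aside pattern the snippet above follows: check the cache, run the workload only on a miss, then store the result with a TTL. It is a local, in-memory stand-in and does not use or mirror Aperture's distributed cache; `TtlCache` and `lookupOrCompute` are hypothetical names introduced only for illustration.

```typescript
// Illustrative in-memory TTL cache; a stand-in, not Aperture's implementation.
interface CacheEntry<V> {
  value: V;
  expiresAtMs: number;
}

class TtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) {
      return undefined;
    }
    if (nowMs >= entry.expiresAtMs) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that expires ttlSeconds after nowMs.
  set(key: string, value: V, ttlSeconds: number, nowMs: number): void {
    this.entries.set(key, { value, expiresAtMs: nowMs + ttlSeconds * 1000 });
  }
}

// Look-aside: return the cached result on a hit, otherwise compute and store it.
function lookupOrCompute<V>(
  cache: TtlCache<V>,
  key: string,
  ttlSeconds: number,
  nowMs: number,
  compute: () => V,
): V {
  const hit = cache.get(key, nowMs);
  if (hit !== undefined) {
    return hit;
  }
  const value = compute();
  cache.set(key, value, ttlSeconds, nowMs);
  return value;
}
```

With a TTL of 86400 seconds, as in the example above, a repeated request with the same key within 24 hours is served from the cache without re-running the workload.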
Policy YAML

```yaml
blueprint: rate-limiting/base
uri: github.com/fluxninja/aperture/blueprints@latest
policy:
  policy_name: rate_limit
  rate_limiter:
    bucket_capacity: 60
    fill_amount: 60
    parameters:
      interval: 3600s
      limit_by_label_key: user
    selectors:
      - control_point: your_workload
        label_matcher:
          match_list:
            - key: tier
              operator: In
              values:
                - premium
```
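The policy above reads as a token bucket per user: each value of the `user` label gets a bucket holding up to 60 tokens, refilled with 60 tokens every 3600 seconds. The sketch below shows that idea in a single process. It is not Aperture's implementation, which is built on distributed counters; it assumes continuous refill, and `TokenBucket` and `PerKeyRateLimiter` are hypothetical names used only here.

```typescript
// Illustrative single-process token bucket; not Aperture's implementation.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly fillAmount: number;
  private readonly intervalMs: number;

  constructor(capacity: number, fillAmount: number, intervalMs: number, nowMs: number) {
    this.capacity = capacity;
    this.fillAmount = fillAmount;
    this.intervalMs = intervalMs;
    this.tokens = capacity; // Start with a full bucket.
    this.lastRefillMs = nowMs;
  }

  // Refills in proportion to elapsed time (an assumption), then tries to spend `cost` tokens.
  tryConsume(cost: number, nowMs: number): boolean {
    const elapsedMs = nowMs - this.lastRefillMs;
    const refill = (elapsedMs * this.fillAmount) / this.intervalMs;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.lastRefillMs = nowMs;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// One bucket per label value, mirroring `limit_by_label_key: user`.
class PerKeyRateLimiter {
  private buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly fillAmount: number;
  private readonly intervalMs: number;

  constructor(capacity: number, fillAmount: number, intervalMs: number) {
    this.capacity = capacity;
    this.fillAmount = fillAmount;
    this.intervalMs = intervalMs;
  }

  allow(key: string, nowMs: number, cost: number = 1): boolean {
    let bucket = this.buckets.get(key);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.fillAmount, this.intervalMs, nowMs);
      this.buckets.set(key, bucket);
    }
    return bucket.tryConsume(cost, nowMs);
  }
}

// bucket_capacity: 60, fill_amount: 60, interval: 3600s
const limiter = new PerKeyRateLimiter(60, 60, 3600 * 1000);
```

Under these assumptions, a user can burst up to 60 requests at once and then regains roughly one request per minute, while other users are unaffected because each label value has its own bucket.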

Figures: Rate Limiter Blueprint; Rate Limiter Dashboard

In addition to language SDKs, Aperture also integrates with existing control points such as API gateways, service meshes, and application middlewares.

⚙️ Load management capabilities

🏁 Getting Started

☁️ Aperture Cloud

> [!NOTE]
> FluxNinja has been acquired by CodeRabbit. New sign-ups are temporarily disabled. Existing users can continue to use Aperture Cloud by signing in to their accounts.

The easiest way to try Aperture is to sign up for a free Aperture Cloud account. Aperture Cloud is a fully managed service by FluxNinja. With Aperture Cloud, there's no need to manage any infrastructure, and you can integrate your application with Aperture using SDKs. For more information, refer to the get started guide.

Figures: Quota Management Dashboard; Prioritization Metrics for gpt-4; Flow Analytics; Performance Metrics for OpenAI Models

🎮 Local Kubernetes Playground

To try Aperture in a local Kubernetes environment, refer to the Playground docs.

📖 Learn More

🎥 Videos

👷 Contributing

Reporting bugs helps us make Aperture more reliable and user-friendly. Include all the information needed to reproduce and understand the bug you are reporting; the helper questions in the bug report template make this easier. If you see a way to improve Aperture, use the feature request template to create an issue.

To contribute code, read the Contribution guide.