bespokelabsai / curator

Synthetic Data curation for post-training and structured data extraction
https://docs.bespokelabs.ai/bespoke-curator
Apache License 2.0

Simple way to cancel submitted batches #47

Open RyanMarten opened 1 week ago

RyanMarten commented 1 week ago

With the default (parallel) backend, you can just press Ctrl+C to stop the process from submitting requests.

With the batch backend, batches are submitted at the start and then only monitored, so to cancel them you need to explicitly send a cancel request to the API.

This would be helpful if, for example, the user meant to truncate an input dataset and sent 100M requests instead of 1M, or realized they made an error (maybe by looking at the visualizer) and wants to cancel a large batch early rather than waste money on a run they know won't be what they want.
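
One possible shape for this, as a minimal sketch and not curator's actual implementation: wrap the monitoring loop so that Ctrl+C sends a cancel request for every submitted batch. The names `monitor_batches` and `poll_once` are hypothetical; the only real API assumed is `client.batches.cancel(batch_id)` from the OpenAI Python SDK.

```python
def monitor_batches(client, batch_ids, poll_once):
    """Poll until all batches finish; on Ctrl+C, cancel every submitted batch.

    `poll_once` is a hypothetical callable that returns True once all
    batches have finished. `client` is assumed to expose
    `batches.cancel(batch_id)`, as the OpenAI Python SDK client does.
    Returns the list of batch IDs for which a cancel request was sent.
    """
    try:
        while not poll_once():
            pass  # a real loop would sleep between polls
        return []
    except KeyboardInterrupt:
        cancelled = []
        for batch_id in batch_ids:
            client.batches.cancel(batch_id)
            cancelled.append(batch_id)
        return cancelled
```

Taking the client as a parameter keeps the sketch independent of how curator constructs it, and makes the Ctrl+C path easy to exercise with a stand-in object.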

RyanMarten commented 4 days ago

Example script cancel_batches.py

import argparse
import json
from openai import OpenAI

# python cancel_batches.py --batch_objects_file ~/.cache/curator/005462827e59fa42/batch_objects.jsonl

def parse_args():
    parser = argparse.ArgumentParser(description='Cancel OpenAI batches from a JSONL file')
    parser.add_argument('--batch_objects_file', type=str, required=True, help='Path to the JSONL file containing batch objects')
    return parser.parse_args()

def cancel_batches(batch_objects_file):
    # Initialize OpenAI client
    client = OpenAI()

    # Keep track of successes and failures
    cancelled = []
    failed = []

    # Read and process the JSONL file
    with open(batch_objects_file, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            batch_obj = json.loads(line)
            batch_id = batch_obj['id']

            try:
                client.batches.cancel(batch_id)
                cancelled.append(batch_id)
                print(f"Successfully cancelled batch: {batch_id}")
            except Exception as e:
                failed.append((batch_id, str(e)))
                print(f"Failed to cancel batch {batch_id}: {str(e)}")

    # Print summary
    print("\nSummary:")
    print(f"Successfully cancelled: {len(cancelled)} batches")
    print(f"Failed to cancel: {len(failed)} batches")

    if failed:
        print("\nFailed batches:")
        for batch_id, error in failed:
            print(f"- {batch_id}: {error}")

def main():
    args = parse_args()
    cancel_batches(args.batch_objects_file)

if __name__ == "__main__":
    main()
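
A possible refinement of the script above, sketched under the assumption that some batches in the file may already have finished: look up each batch's current status first and only send a cancel request for batches that are still active. The function name `cancel_active_batches` is hypothetical; `client.batches.retrieve` and `client.batches.cancel` are the OpenAI Python SDK calls, and the status strings are the ones the OpenAI Batch API reports.

```python
# Statuses for which a cancel request is pointless: the batch has already
# finished, or a cancellation is already underway.
SKIP_STATUSES = {"completed", "failed", "expired", "cancelled", "cancelling"}

def cancel_active_batches(client, batch_ids):
    """Cancel only batches that are still active.

    `client` is assumed to expose `batches.retrieve(batch_id)` (returning
    an object with a `.status` attribute) and `batches.cancel(batch_id)`.
    Returns (cancelled_ids, skipped), where skipped holds (id, status) pairs.
    """
    cancelled, skipped = [], []
    for batch_id in batch_ids:
        status = client.batches.retrieve(batch_id).status
        if status in SKIP_STATUSES:
            skipped.append((batch_id, status))
            continue
        client.batches.cancel(batch_id)
        cancelled.append(batch_id)
    return cancelled, skipped
```

With a real client this would be called as `cancel_active_batches(OpenAI(), batch_ids)`, with `batch_ids` read from the same `batch_objects.jsonl` file. Cancellation is not instantaneous on the API side: a batch reports `cancelling` for a while before it becomes `cancelled`.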