GoogleCloudPlatform / cloud-profiler-python

Stackdriver Profiler Python agent is a tool that continuously gathers CPU usage information from Python applications
Apache License 2.0

Failed to build the Discovery client for profiler, HttpError 429 #128

Closed: kapadias closed this issue 1 year ago

kapadias commented 1 year ago

When I run Cloud Profiler in a Cloud Run service, I get the following error:

ERROR:googlecloudprofiler.client:Failed to build the Discovery client for profiler (will retry after 19.609s): <HttpError 429 when requesting https://www.googleapis.com/discovery/v1/apis/cloudprofiler/v2/rest returned "Too Many Requests">

Any thoughts on what I am missing, based on the context files below?

Context: pyproject.toml file

python = "^3.10"
Flask = "^2.2.2"
Flask-API = "^3.0.post1"
gunicorn = "^20.1.0"
elasticsearch = "7.9.0"
confuse = "^2.0.0"
google-cloud-profiler = "^4.0.0"
google-auth = "^2.14.1"

App (main.py)

import os  # needed for os.environ in the __main__ block
import traceback
import json
import confuse
import googlecloudprofiler

from flask import jsonify
from flask import request
from flask_api import FlaskAPI

app = FlaskAPI("XXXXXX")

# Profiler initialization. It starts a daemon thread which continuously
# collects and uploads profiles. Best done as early as possible.
try:
    # service and service_version can be automatically inferred when
    # running on App Engine. project_id must be set if not running
    # on GCP.
    googlecloudprofiler.start(verbose=3)
except (ValueError, NotImplementedError) as exc:
    print(exc)  # Handle errors here

@app.route("/XXXXX", methods=["GET", "POST"])
def run():
    """
    Args:
        request (flask.Request): The request object.
        <https://flask.palletsprojects.com/en/1.1.x/api/#incoming-request-data>
    Returns:
        The response text, or any set of values that can be turned into a
        Response object using `make_response`
        <https://flask.palletsprojects.com/en/1.1.x/api/#flask.make_response>.
    """
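    request_json = request.get_json(silent=True)
    # get_json(silent=True) returns None when the body is missing or is not
    # valid JSON, so check it before reading the "text" key.
    if request_json and request_json.get("text"):
        return XXXXX.run(request=request_json, config=config)
    else:
        return {}


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))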

Dockerfile

RUN apt-get update && apt-get install -y gcc  \
    && apt-get install -y g++ \
    && rm -rf /var/lib/apt/lists/*

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True

# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

# Install production dependencies.
RUN pip install poetry \
    && poetry config virtualenvs.create false \
    && poetry install --no-dev --no-root

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
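The Dockerfile comment suggests matching the number of workers to the available cores. A minimal sketch of the same gunicorn settings as a Python config file, assuming a file named gunicorn.conf.py next to main.py and a hypothetical GUNICORN_WORKERS override variable (neither is part of the original setup). One thing to keep in mind: without --preload, gunicorn imports main.py separately in every worker, so each worker runs googlecloudprofiler.start() and starts its own agent, which presumably means one more Discovery request per worker on every new instance.

```python
# gunicorn.conf.py: hypothetical config file mirroring the CMD flags above.
import os

# Cloud Run injects PORT; 8080 matches the fallback used in main.py.
bind = ":" + os.environ.get("PORT", "8080")

# One worker per core, as the Dockerfile comment suggests. os.cpu_count()
# can return None, so fall back to 1. GUNICORN_WORKERS is a hypothetical
# override for when the detected count does not match the allocated vCPUs.
workers = int(os.environ.get("GUNICORN_WORKERS", os.cpu_count() or 1))

# Each worker also starts its own profiler agent, because every worker
# imports main.py (no preload_app here).
threads = 8
timeout = 0  # disable worker timeouts and let Cloud Run handle scaling
```

The CMD line could then become `CMD exec gunicorn --config gunicorn.conf.py main:app`.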

punya commented 1 year ago

As the error message explains, there were too many requests to the discovery API within a short span of time. This can happen when Cloud Run's autoscaler starts up a large number of instances and each of them tries to call the API. Rapid autoscaling is one of the reasons why Cloud Profiler isn't officially supported on Cloud Run.
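To make the burst effect concrete, here is a toy model: a simple fixed-window rate limiter in front of the discovery endpoint, with many instances asking for the Discovery document at once. The quota, window size, and retry delays are all hypothetical values chosen for illustration; they are not the real limits of www.googleapis.com or the agent's actual retry policy. The log line above only tells us that the agent waits (here 19.609s) and retries.

```python
import random
from collections import Counter

# Hypothetical limit for illustration only; the real quota is not stated here.
QUOTA_PER_WINDOW = 10
WINDOW_SECONDS = 1.0


def simulate(request_times, quota=QUOTA_PER_WINDOW, window=WINDOW_SECONDS):
    """Count requests a fixed-window rate limiter would accept or reject.

    Each request falls into window int(t // window); once a window has
    served `quota` requests, later ones in it are rejected (an HTTP 429).
    """
    served = Counter()
    accepted = rejected = 0
    for t in sorted(request_times):
        bucket = int(t // window)
        if served[bucket] < quota:
            served[bucket] += 1
            accepted += 1
        else:
            rejected += 1
    return accepted, rejected


instances = 50
# The autoscaler starts every instance at about the same moment, so every
# agent requests the Discovery document in the same window.
print(simulate([0.0] * instances))  # (10, 40)

# The same 50 requests spread out by randomized delays (uniform over an
# assumed 0-20 s range) land in different windows, so far fewer are rejected.
rng = random.Random(0)
print(simulate([rng.uniform(0, 20) for _ in range(instances)]))
```

In this model the instances that land in an uncrowded window connect immediately, which matches the observation that only a subset of instances gets through during a scale-up.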

As a workaround, you could simply ignore this error and rely on the subset of instances that do manage to connect to the API. Unfortunately, we aren't able to offer a more robust fix at this time.
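If you follow the "ignore this error" route, the repeated ERROR lines can be made less alarming with a standard-library logging filter. This is a minimal sketch that assumes the agent keeps logging through the `googlecloudprofiler.client` logger with the message text shown in this issue; both are taken from the log line above and are not a documented interface, so the filter may silently stop matching after an agent upgrade.

```python
import logging


class DowngradeDiscoveryErrors(logging.Filter):
    """Log the agent's Discovery-client failures as WARNING instead of ERROR.

    Matches on the message prefix seen in this issue; any other record from
    the logger passes through unchanged.
    """

    PREFIX = "Failed to build the Discovery client"

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno == logging.ERROR and record.getMessage().startswith(self.PREFIX):
            record.levelno = logging.WARNING
            record.levelname = logging.getLevelName(logging.WARNING)
        return True  # never drop the record, only lower its severity


# Logger name taken from the "ERROR:googlecloudprofiler.client:" prefix in the
# log line. A filter on a logger only sees records logged through that exact
# logger, which is where this message comes from.
logging.getLogger("googlecloudprofiler.client").addFilter(DowngradeDiscoveryErrors())
```

Adding these lines near the top of main.py is enough, since logging.getLogger returns the same logger object the agent uses.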