SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

ZeroDivisionError for system_swap autoinstrumentation for python celery apps #1829

Open abhinavramana opened 1 year ago

abhinavramana commented 1 year ago

Bug description

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 132, in callback
    for api_measurement in callback(callback_options):
  File "/opt/conda/lib/python3.8/site-packages/opentelemetry/instrumentation/system_metrics/__init__.py", line 423, in _get_system_swap_utilization
    getattr(system_swap, metric) / system_swap.total,
ZeroDivisionError: division by zero

This error shows up a couple of times in the OpenTelemetry logs for Python Celery applications.

Expected behavior

This exception should not occur, especially since it is not an error on the application side.
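
For illustration: the failing expression in the traceback divides by `system_swap.total`, so the error implies that value is zero, which typically happens on hosts or containers with no swap configured. Below is a minimal sketch of a guard around that division. `SwapInfo` and `swap_utilization` are hypothetical names used only for this example, standing in for the swap statistics object the instrumentation reads; this is not necessarily how the upstream fix is implemented.

```python
from collections import namedtuple

# Hypothetical stand-in for the swap statistics object seen in the traceback
# (an object with a `total` field plus per-state fields such as `used`).
SwapInfo = namedtuple("SwapInfo", ["total", "used", "free"])


def swap_utilization(swap, metric):
    """Return the fraction of swap in the given state.

    Falls back to 0.0 when no swap is configured (total == 0), which is
    the case that raises ZeroDivisionError in the unguarded division.
    """
    if swap.total == 0:
        return 0.0
    return getattr(swap, metric) / swap.total


if __name__ == "__main__":
    # No swap configured: the guard avoids the division entirely.
    print(swap_utilization(SwapInfo(total=0, used=0, free=0), "used"))
    # Swap configured: the ordinary ratio is returned.
    print(swap_utilization(SwapInfo(total=100, used=25, free=75), "used"))
```

Reporting 0.0 for a missing swap device is one possible choice; skipping the measurement altogether would be another.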

How to reproduce

I am not sure exactly, but a sample Celery app is available here: https://github.com/navyamehta/signoz-template-celery-opentelemetry

Version information

Additional context

These errors are happening quite frequently.

pranay01 commented 1 year ago

Thanks for sharing the issue @abhinavramana

@srikanthccv Have you come across such an issue earlier?

srikanthccv commented 1 year ago

We fixed this and released it in 0.35b0. Please update the version and let us know if it still exists.

abhinavramana commented 1 year ago

It seems it is still present, @srikanthccv.

These are our dependencies:

FROM nvcr.io/nvidia/pytorch:22.06-py3

RUN DEBIAN_FRONTEND=noninteractive apt-get -qq update && \
    DEBIAN_FRONTEND=noninteractive apt-get -qqy --no-install-recommends install \
    ffmpeg \
    git \
    libxext6 \
    libsm6 && \
    rm -rf /var/lib/apt/lists/*

ENV GIT_ACCESS_TOKEN "<REDACTED>"
ENV WOMBO_UTILITIES_SCHEMAS "paint_service,mediastore"
RUN git config --global url."https://${GIT_ACCESS_TOKEN}@github.com".insteadOf "ssh://git@github.com"

RUN python -m pip install --upgrade pip && \
    python -m pip install --upgrade \
    dataclasses-json==0.5.5 \
    einops==0.3.2 \
    ftfy==6.0.3 \
    kornia==0.6.6 \
    omegaconf==2.1.1 \
    pillow \
    pytorch-lightning==1.5.10 \
    basicsr==1.3.4.2 \
    pycurl==7.45.1 \
    kombu==5.2.3 \
    celery==5.2.7 \
    git+ssh://git@github.com/womboai/wombo-utilities.git@3.0.1#egg=wombo_utilities \
    google-cloud-translate==2.0.1 \
    opencv-contrib-python==4.5.5.64 \
    fasttext==0.9.2 \
    opentelemetry-distro==0.35b0 \
    opentelemetry-exporter-otlp==1.13.0 \
    opentelemetry-launcher==1.9.0 \
    opentelemetry-instrumentation-celery==0.34b0 \
    opentelemetry-exporter-otlp-proto-grpc==1.13.0 \
    backoff==1.11.1 \
    transformers

# Takes care of all packages needed for open-telemetry, also should be done here to allow caching
RUN opentelemetry-bootstrap --action=install

# DALLE-mini dependencies
# include with caution - these install taming-transformers at the container level
# and override the git submodule taming-transformers (and our fp16 modifications)
#     transformers \
#     dalle_pytorch \
#     triton==0.4.2 \
#     deepspeed

# restricts memory for strategies evaluated in cudnn benchmarking, needed to avoid OOM
ENV CUDNN_CONV_WSCAP_DBG=4096

COPY . /app/
RUN rm -rf /app/gallery/* /app/logs/*
COPY ./cache /root/.cache/
RUN mkdir -p /app/models && touch /app/models/.keep

WORKDIR /app
CMD opentelemetry-instrument --logs_exporter otlp_proto_grpc --traces_exporter otlp_proto_grpc --metrics_exporter otlp_proto_grpc celery -A wombo.celery_paint.celeryapp worker --loglevel=info --pool=threads

srikanthccv commented 1 year ago

Can you share the output of python -m pip freeze?
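
If the full `pip freeze` output is too long to paste, the OpenTelemetry packages alone are usually enough to spot a version mismatch. A minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` is hypothetical and chosen just for this example:

```python
from importlib import metadata


def installed_versions(prefix="opentelemetry"):
    """Map installed distribution names starting with `prefix` to versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata instead of failing.
        if name and name.lower().startswith(prefix):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    # Prints lines in the same name==version form that pip freeze uses.
    for name, version in installed_versions().items():
        print(f"{name}=={version}")
```

Run it with the same interpreter that runs the Celery worker, so the listing reflects the packages the worker actually loads.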