developmentseed / bioacoustics-api

Google Bioacoustics API that runs the backend for A2O Search
https://devseed.com/api-docs/?url=https://api.search.acousticobservatory.org/api/v1/openapi
MIT License

504s #28

Closed: geohacker closed this issue 1 year ago

geohacker commented 1 year ago

When there are many requests from the frontend at the same time, I'm seeing this in the API logs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 56, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 55, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/views/generic/base.py", line 103, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/decorators.py", line 50, in handler
    return func(*args, **kwargs)
  File "/code/bioacoustics/milvus/views.py", line 77, in search_view
    'audio_file': request.FILES.get('audio_file'),
  File "/usr/local/lib/python3.9/site-packages/rest_framework/request.py", line 442, in FILES
    self._load_data_and_files()
  File "/usr/local/lib/python3.9/site-packages/rest_framework/request.py", line 279, in _load_data_and_files
    self._data, self._files = self._parse()
  File "/usr/local/lib/python3.9/site-packages/rest_framework/request.py", line 354, in _parse
    parsed = parser.parse(stream, media_type, self.parser_context)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/parsers.py", line 108, in parse
    data, files = parser.parse()
  File "/usr/local/lib/python3.9/site-packages/django/http/multipartparser.py", line 123, in parse
    return self._parse()
  File "/usr/local/lib/python3.9/site-packages/django/http/multipartparser.py", line 299, in _parse
    for chunk in field_stream:
  File "/usr/local/lib/python3.9/site-packages/django/http/multipartparser.py", line 478, in __next__
    output = next(self._producer)
  File "/usr/local/lib/python3.9/site-packages/django/http/multipartparser.py", line 615, in __next__
    for bytes in stream:
  File "/usr/local/lib/python3.9/site-packages/django/http/multipartparser.py", line 478, in __next__
    output = next(self._producer)
  File "/usr/local/lib/python3.9/site-packages/django/http/multipartparser.py", line 545, in __next__
    data = self.flo.read(self.chunk_size)
  File "/usr/local/lib/python3.9/site-packages/django/http/request.py", line 410, in read
    raise UnreadablePostError(*e.args) from e
django.http.request.UnreadablePostError: [Errno 104] Connection reset by peer

That looks like it's coming from https://github.com/developmentseed/bioacoustics-api/blob/main/bioacoustics/milvus/views.py#L77
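For reference, one way to keep these aborted uploads from surfacing as unhandled errors would be to catch UnreadablePostError around the request.FILES access. This is only a rough sketch, assuming a DRF function-based view shaped like the one linked above; the real search_view and its response format may differ, and this only quiets the traceback rather than fixing the underlying slowness.

# Hypothetical sketch, not the actual search_view: names and response shape
# are assumptions based on the traceback above.
from django.http import UnreadablePostError
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["POST"])
def search_view(request):
    try:
        # Accessing request.FILES forces the multipart body to be parsed,
        # which raises UnreadablePostError if the client reset the connection.
        audio_file = request.FILES.get("audio_file")
    except UnreadablePostError:
        # The client went away mid-upload, so there is no one left to answer,
        # but returning cleanly keeps the error out of the 500 logs.
        return Response(
            {"detail": "Upload aborted by client."},
            status=status.HTTP_400_BAD_REQUEST,
        )
    # ... the actual Milvus search logic would go here ...
    return Response({"received": audio_file is not None})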

cc @willemarcel

geohacker commented 1 year ago

@willemarcel and I were able to look at this together a bit today, and as @sunu found the other day, this happens when the Milvus querynodes hit OOM.

We'll reduce the number of simultaneous requests the frontend makes, but in the meantime, @sunu, let's figure out optimising or increasing resources on the cluster.
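Purely as an illustration of capping load on the API side (not something discussed in this thread; the scope name and rate below are invented), DRF's built-in throttling could shed excess search requests with a 429 instead of letting them pile up on Milvus:

# Illustrative sketch only: throttle anonymous search requests.
# "search_burst" and the 30/minute rate are placeholders.
from rest_framework.decorators import api_view, throttle_classes
from rest_framework.response import Response
from rest_framework.throttling import AnonRateThrottle

class SearchBurstThrottle(AnonRateThrottle):
    scope = "search_burst"

@api_view(["POST"])
@throttle_classes([SearchBurstThrottle])
def search_view(request):
    ...  # existing search logic
    return Response({"status": "ok"})

# settings.py -- throttling needs a cache backend; the default LocMemCache
# is per-process, so rates apply per worker.
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_RATES": {
        "search_burst": "30/minute",
    }
}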

sunu commented 1 year ago

Last week I bumped up the Milvus querynode's memory limit from 12GB to 15GB to avoid OOMs, but it looks like we need even more memory.

@geohacker @willemarcel Do we have an estimate of how much memory it should need? If not, we can try bumping the querynode's memory limit up really high and then run some memory-intensive queries to observe how much memory it consumes. That should give us a good idea of what the ideal limit should be.
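For what it's worth, such a probe could be as simple as the sketch below: fire batches of random-vector searches at Milvus while watching querynode memory (for example with kubectl top pod). The host, collection name, field name and embedding dimensionality are all placeholders, not the real deployment values.

# Rough load-probe sketch using pymilvus. Host, collection name, field name
# and embedding dimensionality below are placeholders, not the real values.
import random

from pymilvus import Collection, connections

connections.connect(alias="default", host="localhost", port="19530")
collection = Collection("a2o_embeddings")  # placeholder collection name
collection.load()

DIM = 1280  # placeholder embedding dimensionality

for batch in range(200):
    # 16 random query vectors per batch to keep the querynodes busy.
    vectors = [[random.random() for _ in range(DIM)] for _ in range(16)]
    results = collection.search(
        data=vectors,
        anns_field="embedding",  # placeholder vector field name
        param={"metric_type": "L2", "params": {"nprobe": 16}},
        limit=10,
    )
    print(f"batch {batch}: {len(results)} result sets")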

geohacker commented 1 year ago

This is now resolved through better scaling and resource management for Milvus.