developmentseed / bioacoustics-api

Google Bioacoustics API that runs the backend for A2O Search
https://devseed.com/api-docs/?url=https://api.search.acousticobservatory.org/api/v1/openapi
MIT License

Melspectrogram support #7

Open geohacker opened 1 year ago

geohacker commented 1 year ago

Google's chirp project has a way to create melspectrograms on the backend https://github.com/google-research/chirp/blob/main/chirp/projects/bootstrap/display.py#L52. Our API should have an endpoint that:

@willemarcel We could use the chirp module directly or just use their spectrogram implementation. Using the whole module will increase the size of the Docker image, since it pulls in a large number of ML dependencies.
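For a sense of how small the spectrogram-only route could be, the mel-scale part needs nothing beyond the standard library. The sketch below uses the common HTK-style mel formula, which is not necessarily what chirp uses; the function names and the example values (96 bands, 60 Hz to 16 kHz) are illustrative assumptions, not taken from the chirp code.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_bands: int, low_hz: float, high_hz: float) -> list[float]:
    """Return num_bands + 2 frequencies (Hz), evenly spaced on the mel scale.

    Each triangular mel filter i spans edges[i] .. edges[i + 2] and peaks
    at edges[i + 1], so num_bands filters need num_bands + 2 edges.
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_bands + 2)]


# Hypothetical example values: 96 bands between 60 Hz and 16 kHz.
edges = mel_band_edges(num_bands=96, low_hz=60.0, high_hz=16000.0)
```

A full mel spectrogram would still need an STFT and the filter weights built from these edges, so this only covers the frequency-axis layout.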

willemarcel commented 1 year ago

I tried to copy the code from the chirp lib, but it's not working. The args they pass to the class at https://github.com/google-research/chirp/blob/main/chirp/projects/bootstrap/display.py#L41 don't match the MelSpectrogram class at https://github.com/google-research/chirp/blob/main/chirp/models/frontend.py#L239-L244

I committed my code here: https://github.com/developmentseed/bioacoustics-api/tree/feature/melspectrogram

sdenton4 commented 1 year ago

Hi, Willie! The MelSpectrogram class inherits from Frontend here: https://github.com/google-research/chirp/blob/main/chirp/models/frontend.py#L78. Frontend defines the 'features' and 'stride' attributes, which MelSpectrogram inherits. So the call in display.py is equivalent to:

  melspec_layer = frontend.MelSpectrogram(  # pytype: disable=wrong-arg-types  # typed-pandas
      features=96,
      stride=stride,
      kernel_size=2 * stride,
      sample_rate=sample_rate,
      freq_range=(60.0, sample_rate / 2.0),
      scaling_config=frontend.PCENScalingConfig(root=root, bias=0.0),
  )
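The inheritance point can be illustrated with plain Python dataclasses, which is roughly how Flax module fields behave. This is a minimal sketch: the two classes below are hypothetical stand-ins, not the real chirp classes, and the `sample_rate` and `stride` values are made up for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class Frontend:
    """Hypothetical stand-in for the base class: declares the shared fields."""
    features: int
    stride: int


@dataclass
class MelSpectrogram(Frontend):
    """Hypothetical stand-in for the subclass: only lists its own extra fields."""
    kernel_size: int = 0
    sample_rate: int = 0
    freq_range: Tuple[float, float] = (0.0, 0.0)
    scaling_config: Optional[object] = None


# Illustrative values only; not taken from display.py.
sample_rate = 32000
stride = 100

# 'features' and 'stride' are accepted even though the subclass body
# does not declare them: they come from the parent class.
melspec_layer = MelSpectrogram(
    features=96,
    stride=stride,
    kernel_size=2 * stride,
    sample_rate=sample_rate,
    freq_range=(60.0, sample_rate / 2.0),
)

# Inherited fields come first in the constructor signature.
field_names = [f.name for f in fields(MelSpectrogram)]
```

So reading only the subclass definition shows just part of the accepted arguments; the parent's fields have to be included as well.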

There's an example of calling the melspec layer on some audio here: https://github.com/google-research/chirp/blob/main/chirp/projects/bootstrap/display.py#L81 (sorry it's slightly baroque)

Are you getting an error or some other bad behavior?