auroracramer / sonyc-kalman

Latent state models for SONYC data

Latent factor models for imputation and temporal modeling of urban sound data

Environmental noise has been shown to adversely affect quality of life in urban environments. Noise is typically reported event by event by individual citizens, and the response from city officials can take days in large cities such as NYC. Sounds of NYC (SONYC) is an NYC-focused project that uses a network of low-cost wireless recording devices to capture continuous, real-time audio across the city, providing a time-continuous, spatially distributed set of audio recordings for further research on urban sound reporting, analysis, and enforcement. Over the lifetime of the SONYC project, more than 50 years' worth of usable audio data has been collected, but sensor downtime introduces discontinuities and empty segments in the data. Additionally, insight into the temporal dynamics of the system can deepen general understanding of the city’s soundscape. To address both the gaps and the need for temporal insight, we propose to model the dynamics of the urban soundscape in SONYC data for the following tasks:

Currently, we have convenient access to the SONYC data from 2017 across about 40 sensors (of known location) in three forms: timestamped (and encrypted) raw audio (10-second clips, sampled at roughly uniform and widely spaced intervals); deep audio embeddings extracted from these clips using models trained on a self-supervised audio-visual correspondence task (known as OpenL3); and predictions from an urban sound tagging model. We propose learning a latent-state dynamical system using Kalman filtering and Kalman-filtering-inspired methods, with a primary interest in methods that incorporate nonlinear dynamics into the Kalman filtering framework via deep learning, such as Deep Kalman Filters [7], Kalman Variational Autoencoders, and other deep latent-state dynamics models. For comparison, we may also investigate traditional Kalman filters. In our case, the observations will be the OpenL3 embeddings. We propose the following experiments for the given tasks:
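
As a point of reference for the traditional Kalman filter baseline mentioned above, the following is a minimal NumPy sketch of a linear-Gaussian Kalman filter that skips the measurement update whenever an observation is missing, which is one simple way to impute across a sensor gap. Everything in it is an assumption made for illustration: the function name `kalman_filter`, the toy dimensions (a 2-D latent state and 4-D observations standing in for embeddings), the random-walk dynamics, and the synthetic data. It is not the repository's code and it uses no SONYC data.

```python
import numpy as np


def kalman_filter(ys, A, C, Q, R, mu0, P0):
    """Filter a sequence with a linear-Gaussian state-space model.

    ys has shape (T, d_obs); a row containing NaN is treated as missing,
    so only the predict step runs for that time index.
    mu0 and P0 are the prior mean and covariance of the state at t = 0.
    """
    T = ys.shape[0]
    d = mu0.shape[0]
    means = np.zeros((T, d))
    covs = np.zeros((T, d, d))
    mu, P = mu0.copy(), P0.copy()
    for t in range(T):
        # Predict step (the prior already describes the state at t = 0).
        if t > 0:
            mu = A @ mu
            P = A @ P @ A.T + Q
        y = ys[t]
        # Update step, only when the observation is present.
        if not np.isnan(y).any():
            S = C @ P @ C.T + R                  # innovation covariance
            K = np.linalg.solve(S, C @ P).T      # Kalman gain P C^T S^{-1}
            mu = mu + K @ (y - C @ mu)
            P = (np.eye(d) - K @ C) @ P
        means[t] = mu
        covs[t] = P
    return means, covs


# Synthetic toy data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
T, d_lat, d_obs = 50, 2, 4
A = np.eye(d_lat)                        # random-walk latent dynamics
C = rng.normal(size=(d_obs, d_lat))      # emission matrix
Q = 0.1 * np.eye(d_lat)                  # process noise covariance
R = 0.5 * np.eye(d_obs)                  # observation noise covariance

z = np.zeros((T, d_lat))
for t in range(1, T):
    z[t] = A @ z[t - 1] + rng.multivariate_normal(np.zeros(d_lat), Q)
ys = z @ C.T + rng.multivariate_normal(np.zeros(d_obs), R, size=T)

# Simulate sensor downtime by blanking ten consecutive time steps.
gap = slice(20, 30)
ys_missing = ys.copy()
ys_missing[gap] = np.nan

means, covs = kalman_filter(
    ys_missing, A, C, Q, R, np.zeros(d_lat), np.eye(d_lat)
)

# Fill the gap with the filtered prediction; keep observed rows as they are.
imputed = np.where(np.isnan(ys_missing), means @ C.T, ys_missing)
```

During the gap the state covariance grows by `Q` at every step, so the filter's uncertainty about the imputed values increases the longer the downtime lasts. A smoother that also uses observations after the gap, or the deep latent-state models named above, would replace the linear `A` and `C` here; this sketch covers only the filtering baseline.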