developmentseed / bioacoustics-api

Google Bioacoustics API that runs the backend for A2O Search
https://devseed.com/api-docs/?url=https://api.search.acousticobservatory.org/api/v1/openapi
MIT License

Add time-of-day field to milvus metadata #18

Closed: leothomas closed this issue 1 year ago

leothomas commented 1 year ago

The metadata currently contains a timestamp (number of seconds since 1970-01-01T00:00:00), which enables date-range searches (e.g. recordings from Nov 13th to Dec 16th, or recordings from 10:00 to 10:30 on Nov 14th). Adding an offset in seconds from 00:00:00 (midnight) would allow searches such as: all recordings that occur between 8:00 and 9:00, regardless of the date.
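A minimal sketch of the proposed field, assuming each record is a plain dict with a hypothetical `time_of_day` key (seconds since 00:00:00) stored next to the existing timestamp; this is not the actual Milvus schema. The date-range filter only needs the timestamp, while the time-of-day filter ignores the date entirely. Datetimes are pinned to UTC so the snippet is deterministic; the timezone question is what the rest of the thread deals with.

```python
from datetime import datetime, timezone

def time_of_day_seconds(dt: datetime) -> int:
    """Seconds elapsed since 00:00:00 on the same day as dt."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second

def make_record(dt: datetime) -> dict:
    # Hypothetical metadata layout: existing timestamp plus the new offset field
    return {"timestamp": int(dt.timestamp()), "time_of_day": time_of_day_seconds(dt)}

records = [
    make_record(datetime(2022, 11, 13, 8, 15, tzinfo=timezone.utc)),
    make_record(datetime(2022, 11, 14, 10, 20, tzinfo=timezone.utc)),
    make_record(datetime(2022, 12, 16, 8, 45, tzinfo=timezone.utc)),
]

# Date-range search (10:00 to 10:30 on Nov 14th): only needs the timestamp
start = int(datetime(2022, 11, 14, 10, 0, tzinfo=timezone.utc).timestamp())
end = int(datetime(2022, 11, 14, 10, 30, tzinfo=timezone.utc).timestamp())
in_range = [r for r in records if start <= r["timestamp"] < end]

# Time-of-day search (8:00 to 9:00 on any date): only needs the new field
morning = [r for r in records if 8 * 3600 <= r["time_of_day"] < 9 * 3600]
```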

leothomas commented 1 year ago

I have come to face, yet again, every developer's mortal enemy: timezones.

The recordings span several different timezones. The best practice for storing data which spans timezones (e.g. credit card purchases) is to store the UTC timestamp in the database, and convert the timestamp to the user's local timezone in the frontend code.

This works well for most cases, since it references all local times against a single timezone (UTC). For example, say I run an online shop from New York and have customers all around the world. If I want to query all orders made between 2020-05-30T00:00:00 and 2020-05-31T00:00:00 (New York time), I just have to convert the start and end datetimes to UTC (i.e. 2020-05-30T04:00:00 to 2020-05-31T04:00:00, since New York is 4 hours behind UTC during daylight saving time) and compare them to all items in the database, which are also stored in UTC! I won't accidentally miss orders from Australia and New Zealand which were made on 2020-05-29T20:00:00 (local time), or include orders made in California 3 hours after the cutoff.
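A minimal sketch of the UTC approach. The fixed offsets are an assumption made to keep the example self-contained (a real system would look zones up in a tz database such as Python's `zoneinfo`); New York is given UTC-4 because 2020-05-30 falls within US daylight saving time. The order names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets valid on 2020-05-30 (assumption: no tz database lookup)
NEW_YORK_EDT = timezone(timedelta(hours=-4), "EDT")
CALIFORNIA_PDT = timezone(timedelta(hours=-7), "PDT")
SYDNEY_AEST = timezone(timedelta(hours=10), "AEST")

def to_utc(dt: datetime) -> datetime:
    """Normalize an aware datetime to UTC, as it would be stored in the database."""
    return dt.astimezone(timezone.utc)

# The query window is expressed in the shop's local time, converted once to UTC
window_start = to_utc(datetime(2020, 5, 30, 0, 0, tzinfo=NEW_YORK_EDT))
window_end = to_utc(datetime(2020, 5, 31, 0, 0, tzinfo=NEW_YORK_EDT))

# Each order was placed in the customer's local time but is stored in UTC
orders = {
    "sydney": to_utc(datetime(2020, 5, 31, 10, 0, tzinfo=SYDNEY_AEST)),         # local date is May 31
    "california": to_utc(datetime(2020, 5, 30, 22, 0, tzinfo=CALIFORNIA_PDT)),  # local date is May 30
    "new_york": to_utc(datetime(2020, 5, 30, 12, 0, tzinfo=NEW_YORK_EDT)),
}

matching = sorted(name for name, ts in orders.items() if window_start <= ts < window_end)
```

Comparing in UTC includes the Sydney order (placed on May 31 local time, but still within New York's May 30) and excludes the California order (placed on May 30 local time, but after New York's midnight). A naive comparison of local dates would get both wrong.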

However, that doesn't work for finding bird calls at sunrise, since sunrise in New York doesn't occur at the same UTC time as sunrise in California. So in order to find all bird calls at sunrise, we need to store the time of each recording in its local timezone.
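A sketch of what storing local time means in practice, assuming each recording carries its UTC timestamp plus the site's UTC offset in seconds (hypothetical fields, not the actual Milvus schema). Adding the offset before taking the remainder gives seconds since local midnight, and a single UTC instant lands at very different local times of day depending on the site.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60

def local_time_of_day(utc_timestamp: int, utc_offset_seconds: int) -> int:
    """Seconds since local midnight, given a UTC timestamp and the site's UTC offset."""
    # Python's % returns a non-negative result for a positive divisor, so
    # negative offsets (west of UTC) wrap correctly into the previous day.
    return (utc_timestamp + utc_offset_seconds) % SECONDS_PER_DAY

def fmt(seconds: int) -> str:
    """Format seconds since midnight as HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

# One instant, 2020-05-30T10:00:00Z, recorded at three sites
utc_ts = int(datetime(2020, 5, 30, 10, 0, tzinfo=timezone.utc).timestamp())
sites = {
    "new_york": -4 * 3600,    # EDT
    "california": -7 * 3600,  # PDT
    "brisbane": 10 * 3600,    # AEST, no DST
}
local_times = {name: fmt(local_time_of_day(utc_ts, offset)) for name, offset in sites.items()}
```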

If it weren't for the requirement to store each recording in its own timezone, we would be able to use a modulus operation directly on the UTC timestamp.

In the context of the online shop, I would be able to find all orders received between 8am and 9am (UTC) every day with the condition: 60*60*8 <= TIMESTAMP % (60*60*24) <= 60*60*9.
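The condition above, written out as a filter. A minimal sketch assuming integer Unix timestamps, so the window is 8am to 9am UTC; the upper bound is inclusive, exactly as in the condition. The trick relies on Unix time treating every day as exactly 86,400 seconds (leap seconds are not counted).

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 60 * 60 * 24
START = 60 * 60 * 8  # 08:00:00
END = 60 * 60 * 9    # 09:00:00

def ordered_between_8_and_9(timestamp: int) -> bool:
    """Mirror of 60*60*8 <= TIMESTAMP % (60*60*24) <= 60*60*9, in UTC hours."""
    return START <= timestamp % SECONDS_PER_DAY <= END

def utc_ts(*args: int) -> int:
    """Unix timestamp for a UTC date/time given as (year, month, day, hour, minute)."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp())

orders = [utc_ts(2020, 5, 30, 8, 30), utc_ts(2020, 6, 2, 8, 59), utc_ts(2020, 6, 2, 9, 30)]
hits = [ts for ts in orders if ordered_between_8_and_9(ts)]
```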

leothomas commented 1 year ago

Possible metadata configurations:

LanesGood commented 1 year ago

@leothomas noting that timezone conversion (and, even worse, Australian daylight savings time!!!) is apparently the bane of most eco-audiologists as well.

Many Australian regions do observe daylight savings, but some do not. No animals observe daylight savings.

This looks like so much fun, right? From Wikipedia:

[Screenshot from Wikipedia, 2023-05-31]

Is there any indication that the data is already normalized to UTC?

It looks like the A2O acoustic workbench does take this into account for audio uploads: https://github.com/QutEcoacoustics/audio-analysis/blob/e5756e14227b98d84c8f560333e4160e90a9e1c6/docs/basics/dates.md?plain=1#L26

sdenton4 commented 1 year ago

TODO: Add seconds-past-civil-(twilight/dawn) metadata fields to all recordings. Don't forget the leap seconds! (j/k)

This does sound like a good question for Anthony + Paul.

LanesGood commented 1 year ago

Found the message from A2O I was looking for, from the "Filter by Time of Day" option available on any recording (e.g. https://data.acousticobservatory.org/projects/1/regions/72/points/285/audio_recordings)

[Screenshot of the A2O "Filter by Time of Day" message]

leothomas commented 1 year ago

Awesome! Thanks @LanesGood. In our case, I only have access to the UTC offset in the filename rather than the timezone itself, so I wouldn't be able to know whether or not that timezone is currently observing DST. For example, timezone A is UTC+10 when not observing DST and UTC+11 when observing DST, while timezone B is UTC+11 and does not observe DST. If I have a filename with UTC+11, I can't know whether it was recorded in timezone A during DST or in timezone B. I think we should document that limitation.
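The limitation can be made concrete with a small lookup over the two hypothetical zones A and B described above (the table is illustrative, not read from any tz database). Given only an offset parsed from a filename, every zone/DST combination that produces that offset is a valid candidate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Zone:
    name: str
    standard_offset_h: float
    dst_offset_h: Optional[float]  # None if the zone does not observe DST

# The two hypothetical zones from the comment above
ZONES = [
    Zone("A", 10, 11),
    Zone("B", 11, None),
]

def candidates(offset_h: float) -> list[tuple[str, bool]]:
    """All (zone name, dst_in_effect) pairs consistent with a filename offset."""
    matches = []
    for zone in ZONES:
        if zone.standard_offset_h == offset_h:
            matches.append((zone.name, False))
        if zone.dst_offset_h is not None and zone.dst_offset_h == offset_h:
            matches.append((zone.name, True))
    return matches
```

An offset of UTC+11 yields two candidates, so a DST flag cannot be derived from the filename alone. The local clock time itself is still recoverable (UTC plus the recorded offset); what is lost is whether that clock had been shifted by DST.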

geohacker commented 1 year ago

Time of day is now available as part of the metadata. Thank you @leothomas!