QutEcoacoustics / audio-analysis

The audio analysis code (AnalysisPrograms.exe) for the QUT Ecoacoustics Research Group
https://ap.qut.ecoacoustics.info/
Apache License 2.0

Events.beta.csv format is imperfect #506

Open atruskie opened 3 years ago

atruskie commented 3 years ago

Actual behaviour:

The new CSV output format has some problems:

Expected behavior:

The above not to happen.

How to reproduce this bug:

  1. Run a multi recogniser, investigate the results

Additional Details

AP

Version: v21.7.0.4

Some example data: all.txt

towsey commented 3 years ago

Fantastic that you are dealing with this. I have been meaning to log it as an issue myself. Where appropriate, could events include additional info? For example, oscillation events could include the oscillation rate and harmonic events could include the interval. The score should also be included wherever one is available.

atruskie commented 3 years ago

Great question.

In short: not really.

CSV is great when all the events have the same shape/type of data. Most of the above issues arise because we output the results based on the base class (EventCommon, I think), which lacks the end/low/high properties.

I think, given our nature, it's safe to try and output those extra columns. But if any of that data is missing, we'll get a lot of sparse columns.

For even more specific event data, though, we'll definitely end up with a lot of sparse columns. For a recogniser that produces only oscillation events, most rows would have the oscillation column filled. But in a multi-recogniser case, most rows would have an empty oscillation column.
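To make the sparsity concrete, here is a minimal Python sketch (not AP's actual code). The event dictionaries and field names such as `oscillation_rate` and `harmonic_interval` are hypothetical and stand in for a multi-recogniser result. Writing them to one CSV needs the union of all columns, so type-specific columns are mostly empty.

```python
import csv
import io

# Hypothetical events; field names are illustrative, not AP's real schema.
events = [
    {"event_type": "oscillation", "start_seconds": 1.0, "score": 0.9, "oscillation_rate": 12.5},
    {"event_type": "harmonic", "start_seconds": 2.5, "score": 0.7, "harmonic_interval": 440.0},
    {"event_type": "blob", "start_seconds": 4.0, "score": 0.6},
    {"event_type": "blob", "start_seconds": 6.0, "score": 0.5},
]


def union_columns(rows):
    """Collect every key seen in any row, preserving first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns


def to_csv(rows):
    """Write all rows to one CSV; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=union_columns(rows), restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fill_ratio(rows, column):
    """Fraction of rows that actually carry a value for the column."""
    return sum(1 for row in rows if column in row) / len(rows)


csv_text = to_csv(events)
print(csv_text)
print("oscillation_rate fill ratio:", fill_ratio(events, "oscillation_rate"))
```

In this made-up sample only one row in four fills `oscillation_rate`; every additional event type adds more columns that are empty for all other types.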

To achieve the flexibility we want here, we need to be able to encode arbitrary data structures, which is what the JSON output is for. Each object inside a JSON result can have whatever properties we'd like it to have.
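As a rough illustration of that point, the Python sketch below serialises a few hypothetical events to JSON. The property names are assumptions made for the example and are not AP's real output schema. Each object carries only the properties that apply to it, so no empty placeholders are needed.

```python
import json

# Hypothetical events; property names are illustrative, not AP's real schema.
events = [
    {"event_type": "oscillation", "start_seconds": 1.0, "score": 0.9, "oscillation_rate": 12.5},
    {"event_type": "harmonic", "start_seconds": 2.5, "score": 0.7, "harmonic_interval": 440.0},
    {"event_type": "blob", "start_seconds": 4.0, "score": 0.6},
]

# Each object is written with its own set of properties.
json_text = json.dumps(events, indent=2)
print(json_text)

# Reading the data back: consumers must check whether a property exists.
decoded = json.loads(json_text)
oscillation_rates = [e["oscillation_rate"] for e in decoded if "oscillation_rate" in e]
print(oscillation_rates)
```

The trade-off is that every object repeats its property names, and consumers have to handle properties that may be absent.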

Each of these formats is inefficient in its own way, and each has strengths the other lacks.

I think I want to make the CSV useful and dense by default for the common case. And leave the JSON for outputting complex data.

towsey commented 3 years ago

The additional info I would like to add is not complex, i.e. just scalars. It could be done by adding another one or two properties to EventCommon called Score1 and Score2, in addition to the existing Score property. You could then add an event property such as periodicity by assigning it to one of the score fields. The documentation would describe what information each score field holds. The trouble is that if we wait for JSON parsers etc., it will take a long time and be more difficult for the user.

atruskie commented 2 years ago

Assigning data to columns with generic names is not something we will do. Descriptive names are vital to people understanding what data they're looking at.
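A small Python sketch of why generic names hurt readers. Everything here is hypothetical: the `score1` column, the lookup table, and the descriptive names are invented for illustration and do not exist in AP. With a generic column, a reader needs the event type plus external documentation to interpret each value; with a descriptive column, the row explains itself.

```python
# Hypothetical row using a generic column name.
generic_row = {"event_type": "oscillation", "score": 0.9, "score1": 12.5}

# The same row with a descriptive column name.
descriptive_row = {"event_type": "oscillation", "score": 0.9, "oscillation_rate": 12.5}

# The lookup a reader would have to carry around to interpret "score1".
# (Illustrative only; no such table exists in AP.)
SCORE1_MEANING = {
    "oscillation": "oscillation_rate",
    "harmonic": "harmonic_interval",
}


def rename_generic(row, meanings):
    """Translate the generic 'score1' column into its per-type meaning."""
    renamed = {key: value for key, value in row.items() if key != "score1"}
    if "score1" in row:
        renamed[meanings[row["event_type"]]] = row["score1"]
    return renamed


print(rename_generic(generic_row, SCORE1_MEANING))
```

The generic row is only understandable after this translation step, and the meaning of `score1` changes from row to row depending on the event type.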