QutEcoacoustics / egret

Everyone's Generalized REcognizer Tester
Apache License 2.0

Read and write Audacity project files (metadata tags and labels only) #1

Open cofiem opened 3 years ago

cofiem commented 3 years ago

This project is quite interesting!

I've been working with audio files and Audacity, as well as some custom audio analysis.

I thought it would be useful to be able to read and write Audacity project .aup files, limited to the Label Tracks and project Metadata tags. Note that labels are limited to 260 characters.

Goal

The key outcome from this would be that Audacity could be used as a visual editor for audio labels, which might make it easier for people to view output results and create the input expected results.

Details

My idea is that egret could write Audacity project files with metadata tags and label tracks, using the name attribute for the suite and tool information and the title attribute for individual result information, possibly separated by semicolons. These Audacity project files would contain only the label tracks. The project and any audio file(s) could then be opened in Audacity to view the audio alongside the labels.
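As a rough illustration of the semicolon idea, here is a minimal Python sketch (egret itself is C#, so this only shows the shape). The function name, the example field values, and the choice to reject over-long text rather than truncate it are all assumptions, as is applying the 260 character limit to the title attribute.

```python
# Assumption: the 260 character label limit applies to the title text.
MAX_LABEL_CHARS = 260


def label_title(*fields):
    """Join result fields with semicolons (hypothetical helper, not egret API)."""
    title = ";".join(str(field) for field in fields)
    if len(title) > MAX_LABEL_CHARS:
        raise ValueError(
            f"label text is {len(title)} characters, limit is {MAX_LABEL_CHARS}"
        )
    return title


# Example values are made up for illustration.
print(label_title("suite-a", "tool-b", "true positive"))
```

A field that itself contains a semicolon would make the text ambiguous when read back, so a real implementation would need an escaping rule or a different separator.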

If egret can also read the tag and labels from an Audacity project file, then the labels could be created in Audacity and used as input for analysis tools. An Audacity project file could be specified in the expect - where it has label and bounds in the configuration, that could instead be something like audacity_project_file and a path to the file.

There are all sorts of ways this could be extended, but I think a good starting point is writing results from egret as Audacity project files with label tracks and labels, and reading an Audacity file and using all labels from all label tracks as the input to the expected results configuration. There could also be custom tags when writing the file to record the overall information from an egret run.

This is how the label tracks look with a stereo audio track: [image: Audio track with label tracks]

The .aup file is XML. Here is a sample file. It could serve both as input (the expected results) and as output (the results of an analysis run by egret):

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE project PUBLIC "-//audacityproject-1.3.0//DTD//EN" "http://audacity.sourceforge.net/xml/audacityproject-1.3.0.dtd" >
<project xmlns="http://audacity.sourceforge.net/xml/" projname="test-labels_data" version="1.3.0" audacityversion="2.4.2" sel0="15.9714285714" sel1="24.4400000000" selLow="10.0000000000" selHigh="10000.0000000000" vpos="0" h="0.0000000000" zoom="71.6039279869" rate="44100.0" snapto="off" selectionformat="hh:mm:ss + milliseconds" frequencyformat="Hz" bandwidthformat="octaves">
    <tags>
        <tag name="ARTIST" value="artist"/>
        <tag name="TITLE" value="track"/>
        <tag name="COMMENTS" value="comments"/>
        <tag name="ALBUM" value="album"/>
        <tag name="YEAR" value="year"/>
        <tag name="TRACKNUMBER" value="track number"/>
        <tag name="GENRE" value="genre"/>
        <tag name="Custom" value="Custom metadata tag"/>
    </tags>
    <labeltrack name="Track 2" isSelected="1" height="206" minimized="0" numlabels="2">
        <label t="4.0000000000" t1="7.2446258503" selLow="1.0000000000" selHigh="10.0000000000" title="test 1"/>
        <label t="15.9714285714" t1="24.4400000000" selLow="10.0000000000" selHigh="10000.0000000000" title="test 3"/>
    </labeltrack>
    <labeltrack name="Track 1" isSelected="1" height="90" minimized="0" numlabels="1">
        <label t="8.0228571429" t1="18.9800000000" title="test 2"/>
    </labeltrack>
</project>

Here's what this project file looks like in Audacity: [image]
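To show how little is needed to read the labels back out, here is a minimal Python sketch using only the standard library (egret itself is C#, so this is an illustration rather than a proposed implementation). The function name and the dictionary keys are assumptions. The embedded XML is a cut-down copy of the sample project, without the DOCTYPE and most project attributes.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <project> element of the sample .aup file.
AUP_NS = {"aup": "http://audacity.sourceforge.net/xml/"}

# Cut-down copy of the sample project, kept small for the example.
SAMPLE_AUP = """<?xml version="1.0" standalone="no" ?>
<project xmlns="http://audacity.sourceforge.net/xml/" projname="test-labels_data" version="1.3.0">
    <labeltrack name="Track 2" numlabels="2">
        <label t="4.0000000000" t1="7.2446258503" selLow="1.0000000000" selHigh="10.0000000000" title="test 1"/>
        <label t="15.9714285714" t1="24.4400000000" selLow="10.0000000000" selHigh="10000.0000000000" title="test 3"/>
    </labeltrack>
    <labeltrack name="Track 1" numlabels="1">
        <label t="8.0228571429" t1="18.9800000000" title="test 2"/>
    </labeltrack>
</project>
"""


def read_labels(aup_text):
    """Return one dict per label, across all label tracks, in document order."""
    root = ET.fromstring(aup_text)
    labels = []
    for track in root.findall("aup:labeltrack", AUP_NS):
        for label in track.findall("aup:label", AUP_NS):
            low = label.get("selLow")    # optional frequency bound
            high = label.get("selHigh")  # optional frequency bound
            labels.append({
                "track": track.get("name"),
                "title": label.get("title"),
                "start": float(label.get("t")),
                "end": float(label.get("t1")),
                "low_hz": float(low) if low is not None else None,
                "high_hz": float(high) if high is not None else None,
            })
    return labels


for entry in read_labels(SAMPLE_AUP):
    print(entry)
```

Every element lookup has to be namespace-qualified because the project element declares a default namespace. Labels without selLow and selHigh, like "test 2", come back with no frequency bounds.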

Next Steps

What do you think?

If you're keen, I'm willing to have a go at implementing this.

atruskie commented 3 years ago

The key outcome from this would be that Audacity could be used as a visual editor for audio labels, which might make it easier for people to view output results and create the input expected results.

Beautiful. I have currently implemented reading from AviaNZ labels files and recursive imports for egret labels.

Raven was next on my list; Audacity was planned too.

I did not consider writing results to AUP files though.

Something I'll have to document later is that some of the output formats won't output all possible information. The CSV output is a good example (only outputs summary stats). But that also means a new outputter is not required to output everything either (maybe only outputting results).

Also, something to consider: each result is uniquely identifiable by a tuple of (suite, tool, type). You can see that in the CSV formatter where we flatten the results into one enumerable:

https://github.com/QutEcoacoustics/egret/blob/c01f5111fb05813fbe6413d5a4cd56a06b5c822d/src/Egret.Cli/Formatters/CsvResultFormatter.cs#L99-L110

From that, there are some open questions:

If egret can also read the tag and labels from an Audacity project file, then the labels could be created in Audacity and used as input for analysis tools. An Audacity project file could be specified in the expect - where it has label and bounds in the configuration, that could instead be something like audacity_project_file and a path to the file.

Okay, follow the importer model here. Look at the AviaNZ importer and the EgretImporter. Both are still incomplete, but an Audacity importer was definitely planned.


I'm very happy for you to have a go.

I would suggest a three phase approach:

  1. [ ] Implement pure models (ideally C# 9 records) for the .aup file format and an associated serializer.
  2. [ ] Implement an Audacity importer first
    • LabelTracks should produce TimeExpectations
    • It should error if spectral overrides are set
  3. [ ] Implement the formatter last

I think that with this approach, being able to read and write the data easily will make everything else easier. Reading as expectations is the least novel idea and should fit in well with other importers - I made egret to import things - so any impedance mismatch between the Audacity format and the egret structures can be worked out (e.g. how does sel[Low|High] affect the import?).

With those two in place it should be much simpler to do the writer, which then gives a complete story: read, test, write.
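The first two phases can be sketched roughly as follows. This is Python with frozen dataclasses standing in for C# 9 records, purely to illustrate the shape. The class and function names are hypothetical, TimeExpectation here is a stand-in and not egret's actual type, and treating any selLow or selHigh value as a "spectral override" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Label:
    title: str
    start: float                    # seconds (the t attribute)
    end: float                      # seconds (the t1 attribute)
    low_hz: Optional[float] = None  # the selLow attribute, if present
    high_hz: Optional[float] = None  # the selHigh attribute, if present


@dataclass(frozen=True)
class LabelTrack:
    name: str
    labels: Tuple[Label, ...]


@dataclass(frozen=True)
class TimeExpectation:
    """Hypothetical stand-in for egret's time-only expectation."""
    label: str
    start: float
    end: float


def to_time_expectations(track):
    """Convert a label track to time expectations, rejecting spectral bounds."""
    expectations = []
    for label in track.labels:
        if label.low_hz is not None or label.high_hz is not None:
            raise ValueError(
                f"label {label.title!r} in track {track.name!r} has spectral bounds"
            )
        expectations.append(TimeExpectation(label.title, label.start, label.end))
    return expectations


track = LabelTrack("Track 1", (Label("test 2", 8.0228571429, 18.98),))
print(to_time_expectations(track))
```

Keeping the models immutable and free of parsing logic means the same types can be shared by the importer and, later, the formatter.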

cofiem commented 3 years ago

A couple of notes:

In the project XML element, the attributes sel0 and sel1 are the current time range selection, and selLow and selHigh are the frequency bounds of the current selection.

In the label XML element, the attributes selLow and selHigh are the frequency range of the label.

The selLow and selHigh attributes are in Hz; sel0 and sel1 are times in seconds.

Also note that Audacity seems to fail to load .aup files saved with a UTF-8 BOM. Make sure the .aup files do not start with the Byte Order Mark.
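For reference, here is a minimal Python sketch of writing a labels-only project as UTF-8 bytes without a BOM (egret itself is C#, so this only illustrates the idea). The function name is hypothetical, and whether Audacity accepts a project this pared down, with none of the other project attributes from the sample, is untested.

```python
import codecs
import xml.etree.ElementTree as ET

AUP_NAMESPACE = "http://audacity.sourceforge.net/xml/"
XML_DECLARATION = b'<?xml version="1.0" standalone="no" ?>\n'


def build_aup_bytes(track_name, labels):
    """Serialize (start, end, title) tuples as one label track, with no BOM."""
    labels = list(labels)
    project = ET.Element("project", {"xmlns": AUP_NAMESPACE, "version": "1.3.0"})
    track = ET.SubElement(
        project, "labeltrack", {"name": track_name, "numlabels": str(len(labels))}
    )
    for start, end, title in labels:
        ET.SubElement(
            track,
            "label",
            {"t": f"{start:.10f}", "t1": f"{end:.10f}", "title": title},
        )
    # The "utf-8" codec never emits a BOM (unlike "utf-8-sig").
    return XML_DECLARATION + ET.tostring(project, encoding="utf-8")


data = build_aup_bytes("Track 1", [(8.0228571429, 18.98, "test 2")])
assert not data.startswith(codecs.BOM_UTF8)
print(data.decode("utf-8"))
```

Writing the result with open(path, "wb") keeps the bytes exactly as built, so no BOM can be added by a text-mode encoder.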