Metadata finalisation with the NDRI/IMOS team

MartinCupak commented 1 month ago

Check with Marty what metadata he wants in the output files, then modify the code accordingly.

MartinCupak commented 1 month ago

Current metadata in output file header:

example: Metadata extracted from file 502DB01D.flac as JSON: {'numChannels': '1', 'sampleRate': '6000.0', 'durationHeader': '300.0', 'durationFile': '307.0373333333333', 'startTime': '2012-08-17 02:45:01.337646', 'endTime': '2012-08-17 02:50:08.616638', 'scaleFactor': '10000000.0'}

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MetadataEssential:
    numChannels: int = 1
    sampleRate: int = -1
    # audio record duration as read from the DAT file header
    durationHeader: float = 0
    # actual duration of the audio record as stored in the file,
    # typically a little longer than the value in the DAT file header
    durationFile: float = 0
    startTime: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    endTime: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    scaleFactor: int = -1
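
For illustration only, here is a minimal sketch of how a dataclass like the one above could be flattened into a string-valued dictionary similar to the example header. The helper name `metadata_to_strings` is hypothetical and is not taken from IMOSPATools; the real code may format values (timestamps in particular) differently.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class MetadataEssential:
    numChannels: int = 1
    sampleRate: int = -1
    durationHeader: float = 0
    durationFile: float = 0
    startTime: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    endTime: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    scaleFactor: int = -1


def metadata_to_strings(meta) -> dict:
    """Hypothetical helper: convert every dataclass field value to a string."""
    return {name: str(value) for name, value in asdict(meta).items()}


meta = MetadataEssential(sampleRate=6000, durationHeader=300.0)
header = metadata_to_strings(meta)

# json.dumps emits double-quoted JSON, unlike printing the dict directly.
header_json = json.dumps(header)
```

Storing every value as a string keeps the header trivially serialisable, at the cost of the reader having to convert numbers and timestamps back.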

MartinCupak commented 1 month ago

Hi @mhidas, could you please review the metadata to be written to the output wav/flac files? Currently, the fields above are included.

There is a class MetadataFull defined in IMOSPATools/audiofile.py, which has everything that made sense to me, but I'm not sure what you want to have included.

mhidas commented 1 month ago

Thanks @MartinCupak. I think that's a pretty good start, though if it's straightforward, you might as well include whatever is available in the MetadataFull class?

Other things that could be good to include:

Name of the raw data file (I think we'll want to rename the calibrated output files to something more meaningful);
Name of the calibration data file used;
Creation timestamp of the output file.

It would also be good to add a few details about the overall deployment the recording came from, such as site name & code, deployment name/code, latitude, longitude, water depth, hydrophone depth & serial no., etc. These are generally not included in the raw .DAT file, so they would have to be supplied by higher-level code running the processing for a whole deployment. Could the writeMono16bit function (or wherever appropriate) accept, as an optional argument, a dictionary of additional metadata fields to include?
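
A rough sketch of what that optional argument could look like, assuming the header is built as a string-valued dictionary. The function `build_header_metadata`, the trimmed-down dataclass, and all keys and values in `deployment` are hypothetical placeholders; the actual signature of writeMono16bit is not shown in this thread, so this only illustrates the merging step.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MetadataEssential:
    # Trimmed to two fields to keep the sketch short.
    numChannels: int = 1
    sampleRate: int = -1


def build_header_metadata(meta, extra: Optional[dict] = None) -> dict:
    """Hypothetical helper: merge per-file metadata with caller-supplied fields."""
    header = {name: str(value) for name, value in asdict(meta).items()}
    for key, value in (extra or {}).items():
        # Refuse to silently overwrite a field read from the recording itself.
        if key in header:
            raise ValueError(f"extra metadata key {key!r} clashes with a built-in field")
        header[key] = str(value)
    return header


# Placeholder deployment details, supplied by the higher-level processing code.
deployment = {
    "siteCode": "SITE-CODE",
    "deploymentCode": "DEPLOYMENT-CODE",
    "rawFileName": "RAW-FILE-NAME",
}
header = build_header_metadata(MetadataEssential(sampleRate=6000), deployment)
```

Rejecting clashing keys is one possible design choice; letting the caller's values override the built-in ones would be equally easy if that turns out to be more useful.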

mhidas commented 1 month ago

These are just my thoughts for now. I'd like to get opinions from some people who are likely to actually use these data. Whenever that happens, we'll revisit this. Until then, I wouldn't put too much effort into updating the metadata in the output files.

MartinCupak commented 4 weeks ago

I have actually extended the metadata somewhat in the final release:

@dataclass
class MetadataFull:
    # Where do we get this setID from? A database of records?
    # It is in some of the DAT file headers, but not in all of them
    setID: int = -1
    # schedule number seems to be included in many DAT file headers
    schedule: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    numChannels: int = 1
    sampleRate: int = 0
    # audio record duration as read from the DAT file header
    durationHeader: float = 0
    # actual duration of the audio record as stored in a file
    # typically a little longer than what is in the DAT file header
    durationFile: float = 0
    startTime: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    endTime: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # calibration noise level - as provided for calibration
    # -90 seems to be the most common value of calibration noise level
    # 0.0 means not calibrated
    calibNoiseLevel: float = 0.0
    # hydrophone sensitivity - as provided for calibration
    # -196 seems to be the most common value of hydrophone sensitivity
    hydrophoneSensitivity: float = -196
    scaleFactor: int = -1

Example:

Metadata extracted from file Set1234_20120817_024501.wav as JSON:

{'setID': '3154', 'schedule': '2012-08-17 02:45:01.322479', 'numChannels': '1', 'sampleRate': '6000.0', 'durationHeader': '300.0', 'durationFile': '307.0373333333333', 'startTime': '2012-08-17 02:45:01.337646', 'endTime': '2012-08-17 02:50:08.374979', 'calibNoiseLevel': '-90.0', 'hydrophoneSensitivity': '-197.8', 'scaleFactor': '10000000.0'}
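
As a quick sanity check on the example above, the following sketch reads three of the printed fields back and compares `endTime - startTime` with `durationFile`. Note that the printed form uses single quotes, so it is a Python dict literal rather than strict JSON; `ast.literal_eval` is used here for that reason. The variable names are illustrative only.

```python
import ast
from datetime import datetime

# Three fields copied from the example header above.
printed = (
    "{'durationFile': '307.0373333333333', "
    "'startTime': '2012-08-17 02:45:01.337646', "
    "'endTime': '2012-08-17 02:50:08.374979'}"
)

# Single-quoted keys and values parse as a Python literal, not as JSON.
meta = ast.literal_eval(printed)

start = datetime.fromisoformat(meta["startTime"])
end = datetime.fromisoformat(meta["endTime"])
elapsed = (end - start).total_seconds()

# Difference between the timestamp span and the stored file duration.
mismatch = abs(elapsed - float(meta["durationFile"]))
print(f"elapsed = {elapsed} s, mismatch = {mismatch:.1e} s")
```

For this example the two values agree to within timestamp (microsecond) resolution.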
