greenw0lf closed this 4 months ago
Ok, changing the title then
This doesn't seem to be the case. I tested it locally and I got the following output:
{
  "activity_name": "whisper_asr",
  "activity_description": "Transcribe an audio file using Whisper",
  "processing_time_ms": 34902.486083984375,
  "start_time_unix": 1721134698120.582,
  "parameters": {
    "WORD_TIMESTAMPS": true,
    "DEVICE": "cpu",
    "VAD": true,
    "MODEL": "tiny",
    "BEAM_SIZE": 5,
    "BEST_OF": 5,
    "TEMPERATURE": "(0.0,0.2,0.4,0.6,0.8,1.0)"
  },
  "software_version": "1.0.1",
  "input_data": "/data/input-files/testsource__testcarrier/inputfile.wav",
  "output_data": "/data/output-files/testsource__testcarrier/transcript/testsource__testcarrier.json",
  "steps": []
}
As can be seen, processing takes 34902 ms, or ~35 s, which checks out. I am not sure what happened on the cluster for the reported processing time to be so short.
Nevermind, found the issues!

- In io_util.py, for obtain_input_file(), the processing time was not multiplied by 1000 to report it in ms instead of s.
- start_time_unix is reported in ms instead of s.
- The DANE Python package hasn't been updated to the latest version, which fixes this processing time issue of reporting it in s instead of ms as expected.
I think actually the other way around; we should report everything in ms (at least, that's the standard we set in https://github.com/CLARIAH/DANE/blob/main/dane/provenance.py)
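To make the convention concrete, here is a minimal sketch of recording provenance timestamps consistently in milliseconds. The helper names (record_start_ms, elapsed_ms) are hypothetical and only illustrate the fix; Python's time.time() returns seconds, so both the start timestamp and the elapsed duration need the *1000 conversion that was missing.

```python
import time

def record_start_ms() -> float:
    # Hypothetical helper: Unix start time in milliseconds,
    # following the everything-in-ms standard from dane/provenance.py.
    # time.time() returns seconds since the epoch, hence * 1000.
    return time.time() * 1000

def elapsed_ms(start_time_unix_ms: float) -> float:
    # Elapsed processing time in milliseconds. Omitting the * 1000
    # conversion here is exactly the kind of bug described above.
    return time.time() * 1000 - start_time_unix_ms

if __name__ == "__main__":
    start = record_start_ms()
    time.sleep(0.05)  # simulate ~50 ms of processing work
    print(f"processing_time_ms: {elapsed_ms(start):.3f}")
```

Keeping both fields in the same unit means consumers of the provenance JSON never have to guess whether a value is seconds or milliseconds.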