beeldengeluid / dane-asr-worker

DANE worker for processing ASR (optimised for Dutch)
GNU General Public License v3.0

DANE ASR and download workers add timing + limited prov info to DANE Results #70

Open jblom opened 2 years ago

jblom commented 2 years ago

To prepare for a full provenance chain, it makes sense to start adding easy-to-obtain provenance and timing information to the DANE Results of the ASR worker and the download worker.

Besides serving the desired provenance model (e.g. for informing researchers), storing this information is also very useful for more precise debugging of the DANE ASR workflow.

jblom commented 2 years ago

Update

The ASR worker now stores the following information in each DANE Result:

    asr_processing_time: float  # retrieved via submit_asr_job()
    download_time: float  # retrieved via dane-beng-download-worker or download_content()
    kaldi_nl_version: str = "Kaldi-NL v0.4.1"  # default for now
    kaldi_nl_git_url: str = (
        "https://github.com/opensource-spraakherkenning-nl/Kaldi_NL"  # default for now
    )

The code has not been tested in a real workflow yet, but has been merged already.
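
For illustration, here is a minimal sketch of how the fields listed above could be collected and turned into a plain dict for a result payload. It is not the merged worker code: the class name `AsrProvenance`, the helper `timed()`, and the two stand-in functions are hypothetical, and measuring the times in seconds is an assumption.

```python
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Tuple


@dataclass
class AsrProvenance:  # hypothetical name, not taken from the worker code
    asr_processing_time: float  # assumed to be in seconds
    download_time: float  # assumed to be in seconds
    kaldi_nl_version: str = "Kaldi-NL v0.4.1"  # default for now
    kaldi_nl_git_url: str = (
        "https://github.com/opensource-spraakherkenning-nl/Kaldi_NL"
    )


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-ins for the real download and ASR steps (hypothetical).
def fake_download(url: str) -> str:
    return "/tmp/" + url.rsplit("/", 1)[-1]


def fake_asr(path: str) -> str:
    return "transcript of " + path


path, download_time = timed(fake_download, "https://example.com/audio.mp3")
transcript, asr_time = timed(fake_asr, path)

provenance = AsrProvenance(
    asr_processing_time=asr_time,
    download_time=download_time,
)
payload = asdict(provenance)  # plain dict, ready to serialise as JSON
```

Keeping the version and repository URL as defaults on the dataclass means every result carries them even when the caller only supplies the two timings.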