fase-conf / fase-conf.github.io

Submission 67 #2

Open · Edkamb opened this issue 2 years ago

Edkamb commented 2 years ago

This issue will be used for communication concerning the testing phase of submission 67.

ghost commented 2 years ago

The artifact should be evaluated for both the availability and the evaluation badge. However, I am not sure whether it is even possible to review it for the functional badge.

Requires external accounts

The artifact is built to evaluate three commercial ASR tools, each of which can only be accessed by setting up an account. To evaluate the artifact, I would therefore have to set up a Google, a Microsoft, and an IBM account. I do not think this is acceptable for any artifact evaluation.

Requires additional cost

While a total estimated cost of $33.23 could be acceptable for a general replication, I feel that this is out of scope for this evaluation. My estimate:

Dataset size:

- Accent: 25-35 s per clip, 28 clips: <= 980 s
- RAVDESS: 3 s per clip, 32 clips: 96 s
- Midlands: 3-5 s per clip, 4 clips: <= 20 s
- Nigerian English: 4-6 s per clip, 4 clips: <= 24 s

=> Total: 1120 s, around 18.7 min (roughly 19 min)

Transformations: 5 * 7 = 35, so 35 * 18.7 min => 653 min, around 11 h of audio to transcribe

Cost on GCP: $0.024 per min => $15.70. Cost on Azure: $1 per hour => $11. Cost on IBM: $0.01 per min => $6.53. (The sketch below reproduces this estimate.)
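
To make the estimate reproducible, here is a quick back-of-the-envelope sketch, using the upper bounds of the per-clip ranges and the quoted per-service prices (nothing below comes from the artifact itself):

```python
# Sketch reproducing the cost estimate above; durations use the upper bounds
# of the per-clip ranges, prices are the quoted per-service rates.
dataset_seconds = 35 * 28 + 3 * 32 + 5 * 4 + 6 * 4   # Accent, RAVDESS, Midlands, Nigerian = 1120 s
dataset_minutes = dataset_seconds / 60                # ~18.7 min
transformed_minutes = dataset_minutes * 5 * 7         # 35 transformations -> ~653 min (~11 h)

gcp   = 0.024 * transformed_minutes                   # $0.024/min -> ~$15.68
azure = 1.00  * transformed_minutes / 60              # $1/hour    -> ~$10.89
ibm   = 0.01  * transformed_minutes                   # $0.01/min  -> ~$6.53

print(f"total: ${gcp + azure + ibm:.2f}")             # ~$33.10; the quoted $33.23
                                                      # sums the rounded per-service figures
```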

Solution option

Without setting up accounts and incurring this cost, it is not possible to evaluate the scripts GCP_Recog.py, MS_Recog.py, and IBM_Recog.py, which seem to be central to this approach. One option would be to evaluate based on transcriptions provided by the authors, e.g. by recomputing the accuracy metrics offline (see the sketch below). However, I am not sure whether this is enough to award the evaluation badge. Maybe @Edkamb can resolve this.
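
For illustration, a minimal sketch of such an offline check, assuming the authors ship reference texts and ASR transcriptions as plain-text files; the file names and the choice of word error rate (WER) as the metric are my assumptions, not taken from the artifact:

```python
# Hypothetical offline evaluation: compare a provided ASR transcription against
# the reference text by word error rate (WER). File names are illustrative.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    with open("reference.txt") as ref, open("gcp_transcription.txt") as hyp:
        print(f"GCP WER: {wer(ref.read(), hyp.read()):.3f}")
```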

ghost commented 2 years ago

General Review for the Test Phase

Although a newer version exists on Zenodo, I tested version 1.0, since that is the version referenced in the abstract.

Installation / Artifact problems

Additional dependencies: The artifact requires installing additional dependencies from remote sources (via pip). This is not allowed: "The artifact evaluation committee will be instructed not to download software or data from external sources." (https://fase-conf.github.io) Please package your artifact together with a virtual environment.

Python version: Please state in the README which Python version is supported.

Executability: While the scripts seem executable at first sight, I cannot verify them without the results of GCP_Recog.py, MS_Recog.py, or IBM_Recog.py.

academic-starter commented 2 years ago

The artifact is relevant to Paper #67 of FASE 2022, and the authors applied for both the evaluation and availability badges.

1. Availability

The artifact has a DOI link, which points to a license file, a README file, all experiment datasets, and the source code. The README file provides a brief introduction to the repository structure and the evaluation steps. I appreciate that the authors provide many .ipynb files for the analysis, which also show partial intermediate results (for Table 4 and Figure 5) that seem consistent with the paper.

2. Unsuccessful Evaluation

However, the README file does not provide an easy way to set up the data-generation phase, because it requires additional authentication keys for the Google Cloud, Microsoft Azure, and IBM services. The README file also does not include a sufficient description of the analysis step. Therefore, I could not evaluate the functionality or reusability of this artifact. I would recommend only the AVAILABILITY badge for this artifact unless the authors provide a better way to carry out the evaluation process.

3. Solution Suggestion

I would appreciate it if the authors could provide an easier way to set up their tools (such as a Docker image containing limited-scope authentication keys and all dependencies) and a more detailed description of the evaluation instructions in the README. I would also like to see an explicit connection between the evaluation artifact and the experimental results in the paper.

academic-starter commented 2 years ago

FASE 2022 also provides a virtual machine. The authors could package their tools in the virtual machine, including all necessary dependencies.

sparkssss commented 2 years ago

A new version (1.2) has been uploaded to Zenodo under the same DOI link. It contains explicit references to the API keys for the three commercial ASRs, and the Python files should be able to use the keys by default.
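
For evaluators unfamiliar with this kind of setup, a minimal sketch of the usual pattern for picking up such keys by default; the environment-variable names and fallback paths below are illustrative assumptions, not necessarily what the artifact's scripts do:

```python
# Illustrative sketch only: a common pattern for letting scripts pick up
# service credentials by default. Variable names and paths are assumptions,
# not the exact mechanism used in the artifact.
import os

# Google client libraries read this environment variable by convention.
gcp_key_file = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "keys/gcp.json")
azure_key    = os.environ.get("AZURE_SPEECH_KEY", "")   # assumed variable name
ibm_api_key  = os.environ.get("IBM_STT_APIKEY", "")     # assumed variable name

if not (os.path.exists(gcp_key_file) and azure_key and ibm_api_key):
    raise SystemExit("Missing credentials: see the README for key setup.")
```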

SHA256 checksum: f56d993a7fde254d1c128d5d5a7384506375fc02525209b45a429080f376e317

We have also included all required Python dependencies in a new folder titled "Packages". Please note that we pin numpy==1.21.5 due to an incompatibility with another package.

In addition, we have added folders to make it easier to group all the generated files, and we have placed some intermediate results in these folders to show what the expected output should look like. This should allow evaluators to run compASR.py, compTableGen.py, and wordDropTableGen.py on the included results without running the individual recognition functions first. Note: only a very small subset of intermediate results is included.

The files in the Analysis folder can still make use of the included intermediate results. Note: these contain all the intermediate files generated in the course of our experiments.

Final note: We recommend running only a subset of the transformations, since each of the individual recognition functions takes a significant amount of time to run.

sparkssss commented 2 years ago

Please refer to the latest version on Zenodo (zenodo.5897347). Additional instructions on the usage of the Python packages are included in the README file.

SHA256 checksum: ff75aa5844ee36718683908cf597f64ae378574df162813683cdf9889ee9bd0b