thomasnguyen92 opened this issue 9 months ago
Hi @thomasnguyen92
Thank you for your interest in our work!
The test data is not public yet. We'll update the repo when we make it public (soon). Until then, if you'd like the test set evaluated, you can follow the instructions here.
Can you point out the specific step/script that gave you trouble because of the symbolic links?
I can provide the steps, as I faced the same problem.
After running `python slue_toolkit/prepare/prepare_voxpopuli_nel.py create_manifest` as described, running `bash baselines/ner/e2e_scripts/ft-w2v2-base.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-base` fails with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 198, in run_and_report
    return func()
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 347, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 107, in run
    return run_job(
  File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 129, in run_job
    ret.return_value = task_function(task_cfg)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq_cli/hydra_train.py", line 27, in hydra_main
    _hydra_main(cfg)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq_cli/hydra_train.py", line 56, in _hydra_main
    distributed_utils.call_main(cfg, pre_main, **kwargs)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/distributed/utils.py", line 404, in call_main
    main(cfg, **kwargs)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq_cli/train.py", line 134, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=1)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/tasks/audio_finetuning.py", line 140, in load_dataset
    super().load_dataset(split, task_cfg, **kwargs)
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/tasks/audio_pretraining.py", line 153, in load_dataset
    self.datasets[split] = FileAudioDataset(
  File "/content/slue-toolkit/baselines/ner/e2e_scripts/fairseq/fairseq/data/audio/raw_audio_dataset.py", line 269, in __init__
    with open(manifest_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/content/manifest/slue-voxpopuli/e2e_ner/dev.tsv'
This happens even though the file exists (as a symbolic link).
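One way to check whether the manifest entries are dangling symbolic links (links whose target does not resolve) is a small Python sketch like the one below. The directory `manifest/slue-voxpopuli/e2e_ner` comes from the command above; the helper name `find_dangling_links` is hypothetical and not part of slue-toolkit. It prints absolute paths so they can be compared with the path shown in the traceback.

```python
import os


def find_dangling_links(manifest_dir):
    """Return (link_path, raw_target) pairs for symlinks in manifest_dir whose target is missing.

    Hypothetical diagnostic helper; it is not part of slue-toolkit.
    """
    dangling = []
    for name in sorted(os.listdir(manifest_dir)):
        path = os.path.join(manifest_dir, name)
        # os.path.exists() follows the link, so it is False when the target cannot be found.
        if os.path.islink(path) and not os.path.exists(path):
            # os.readlink() returns the target exactly as stored (possibly a relative path).
            dangling.append((path, os.readlink(path)))
    return dangling


if __name__ == "__main__":
    # Assumed location, relative to the slue-toolkit checkout, taken from the command above.
    manifest_dir = "manifest/slue-voxpopuli/e2e_ner"
    if not os.path.isdir(manifest_dir):
        print(f"{os.path.abspath(manifest_dir)} is not a directory")
    else:
        for link, target in find_dangling_links(manifest_dir):
            print(f"{os.path.abspath(link)} -> {target} (target not found)")
```

Run it from the same working directory you launch `ft-w2v2-base.sh` from, since relative paths are resolved against the current directory.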
I attempted to replicate a named entity recognition (NER) experiment but encountered several issues.

First, when I ran `python slue_toolkit/prepare/prepare_voxpopuli_nel.py create_manifest` to generate the manifest files, I noticed that `dev.tsv`, `fine-tune.tsv`, and `test.tsv` were merely symbolic links, which could not be used to run the end-to-end NER model. To work around this, I manually copied `dev.tsv` and `fine-tune.tsv` from `slue-toolkit/manifest/slue-voxpopuli` into the `e2e_ner` directory.

Additionally, I ran into a problem when evaluating with `bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base test combined nolm`: the processed test files appear to be missing. Could you provide guidance on how to properly prepare these files for evaluation?
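The manual copy workaround described in this issue can be scripted roughly as follows. This is a sketch under assumptions: the paths mirror the ones in the issue (relative to the slue-toolkit checkout), the helper `replace_links_with_copies` is hypothetical and not part of slue-toolkit, and it assumes the `e2e_ner` manifests should simply be byte-for-byte copies of the files in `manifest/slue-voxpopuli`, as in the manual workaround. `test.tsv` is left out because, per the maintainer's reply, the test data is not public yet.

```python
import os
import shutil


def replace_links_with_copies(src_dir, dst_dir, names):
    """Copy each file in `names` from src_dir into dst_dir as a regular file.

    Hypothetical helper, not part of slue-toolkit. An existing entry in dst_dir with the
    same name (for example a symbolic link) is removed first, so copying never writes
    through a link that points back at the source. Returns the names whose source file
    was not found; those entries are left untouched.
    """
    if os.path.samefile(src_dir, dst_dir):
        raise ValueError("src_dir and dst_dir must be different directories")
    missing = []
    for name in names:
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            missing.append(name)
            continue
        if os.path.lexists(dst):
            os.remove(dst)  # removes the link itself, not the file it points to
        shutil.copyfile(src, dst)
    return missing


if __name__ == "__main__":
    # Assumed paths, relative to the slue-toolkit checkout, as described in the issue.
    src_dir = "manifest/slue-voxpopuli"
    dst_dir = os.path.join(src_dir, "e2e_ner")
    if os.path.isdir(dst_dir):
        missing = replace_links_with_copies(src_dir, dst_dir, ["dev.tsv", "fine-tune.tsv"])
        for name in missing:
            print(f"source not found, skipped: {os.path.join(src_dir, name)}")
```

Removing the destination before copying matters: `shutil.copyfile` raises `SameFileError` if the destination is a link that resolves to the source file.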