What/where is turkcorpus_{phase}_legacy?

varun-tandon commented 3 years ago

Hi @louismartin, thanks so much for providing such a usable resource and responding to issues. I really appreciate it!

So sorry if this is a simple question: what/where is the turkcorpus_{phase}_legacy, which is referenced in access/evaluation/general.py?

For some context, I'm trying to modify scripts/evaluate.py to evaluate on some of my own data. To do so, I modified general.py in access/evaluation, specifically adding a directory parameter to get_prediction_on_turkcorpus and evaluate_simplifier_on_turkcorpus.

# My modifications
def get_prediction_on_directory(directory, simplifier, phase):
    source_filepath = get_data_filepath(directory, phase, 'complex')
    pred_filepath = get_temp_filepath()
    simplifier(source_filepath, pred_filepath)
    return pred_filepath

def evaluate_simplifier_on_directory(directory, simplifier, phase):
    pred_filepath = get_prediction_on_directory(directory, simplifier, phase)
    pred_filepath = lowercase_file(pred_filepath)
    pred_filepath = to_lrb_rrb_file(pred_filepath)
    return evaluate_system_output(get_data_filepath(directory, phase, 'simple'),
                                  sys_sents_path=pred_filepath,
                                  metrics=['bleu', 'sari_legacy', 'fkgl'],
                                  quality_estimation=True)

I don't quite understand the first parameter to evaluate_system_output, which you have set to f'turkcorpus_{phase}_legacy'. I attempted to replace this with my own .simple file, but when I try this I get the following error:

Traceback (most recent call last):
  File "scripts/evaluate.py", line 28, in <module>
    print(evaluate_simplifier_on_directory('simplification', simplifier, 'test'))
  File "/home/.../general.py", line 47, in evaluate_simplifier_on_directory
    quality_estimation=True)
  File "/home/.../cli.py", line 124, in evaluate_system_output
    orig_sents, refs_sents = get_orig_and_refs_sents(test_set, orig_sents_path, refs_sents_paths)
  File "/home/.../cli.py", line 38, in get_orig_and_refs_sents
    orig_sents = get_orig_sents(test_set)
  File "/home/.../resources.py", line 91, in get_orig_sents
    return read_lines(TEST_SETS_PATHS[(test_set, 'orig')])
KeyError: (PosixPath('/home/.../resources/datasets/simplification/simplification.test.simple'), 'orig')

I've also tried looking for a directory with the turkcorpus_test_legacy name, but to no avail.

Thanks so much for the help, and please let me know if there is any additional information I can provide to clarify my problem.

louismartin commented 3 years ago

Hi @varun-tandon

Thanks for the question! So basically, we use EASSE for evaluation, see this line: https://github.com/facebookresearch/access/blob/7b61fbf0bad665798d662e0a90d2a0e451367df6/access/evaluation/general.py#L8

According to the EASSE README, you can evaluate on a custom test set by specifying it on the command line with --test_set custom --orig_sents_path {path_to_source} --refs_sents_paths {paths_to_references} --sys_sents_path {path_to_prediction}.

You can also do it programmatically with the method evaluate_system_output which takes arguments with the same name.
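As a rough sketch of that programmatic route (not tested against the repository): the helper below only assembles the keyword arguments, and a second function forwards them to evaluate_system_output. The helper names build_custom_eval_kwargs and evaluate_on_custom_files are hypothetical. The {directory}/{name}.{phase}.complex and .simple file layout is an assumption inferred from the path in the traceback above, and passing the references as a single comma-separated string is an assumption that mirrors the command-line flag.

```python
from pathlib import Path


def build_custom_eval_kwargs(directory, phase, pred_filepath):
    """Assemble keyword arguments for an EASSE evaluation on a custom test set.

    Hypothetical helper. Assumes files are laid out as
    {directory}/{name}.{phase}.complex and {directory}/{name}.{phase}.simple,
    where {name} is the last component of the directory path.
    """
    directory = Path(directory)
    name = directory.name
    return {
        # 'custom' tells EASSE to read the files given below instead of
        # looking up a built-in test set such as turkcorpus_test_legacy.
        'test_set': 'custom',
        'orig_sents_path': str(directory / f'{name}.{phase}.complex'),
        # Assumption: references are given as a comma-separated string,
        # as with the --refs_sents_paths command-line flag.
        'refs_sents_paths': str(directory / f'{name}.{phase}.simple'),
        'sys_sents_path': str(pred_filepath),
        # Same metrics and options as in the snippet from the question.
        'metrics': ['bleu', 'sari_legacy', 'fkgl'],
        'quality_estimation': True,
    }


def evaluate_on_custom_files(directory, phase, pred_filepath):
    """Run EASSE on custom files (requires EASSE to be installed)."""
    # Imported lazily so the helper above can be used without EASSE.
    from easse.cli import evaluate_system_output
    kwargs = build_custom_eval_kwargs(directory, phase, pred_filepath)
    return evaluate_system_output(**kwargs)
```

With this shape, the path to the .simple file goes in refs_sents_paths rather than in the first positional argument, which is what triggered the KeyError in the question.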

Tell me if that works!

varun-tandon commented 3 years ago

Thanks so much @louismartin! I'll try this out and let you know how it goes 😀

varun-tandon commented 3 years ago

Hi @louismartin your suggestion worked perfectly. Thanks for the help!
