andreeaiana / newsreclib

PyTorch-Lightning Library for Neural News Recommendation
https://newsreclib.readthedocs.io/en/latest/
MIT License

How to evaluate a model based on its checkpoint #7

Closed (igor17400 closed this 5 months ago)

igor17400 commented 6 months ago

Hi there,

Firstly, congratulations on the great work and the publication of this very useful library!

I'm encountering a bug while attempting to execute the eval.py script. I successfully trained the NRMS model for 20 epochs; however, my machine disconnected before the automatic testing phase could run.

I trained the model using the configuration provided in nrms_mindsmall_pretrainedemb_celoss_bertsent.yaml.

Now, I'm attempting to execute the eval.py script with the eval.yaml file defined as shown below. However, when I run the command python eval.py, I receive the following error:

An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/home/jovyan/shared/igor/pprec-imp/newsreclib/eval.py", line 93, in <module>
    main()
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/_internal/utils.py", line 302, in run_and_report
    raise ex
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/_internal/hydra.py", line 119, in run
    ret = run_job(
          ^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/hydra/core/utils.py", line 116, in run_job
    output_dir = str(OmegaConf.select(config, job_dir_key))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/omegaconf.py", line 682, in select
    format_and_raise(node=cfg, key=key, value=None, cause=e, msg=str(e))
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/_utils.py", line 899, in format_and_raise
    _raise(ex, cause)
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/_utils.py", line 797, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set env var OC_CAUSE=1 for full trace
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/omegaconf.py", line 674, in select
    return select_value(
           ^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/_impl.py", line 58, in select_value
    node = select_node(
           ^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/_impl.py", line 93, in select_node
    _root, _last_key, node = cfg._select_impl(
                             ^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/base.py", line 531, in _select_impl
    value = root._maybe_resolve_interpolation(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/base.py", line 719, in _maybe_resolve_interpolation
    return self._resolve_interpolation_from_parse_tree(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/base.py", line 584, in _resolve_interpolation_from_parse_tree
    resolved = self.resolve_parse_tree(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/base.py", line 764, in resolve_parse_tree
    return visitor.visit(parse_tree)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
           ^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar/gen/OmegaConfGrammarParser.py", line 206, in accept
    return visitor.visitConfigValue(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar_visitor.py", line 101, in visitConfigValue
    return self.visit(ctx.getChild(0))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
           ^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar/gen/OmegaConfGrammarParser.py", line 342, in accept
    return visitor.visitText(self)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar_visitor.py", line 301, in visitText
    return self._unescape(list(ctx.getChildren()))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar_visitor.py", line 389, in _unescape
    text = str(self.visitInterpolation(node))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar_visitor.py", line 125, in visitInterpolation
    return self.visit(ctx.getChild(0))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/antlr4/tree/Tree.py", line 34, in visit
    return tree.accept(self)
           ^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar/gen/OmegaConfGrammarParser.py", line 921, in accept
    return visitor.visitInterpolationNode(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/grammar_visitor.py", line 158, in visitInterpolationNode
    return self.node_interpolation_callback(inter_key, self.memo)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/base.py", line 745, in node_interpolation_callback
    return self._resolve_node_interpolation(inter_key=inter_key, memo=memo)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/shared/igor/envs/newsreclib_env/lib/python3.11/site-packages/omegaconf/base.py", line 676, in _resolve_node_interpolation
    raise InterpolationKeyError(f"Interpolation key '{inter_key}' not found")
omegaconf.errors.InterpolationKeyError: Interpolation key 'logger.wandb.name' not found
    full_key: hydra.run.dir
    object_type=dict

Do you have any insights into why this might be happening?


eval.yaml

# @package _global_

defaults:
  - _self_
  - data: null # choose datamodule with `test_dataloader()` for evaluation
  - model: null
  - logger: many_loggers.yaml
  - trainer: default.yaml
  - paths: default.yaml
  - extras: default.yaml
  - hydra: default.yaml

task_name: "eval"

tags: ["eval"]

# passing checkpoint path is necessary for evaluation
ckpt_path: logs/train/runs/nrms_mindsmall_pretrainedemb_celoss_bertsent_s42/2024-03-08_07-48-40/checkpoints/last.ckpt

That summarizes the issue I'm facing when executing the eval.py script after training the NRMS model. Any help or suggestions would be greatly appreciated!

Thank you!

andreeaiana commented 5 months ago

Hi,

Thank you for your interest in the library and for reaching out. Since you trained with a specific experiment configuration that overrides the default values of the individual modules (e.g., data, model), you need to specify the same experimental setup for evaluation as well. The default eval.yaml configuration file specifies no data or model configurations, hence the error. You can easily fix this with the following command:

python eval.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent.yaml
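For context, the InterpolationKeyError is raised while Hydra composes the run output directory: hydra.run.dir is defined via interpolations such as ${logger.wandb.name}, which only exist once an experiment configuration has populated the logger settings. Below is a hypothetical sketch of the relevant part of the hydra config; the exact keys and path pattern are assumptions inferred from the error message and your checkpoint path, not the verbatim file.

# illustrative sketch of a hydra run-dir definition (hypothetical)
run:
  # resolving this interpolation fails when `logger.wandb.name` was never
  # set, e.g., when eval.py runs without an experiment override
  dir: ${paths.log_dir}/${task_name}/runs/${logger.wandb.name}/${now:%Y-%m-%d_%H-%M-%S}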

You also have two options to specify the checkpoint:

  1. By modifying ckpt_path in the eval.yaml file; however, this means changing the checkpoint path for each new run of the eval.py script.
  2. Encouraged: by appending the checkpoint path to the command, as follows:

python eval.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent.yaml ckpt_path=YOUR_CKPT_PATH

I hope this solves your problem, and please let me know if you have further questions.

igor17400 commented 5 months ago

Thank you very much for your response @andreeaiana!

After running the command you provided, I faced some issues with the eval.yaml file. To address this, I've submitted a pull request (https://github.com/andreeaiana/newsreclib/pull/8) that includes a bug fix and additional documentation for the MINDlarge dataset, along with a sample configuration for it (nrms_mindlarge_pretrainedemb_celoss_bertsent.yaml).

In addition to the PR, I would like to highlight two observations from my experimentation that I believe could enhance the user experience and accuracy of the evaluation process:

  1. As shown in the attached image, there's a progress indicator (red square) during the evaluation phase. However, once this loading bar reaches 2288/2288, the console appears "frozen" for an extended period, which initially suggested a bug. It turned out to be the time required to compute all the scores (blue square). I propose introducing a progress bar for this scoring phase, i.e., a loading bar showing that auc, categ_div@k, categ_pers@k, etc. are being calculated, to make clear that the process is ongoing and not stalled. Do you think this enhancement would be worth implementing?

  2. In several of my test runs, I've noticed identical values for ndcg@5 and ndcg@10, as can be seen in the attached image. Have you experienced similar outcomes in your tests?

If you think these ideas are worth looking into, I'd be really keen to dive in and work on improving them!

[Screenshot: evaluation progress bar (red square) and score computation output (blue square)]
andreeaiana commented 5 months ago

After running the command you provided, I faced some issues with the eval.yaml file. To address this, I've submitted a pull request (#8) that includes a bug fix and additional documentation for the MINDlarge dataset, along with a sample configuration for it (nrms_mindlarge_pretrainedemb_celoss_bertsent.yaml).

Many thanks for the bug fix and extra documentation @igor17400, I accepted your PR!

In addition to the PR, I would like to highlight two observations from my experimentation that I believe could enhance the user experience and accuracy of the evaluation process:

  1. As shown in the attached image, there's a progress indicator (red square) during the evaluation phase. However, once this loading bar reaches 2288/2288, the console appears "frozen" for an extended period, which initially suggested a bug. It turned out to be the time required to compute all the scores (blue square). I propose introducing a progress bar for this scoring phase, i.e., a loading bar showing that auc, categ_div@k, categ_pers@k, etc. are being calculated, to make clear that the process is ongoing and not stalled. Do you think this enhancement would be worth implementing?

I agree with you, an additional progress bar for the scoring calculation phase would be a lot more informative.
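Something along these lines could work as a starting point (just a sketch: it assumes the evaluation metrics are stored in a torchmetrics MetricCollection and computes them one at a time so the console shows which score is in progress; the function name is illustrative):

from tqdm import tqdm

# Sketch: compute each metric of a torchmetrics MetricCollection
# individually, with a progress bar naming the metric currently being
# calculated, instead of a single opaque collection.compute() call.
def compute_with_progress(metric_collection):
    results = {}
    for name, metric in tqdm(
        list(metric_collection.items()), desc="Computing metrics", unit="metric"
    ):
        results[name] = metric.compute()
    return results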

  2. In several of my test runs, I've noticed identical values for ndcg@5 and ndcg@10, as can be seen in the attached image. Have you experienced similar outcomes in your tests?

I have never encountered this issue before; in all my experiments, ndcg@5 and ndcg@10 are always different, with the former being lower than the latter.

If you think these ideas are worth looking into, I'd be really keen to dive in and work on improving them!

Thanks for proposing these enhancements, they would definitely improve the current functionality. It would be great if you could work on them :)

igor17400 commented 5 months ago

Great @andreeaiana! I'll start working on those features now 😊

I just executed the evaluation of nrms_mindsmall_pretrainedemb_celoss_bertsent again and obtained exactly the same scores for ndcg@5 and ndcg@10, as can be seen in the attached image below. I'm trying to understand why this is happening.

[Screenshot: test metrics showing identical ndcg@5 and ndcg@10 values]
igor17400 commented 5 months ago

I believe I found the reason. The setup.py file specifies an old version of torchmetrics (0.11.4). After updating it to version 1.3.1, the metrics are now being computed correctly.
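For anyone who wants to double-check this, here is a small standalone script that verifies the behavior (the scores and relevance labels are made up; only the torchmetrics retrieval API is real):

import torch
from torchmetrics.retrieval import RetrievalNormalizedDCG

# Made-up scores and relevance labels for a single impression
# with 10 candidate news items, all belonging to one query.
preds = torch.tensor([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])
target = torch.tensor([1, 0, 0, 0, 0, 1, 1, 1, 0, 1])
indexes = torch.zeros(10, dtype=torch.long)

ndcg_at_5 = RetrievalNormalizedDCG(top_k=5)
ndcg_at_10 = RetrievalNormalizedDCG(top_k=10)

# On an updated torchmetrics, these two values differ, as expected.
print("ndcg@5 :", ndcg_at_5(preds, target, indexes=indexes))
print("ndcg@10:", ndcg_at_10(preds, target, indexes=indexes))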

andreeaiana commented 5 months ago

I believe I found the reason. The setup.py file specifies an old version of torchmetrics (0.11.4). After updating it to version 1.3.1, the metrics are now being computed correctly.

Thanks for letting me know! I updated the setup.py, requirements.txt, and environment.yaml files with the latest version of torchmetrics.