CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning
Apache License 2.0

This enforces uniqueness on --eval_metric. #190

Closed kylebgorman closed 1 month ago

kylebgorman commented 1 month ago

It also enables the use of SER for checkpointing and patience, because why not.
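A minimal sketch of what using SER for checkpointing and patience might look like with Lightning callbacks; the monitored name `val_ser` and the patience value here are assumptions, not yoyodyne's actual internals:

```python
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# SER (symbol error rate) is an error rate, so lower is better:
# monitor it with mode="min".
checkpoint = ModelCheckpoint(monitor="val_ser", mode="min")
# Early stopping ("patience") can watch the same metric the same way.
early_stopping = EarlyStopping(monitor="val_ser", mode="min", patience=5)
```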

kylebgorman commented 1 month ago

> This looks good.
>
> I am approving since this doesn't necessarily need to be addressed in this PR, but I had meant to have the behavior that if no --eval_metric is requested, we default to accuracy, but if an --eval_metric is requested, it overrides the default.

Yeah, I may take a stab at that...TBD.
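For what it's worth, a hedged sketch of that defaulting behavior with argparse; the `append` action and the names here are assumptions:

```python
import argparse

parser = argparse.ArgumentParser()
# Repeatable flag; None distinguishes "not requested" from an empty list.
parser.add_argument("--eval_metric", action="append", default=None)
args = parser.parse_args()
# Default to accuracy only when no metric is requested; an explicit
# request overrides the default entirely.
eval_metrics = set(args.eval_metric) if args.eval_metric else {"accuracy"}
```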

kylebgorman commented 1 month ago

While I have you here: we force-log accuracy to W&B. Any reason to change that behavior or make it sensitive to --eval_metric?

Adamits commented 1 month ago

> While I have you here: we force-log accuracy to W&B. Any reason to change that behavior or make it sensitive to --eval_metric?

Oh! Yes, I think we should probably set it to log the --eval_metrics. IIRC, anything we log to Lightning also gets logged to wandb when the callback is on, but this tells it to specifically track the max accuracy. I think it makes sense to do that with any requested eval_metric.
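The "track the max accuracy" bit corresponds to W&B's metric summaries; a sketch, assuming an active run and a hypothetical project and metric name:

```python
import wandb

run = wandb.init(project="yoyodyne-demo")  # hypothetical project name
# summary="max" makes the W&B dashboard report the best accuracy seen
# so far, rather than just the most recently logged value.
wandb.define_metric("val_accuracy", summary="max")
```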

kylebgorman commented 1 month ago

Okay, slightly new design: --eval_metrics now specifies EXTRA evaluation metrics to compute, beyond the ones that already have to be computed for checkpointing or patience. Because of that, it now defaults to the empty set, though under ordinary operation accuracy will be used for checkpointing anyway.

Without this, it is easy to request an evaluation metric (for checkpointing or patience) that wasn't otherwise going to be computed, which crashes.
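A sketch of the union this design implies; the attribute names on `args` are hypothetical:

```python
from types import SimpleNamespace

# Hypothetical parsed arguments; patience_metric may be None if disabled.
args = SimpleNamespace(
    eval_metric=None, checkpoint_metric="accuracy", patience_metric="ser"
)

# Extra metrics requested by the user; defaults to the empty set.
extra_metrics = set(args.eval_metric or [])
# Metrics that must be computed regardless, for checkpointing and patience.
required_metrics = {args.checkpoint_metric}
if args.patience_metric is not None:
    required_metrics.add(args.patience_metric)
# A checkpoint/patience metric is always computed, so it can't crash.
metrics_to_compute = extra_metrics | required_metrics  # {"accuracy", "ser"}
```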

I will return to the question of logging metrics on W&B after I get your input on this. For now I have removed it, but that's not final.

kylebgorman commented 1 month ago

Okay, I now log the checkpoint metric on W&B too if W&B is enabled. Here are two runs differing only in whether I use SER or accuracy for checkpointing...

[image: W&B dashboards for the two runs]

kylebgorman commented 1 month ago

> Ok so now --eval_metrics are optional additional metrics, but we always default to computing/logging the checkpoint and patience metrics? This makes the most sense to me.

Slight clarification: this logs the appropriate extremum (min or max, as appropriate) of the checkpoint metric. If patience is enabled, the patience metric is computed and logged at each validation step, though its extremum isn't logged unless it also happens to be the same metric as the checkpoint metric. Is that okay?
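In code terms, something like the following; the metric names and the set of lower-is-better metrics are assumptions:

```python
import wandb

LOWER_IS_BETTER = {"ser", "loss"}  # assumption: error rates and loss

def summary_for(metric: str) -> str:
    """Chooses which extremum W&B should track for a metric."""
    return "min" if metric in LOWER_IS_BETTER else "max"

run = wandb.init(project="yoyodyne-demo")  # hypothetical project name
checkpoint_metric = "ser"
# Only the checkpoint metric gets an extremum summary; the patience
# metric is merely logged at each validation step.
wandb.define_metric(
    f"val_{checkpoint_metric}", summary=summary_for(checkpoint_metric)
)
```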

Adamits commented 1 month ago

> Ok so now --eval_metrics are optional additional metrics, but we always default to computing/logging the checkpoint and patience metrics? This makes the most sense to me.
>
> Slight clarification: this logs the appropriate extremum (min or max, as appropriate) of the checkpoint metric. If patience is enabled, the patience metric is computed and logged at each validation step, though its extremum isn't logged unless it also happens to be the same metric as the checkpoint metric. Is that okay?

Ah, right. Yeah, that makes sense to me. Thanks for the clarification!