Only throw a warning if the IGNORE field is mismatched, instead of raising an error.
Description
Allows loading transformers whose checkpoints use a different causal masking value (-1e5 rather than -inf). This is useful for loading models saved before the convention change, provided they don't break with -1e5 (the mod add models, for example). It should not be used for Pythia.
Prints a warning whenever this occurs.
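For reviewers, here is a minimal sketch of the warn-instead-of-raise behaviour described above. It is not the code in this PR: `check_ignore_buffers` and `IGNORE_KEY_SUFFIX` are hypothetical names, and the state dict is modelled as a plain mapping from key to float so the example runs with the standard library only.

```python
import math
import warnings

# Hypothetical names for illustration only; not taken from the library.
IGNORE_KEY_SUFFIX = ".IGNORE"


def check_ignore_buffers(state_dict, expected=-math.inf):
    """Warn (rather than raise) for each IGNORE entry that differs from `expected`.

    `state_dict` is modelled as a mapping from key to float.
    Returns the list of mismatched keys so callers can inspect them.
    """
    mismatched = []
    for key, value in state_dict.items():
        if not key.endswith(IGNORE_KEY_SUFFIX):
            continue  # only the masking value is checked here
        if value != expected:
            mismatched.append(key)
            warnings.warn(
                f"{key} is {value}, expected {expected}; loading anyway.",
                UserWarning,
            )
    return mismatched
```

Calling it on a checkpoint whose `IGNORE` entries hold -1e5 would emit one warning per mismatched key and still return normally, so loading can proceed.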
Motivation and Context
I sometimes need to load old mod add models and want to be able to reproduce earlier results. This change keeps backwards compatibility with those checkpoints.
How Has This Been Tested?
Old models now load successfully. I also checked that the mod add models do not have large attention scores, so the -1e5 mask is safe for them.
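To illustrate why the size of the attention scores matters, here is a toy comparison of the two masking values. The numbers are made up and `masked_softmax` is a hypothetical helper written for this example, not a library function.

```python
import math


def masked_softmax(scores, mask, ignore):
    """Softmax over `scores`, with positions where `mask` is False set to `ignore`."""
    filled = [s if keep else ignore for s, keep in zip(scores, mask)]
    top = max(filled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in filled]
    total = sum(exps)
    return [e / total for e in exps]


mask = [True, True, False]  # the last position is masked out

# Moderate scores: -1e5 and -inf give identical attention patterns.
small = [1.0, 2.0, 3.0]
same = masked_softmax(small, mask, -1e5) == masked_softmax(small, mask, -math.inf)

# Scores far below -1e5: the finite mask value is now the largest entry,
# so attention leaks onto the masked position.
large = [-2e5, -2e5, 0.0]
leaky = masked_softmax(large, mask, -1e5)
strict = masked_softmax(large, mask, -math.inf)
```

In this toy case `same` is `True`, while `leaky` puts all weight on the masked position and `strict` puts none there. That is the failure mode the attention-score check is meant to rule out.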
Does this PR introduce a breaking change?
No