HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License

Causality Test #78

Open ClashLuke opened 1 year ago

ClashLuke commented 1 year ago

Currently, we have to manually verify that a modification doesn't accidentally leak information, which is error-prone. Especially in situations where only some tokens can see future tokens, a leak can be difficult to notice from the loss curves alone. That's why we need to introduce a test that ensures our model cannot see future tokens, as access to them would make predicting those tokens trivially easy.

ClashLuke commented 1 year ago

My current best approach would be to initialize a regular model (or, for unit tests, optionally a single layer), compute the forward pass, and backpropagate through the loss at one specific position rather than the mean. This way, we can inspect the input's gradients and check whether any future token has a gradient != 0. In a separate test, we could also check that at least one past token has a gradient != 0, to ensure the model looks at its input at all. A sketch of this is below.

The issue with this approach is that leaks can be difficult to isolate. We've already seen multiple cases where a leak occurred from a few tokens to other individual tokens, but not from all to all. That's why we would need context_size separate backward passes, which can get expensive.
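A minimal sketch of this test in JAX, with a stand-in `model` (a masked cumulative mean) in place of the real Olmax model; `CTX`, `DIM`, and the helper names are hypothetical and only illustrate the gradient check:

```python
import jax
import jax.numpy as jnp

CTX = 8  # hypothetical context size; kept small so the full loop stays cheap
DIM = 4  # hypothetical embedding dimension


def model(x: jnp.ndarray) -> jnp.ndarray:
    """Stand-in causal 'model': a masked cumulative mean over the sequence.

    Any (ctx, dim) -> (ctx, dim) function works here; a leaky variant (e.g.,
    broadcasting x.mean(0) to all positions) should make the test fail.
    """
    mask = jnp.tril(jnp.ones((CTX, CTX)))  # position i sees tokens <= i
    return (mask @ x) / mask.sum(1, keepdims=True)


def grad_at_position(x: jnp.ndarray, pos: int) -> jnp.ndarray:
    # Backpropagate through the output at one specific position rather than
    # the mean, returning gradients w.r.t. the input.
    return jax.grad(lambda inp: model(inp)[pos].sum())(x)


def test_causality():
    x = jax.random.normal(jax.random.PRNGKey(0), (CTX, DIM))
    for pos in range(CTX):  # context_size separate backward passes
        per_token = jnp.abs(grad_at_position(x, pos)).sum(-1)
        # No future token may influence position `pos`.
        assert jnp.all(per_token[pos + 1:] == 0), f"leak into position {pos}"
        # At least one past/current token must carry gradient, proving the
        # model actually reads its input.
        assert jnp.any(per_token[: pos + 1] != 0), f"position {pos} ignores input"


test_causality()
print("causality test passed")
```

If the per-position backward passes get too expensive, one option (an assumption, not something settled here) would be `jax.jacrev` or a `jax.vmap` over the position index to compute all of them in one call, trading memory for time, and then checking the resulting input-output Jacobian for lower-triangularity directly.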