Needed, e.g., for language data with multiple leading dimensions (batch_size, n_tokens, ...). We assume that the last dimension of the model outputs has size n_outputs (consistent with ASDL); all other dimensions can therefore be flattened into the batch dimension.
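The flattening described here can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the helper name `flatten_leading_dims` is hypothetical, and a NumPy array stands in for the model outputs (the same `reshape(-1, n_outputs)` call would apply to a PyTorch tensor).

```python
import numpy as np


def flatten_leading_dims(outputs, n_outputs):
    """Collapse all dimensions except the last into a single batch dimension.

    Hypothetical helper for illustration; assumes the last dimension of
    `outputs` has size `n_outputs`.
    """
    if outputs.shape[-1] != n_outputs:
        raise ValueError(
            f"Expected last dimension of size {n_outputs}, got {outputs.shape[-1]}."
        )
    # (batch_size, n_tokens, ..., n_outputs) -> (batch_size * n_tokens * ..., n_outputs)
    return outputs.reshape(-1, n_outputs)


# Example: language-style outputs with shape (batch_size, n_tokens, n_outputs).
batch_size, n_tokens, n_outputs = 2, 3, 4
logits = np.arange(batch_size * n_tokens * n_outputs, dtype=np.float32).reshape(
    batch_size, n_tokens, n_outputs
)

flat = flatten_leading_dims(logits, n_outputs)
print(flat.shape)  # (6, 4)
```

Each row of `flat` corresponds to one (sample, token) pair, so downstream code that expects a two-dimensional (batch, n_outputs) array can treat every token as its own batch element.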