Open KevinGoodman opened 1 year ago
I'm also confused about it. And in L376:
c = torch.cat((cond_tokens, t_tokens), dim=-2)
why concatenate the t tokens and cond tokens along the time dimension?
@xuzheyuan624 It just concatenates the music condition tokens and the time tokens along the sequence dimension (dim=-2, the second-to-last axis), producing one combined conditioning sequence that carries both music and time information.
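To make the shapes concrete, here is a minimal sketch of that concatenation. The sizes (batch of 2, 150 music frames, latent width 512) are made-up values for illustration and are not taken from the repo; only the names cond_tokens, t_tokens, c and the 2 time tokens come from this thread.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
batch, music_len, latent_dim = 2, 150, 512
num_time_tokens = 2  # the thread mentions 2 time tokens

# Stand-ins for the real tensors: (batch, sequence, feature).
cond_tokens = torch.randn(batch, music_len, latent_dim)
t_tokens = torch.randn(batch, num_time_tokens, latent_dim)

# dim=-2 is the sequence axis, so the time tokens are appended
# as extra positions after the music tokens; the feature axis is unchanged.
c = torch.cat((cond_tokens, t_tokens), dim=-2)

print(c.shape)  # torch.Size([2, 152, 512])
```

So the result is just a longer sequence: the first 150 positions are the music condition and the last 2 are the time tokens. Anything that attends over c can then see both kinds of information.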
In L278-L281 of model/model.py, what is the purpose of making 2 time tokens instead of just 1?