Open zhuyingSeu opened 2 years ago
+1
+1
I think just use some upsample or downsample or any method that can transform every clip unit to 2s. easy to process with .wav but hard with .fbx.
MY PROPOSAL IS IF music clips and dance clips are both 2s, Use a window of 8s (include 4 clips, stride 2s) to capture the inputs. And the corresponding 2s clip c[1] should be the second clip of the 8s window [c[0], c[1], c[2], c[3]]. So the inputs capture not only the features of the current state c[1], but also the features of the previous c[0] and subsequent ones c[2] c[3] At the begining and ending, Just padding 0 and believe the MODEL can understand it😇. only have 3 situations.
Thanks for your sharing.I have some questions.