Svito-zar / gesticulator

The official implementation for ICMI 2020 Best Paper Award "Gesticulator: A framework for semantically-aware speech-driven gesture generation"
https://svito-zar.github.io/gesticulator/
GNU General Public License v3.0

Question about validation during training #40

Closed birdflies closed 3 years ago

birdflies commented 3 years ago

Hi, I am extremely grateful for all your help. I have a question: during training, does the model use 'X_dev.npy' and 'Y_dev.npy' for validation?

    def __init__(self, root_dir, apply_PCA=False, train=True):
        """
        Args:
            root_dir (string): Directory with the dataset.
        """
        self.root_dir = root_dir
        # Get the data
        if train:
            self.audio = np.load(path.join(root_dir, 'X_train.npy')).astype(np.float32)
            self.text = np.load(path.join(root_dir, 'T_train.npy')).astype(np.float32)
            # apply PCA
            if apply_PCA:
                self.gesture = np.load(path.join(root_dir, 'PCA', 'Y_train.npy')).astype(np.float32)
            else:
                self.gesture = np.load(path.join(root_dir, 'Y_train.npy')).astype(np.float32)
        else:
            self.audio = np.load(path.join(root_dir, 'X_dev.npy')).astype(np.float32)
            self.text = np.load(path.join(root_dir, 'T_dev.npy')).astype(np.float32)
            # apply PCA
            if apply_PCA:
                self.gesture = np.load(path.join(root_dir, 'PCA', 'Y_dev.npy')).astype(np.float32)
            else:
                self.gesture = np.load(path.join(root_dir, 'Y_dev.npy')).astype(np.float32)

Or does the model use 'X_dev_NaturalTalking_001.npy' for validation?

class ValidationDataset(Dataset):
    """Validation samples from the Trinity Speech-Gesture Dataset."""

    def __init__(self, root_dir, past_context, future_context):
        """
        Args:
            root_dir (string): Directory with the dataset.
        """
        self.root_dir = root_dir
        self.past_context = past_context
        self.future_context = future_context
        # Get the data
        self.audio = np.load(path.join(root_dir, 'dev_inputs', 'X_dev_NaturalTalking_001.npy')).astype(np.float32)
        self.text = np.load(path.join(root_dir, 'dev_inputs', 'T_dev_NaturalTalking_001.npy')).astype(np.float32)

I'm a little confused~

Svito-zar commented 3 years ago

Right, so there are two different types of validation going on. X_dev and Y_dev are used to track the validation loss with respect to the training loss (to check for overfitting), while X_dev_NaturalTalking_001.npy is used to check the quality of the generated motion. This sequence is much longer than the sequences in the training and validation sets.
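To make the distinction concrete, here is a rough sketch of what each validation input looks like (not the repository's exact training code; the root_dir path is a placeholder):

import numpy as np
from os import path

root_dir = "processed_data"  # placeholder: wherever the processed dataset lives

# Short segments: used to compute the validation loss (overfitting check)
x_dev = np.load(path.join(root_dir, "X_dev.npy")).astype(np.float32)
y_dev = np.load(path.join(root_dir, "Y_dev.npy")).astype(np.float32)

# One long continuous sequence: used only to generate motion for qualitative evaluation
x_long = np.load(path.join(root_dir, "dev_inputs", "X_dev_NaturalTalking_001.npy")).astype(np.float32)

print("X_dev:", x_dev.shape)                # many short segments
print("Y_dev:", y_dev.shape)
print("long dev sequence:", x_long.shape)   # a single, much longer sequence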

birdflies commented 3 years ago

@Svito-zar Thanks for your answer. I got it.

birdflies commented 3 years ago

> Right, so there are two different types of validation going on. X_dev and Y_dev are used to track the validation loss with respect to the training loss (to check for overfitting), while X_dev_NaturalTalking_001.npy is used to check the quality of the generated motion. This sequence is much longer than the sequences in the training and validation sets.

Hi, does X_dev_NaturalTalking_001.npy affect the training results? I added a new dataset to my experiment, but the training results seem unchanged. Moreover, the generated results converge to a few kinds of postures. Could you please help me analyze the reasons? Thanks very much~ My training loss (attached image) and validation loss (attached image) are shown below.

Svito-zar commented 3 years ago

The X_dev_NaturalTalking_001.npy sequence should not affect the results much; it is only used to analyze the quality of the model. If you use another dataset, you should of course also use another validation sequence.

One reason why the model might converge "to a few kinds of postures" is overfitting to a small dataset. To check that, you should look at how the validation loss compares with the training loss.
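For example, a quick check on the logged curves could look like this (the numbers below are made up, just to illustrate the pattern to look for):

import numpy as np

# Synthetic placeholder curves standing in for the logged training/validation losses
train_loss = np.array([1.00, 0.60, 0.40, 0.30, 0.25, 0.22, 0.20, 0.19])
val_loss   = np.array([1.05, 0.70, 0.55, 0.50, 0.52, 0.58, 0.65, 0.72])

gap = val_loss - train_loss
best_epoch = int(np.argmin(val_loss))

print(f"best validation loss at epoch {best_epoch}: {val_loss[best_epoch]:.3f}")
if val_loss[-1] > val_loss[best_epoch] and gap[-1] > 2 * gap[best_epoch]:
    print("validation loss rises while training loss keeps falling -> likely overfitting")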

birdflies commented 3 years ago

OK, thanks. Maybe the reason is that I merged the Trinity dataset with my own dataset.

Svito-zar commented 3 years ago

Yes, merging different datasets might be the reason, since they probably have different voice and motion styles.

ghenter commented 3 years ago

> Yes, merging different datasets might be the reason, since they probably have different voice and motion styles.

Not to mention that they can have different skeletons (which would require retargeting using, e.g., Autodesk MotionBuilder) and different parametrisations.
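As a quick sanity check before merging, one could compare the joint names of the two skeletons, e.g. with pymo's BVHParser (the file paths below are placeholders):

from pymo.parsers import BVHParser

parser = BVHParser()
trinity_clip = parser.parse("trinity/NaturalTalking_001.bvh")    # placeholder path
custom_clip = parser.parse("my_dataset/session_001.bvh")         # placeholder path

trinity_joints = set(trinity_clip.skeleton.keys())
custom_joints = set(custom_clip.skeleton.keys())

if trinity_joints != custom_joints:
    print("Skeletons differ, so retargeting is needed before merging:")
    print("  only in Trinity:", sorted(trinity_joints - custom_joints))
    print("  only in the new data:", sorted(custom_joints - trinity_joints))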

birdflies commented 3 years ago

@Svito-zar @ghenter Thanks. For the new dataset, I extracted the joint angles as follows and merged the different datasets. Maybe the mean poses of the two datasets differ greatly?

import os

import numpy as np
import joblib as jl
from sklearn.pipeline import Pipeline

# pymo motion-processing library (the exact import path may differ depending on
# where pymo lives in your setup)
from pymo.parsers import BVHParser
from pymo.preprocessing import (DownSampler, RootTransformer, Mirror, JointSelector,
                                MocapParameterizer, ConstantsRemover, Numpyfier)


def extract_joint_angles(bvh_dir, files, dest_dir, pipeline_dir, fps):
    p = BVHParser()

    if not os.path.exists(pipeline_dir):
        raise Exception("Pipeline dir for the motion processing ", pipeline_dir, " does not exist! Change -pipe flag value.")

    # Parse all the BVH files
    data_all = list()
    for f in files:
        ff = os.path.join(bvh_dir, f + '.bvh')
        print(ff)
        data_all.append(p.parse(ff))

    # Motion-processing pipeline: downsample, make the root hip-centric, mirror,
    # keep the upper-body joints, convert to exponential map, drop constant dimensions
    data_pipe = Pipeline([
       ('dwnsampl', DownSampler(tgt_fps=fps,  keep_all=False)),
       ('root', RootTransformer('hip_centric')),
       ('mir', Mirror(axis='X', append=True)),
       ('jtsel', JointSelector(['Spine','Spine1','Spine2','Spine3','Neck','Neck1','Head','RightShoulder', 'RightArm', 'RightForeArm', 'RightHand', 'LeftShoulder', 'LeftArm', 'LeftForeArm', 'LeftHand'], include_root=True)),
       ('exp', MocapParameterizer('expmap')),
       ('cnst', ConstantsRemover()),
       ('np', Numpyfier())
    ])

    out_data = data_pipe.fit_transform(data_all)

    # the datapipe will append the mirrored files to the end
    assert len(out_data) == 2*len(files)

    # Save the fitted pipeline so the same transform can be applied (and inverted) later
    jl.dump(data_pipe, os.path.join(pipeline_dir, 'data_pipe.sav'))

    # Save the processed (and mirrored) clips per file
    fi = 0
    for f in files:
        ff = os.path.join(dest_dir, f)
        print(ff)
        np.savez(ff + ".npz", clips=out_data[fi])
        np.savez(ff + "_mirrored.npz", clips=out_data[len(files) + fi])
        fi = fi + 1

Svito-zar commented 3 years ago

> Maybe the mean poses of the two datasets differ greatly?

That's definitely possible.
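One quick way to check would be to compare the statistics of the two processed gesture arrays before merging them (a sketch; the file paths are placeholders, and it assumes both arrays use the same pose parametrisation):

import numpy as np

y_trinity = np.load("trinity_processed/Y_train.npy").astype(np.float32)   # placeholder path
y_custom = np.load("custom_processed/Y_train.npy").astype(np.float32)     # placeholder path

# Flatten to (frames, pose_dim) in case the arrays are stored as (clips, frames, dim)
y_trinity = y_trinity.reshape(-1, y_trinity.shape[-1])
y_custom = y_custom.reshape(-1, y_custom.shape[-1])

mean_diff = np.abs(y_trinity.mean(axis=0) - y_custom.mean(axis=0))
std_ratio = y_trinity.std(axis=0) / (y_custom.std(axis=0) + 1e-8)

print("largest difference in mean pose across dimensions:", mean_diff.max())
print("median per-dimension std ratio (Trinity / custom):", float(np.median(std_ratio)))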