Svito-zar / speech2properties2gestures

We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures.
https://svito-zar.github.io/speech2properties2gestures/

Possible bugs in text processing #6

Closed nagyrajmund closed 3 years ago

nagyrajmund commented 3 years ago

Hi, I think that the words get misaligned in the code snippet below:

https://github.com/Svito-zar/probabilistic-gesticulator/blob/1a5ac949720b42e9e8efd50b4df1b91110decfdf/my_code/data_processing/annotations/create_dataset.py#L274-L283

bisect(word_starts, time_st) returns the index at which time_st should be inserted into word_starts so that the list remains sorted. For example, if word_starts = [0, 1, 2] and time_st = 0.2, then bisect(...) returns the index 1.

Therefore curr_word_id is always the index of the first word that starts after time_st.
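
To make the off-by-one concrete, here is a minimal sketch using Python's standard bisect module. The helper name word_index_at is hypothetical and is not taken from create_dataset.py; it only illustrates that subtracting 1 from the insertion point gives the word that has already started at time_st.

```python
from bisect import bisect


def word_index_at(word_starts, time_st):
    """Return the index of the last word that starts at or before time_st.

    Hypothetical helper for illustration; returns None if time_st lies
    before the first word start.
    """
    # bisect (an alias of bisect_right) gives the insertion point, i.e. the
    # index of the first word that starts strictly after time_st.
    idx = bisect(word_starts, time_st) - 1
    return idx if idx >= 0 else None


word_starts = [0, 1, 2]
insertion_point = bisect(word_starts, 0.2)       # index of the *next* word
current_word = word_index_at(word_starts, 0.2)   # index of the word being spoken
```

With word_starts = [0, 1, 2] and time_st = 0.2, the insertion point is 1 while the word actually being spoken has index 0.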

nagyrajmund commented 3 years ago

Another thing we have to consider here is the possibility of silence between words. The current implementation reuses the last word until a new one starts, but instead we should use the "silence" word embeddings between two words.
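
One possible way to handle this, sketched under assumptions: if word end times are available alongside the start times, a frame that falls after the end of the most recent word can be mapped to silence instead of reusing that word. The names word_ends, SILENCE and word_or_silence are hypothetical and do not come from the repository.

```python
from bisect import bisect

SILENCE = None  # hypothetical marker meaning "use the silence embedding"


def word_or_silence(word_starts, word_ends, time_st):
    """Return the index of the word spoken at time_st, or SILENCE.

    Assumes word_starts is sorted and word_ends[i] is the end time of the
    word that starts at word_starts[i].
    """
    # Last word that started at or before time_st.
    idx = bisect(word_starts, time_st) - 1
    # Before the first word, or after the most recent word has ended.
    if idx < 0 or time_st >= word_ends[idx]:
        return SILENCE
    return idx


starts = [0.0, 1.0, 2.0]
ends = [0.6, 1.5, 2.4]
during_first_word = word_or_silence(starts, ends, 0.2)
gap_after_first_word = word_or_silence(starts, ends, 0.8)
```

Here the frame at 0.2 s maps to word 0, while the frame at 0.8 s falls in the gap between the first and second word and maps to silence.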

Svito-zar commented 3 years ago

Let me investigate and fix that today.

As for the silence, we already have "silence" word embeddings for it in the code just above:

curr_file_X_data = np.zeros((total_number_of_frames, 7, 769))

Svito-zar commented 3 years ago

Fixed by the commit https://github.com/Svito-zar/probabilistic-gesticulator/commit/fe60e585a1afb178dceddb0d5948a5ca80a52e37