ai4r / Gesture-Generation-from-Trimodal-Context

Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)

How to deal with Gentle's misalignment #26

Closed teshima058 closed 2 years ago

teshima058 commented 2 years ago

Hi, I'm trying to generate gestures using a human voice recording and its transcript as input. Most of the time Gentle works without error, but occasionally it fails to align some words.

Here is a sample: https://drive.google.com/drive/folders/1ylzKcL9ei7nHYYhlu7xeGUd_ehNays_g?usp=sharing

I checked words_with_timestamps for this sample:

['they', 0.1, 0.17]
['were', 0.17, 0.31000000000000005]
['they', 0.55, 0.67]
['actually', 0.67, 0.98]
['worked', 1.0, 1.37]
['really', 1.37, 1.56]
['hard', 1.56, 1.76]
['and', 1.82, 1.87]
['like', 1.98, 2.2199999999999998]
['were', 2.57, 2.8]
['writing', 2.8, 3.17]
['people', 3.17, 3.46]
['on', 3.46, 3.61]
['Twitter', 3.62, 3.99]
['trying', 3.99, 4.19]
['to', 4.19, 4.25]
['go', 4.25, 4.34]
['like', 4.34, 4.5]
['how', 4.5, 4.66]
['do', 4.66, None]
['I', None, 4.8]

If a timestamp is None, I get the following error at line 183 of data_preprocessor.py:

  File "/Gesture-Generation-from-Trimodal-Context/scripts/data_loader/data_preprocessor.py", line 183, in get_words_in_time_range
    if word_e <= start_time:
TypeError: '<=' not supported between instances of 'NoneType' and 'float'

How should I deal with such Gentle errors?

youngwoo-yoon commented 2 years ago

Hello, unfortunately there is no solution for this. I would work around the issue by filling in roughly estimated timestamps (in your example, perhaps 4.7 and 4.75, since both fall between 4.66 and 4.8). I suspect such cases are rare, so I don't think it will hurt much.
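The rough-estimate workaround above could be automated along these lines. This is only a sketch, not code from the repository: fill_missing_timestamps is a hypothetical helper, and it assumes words_with_timestamps is a list of [word, start, end] entries like the sample in this issue, with None wherever Gentle failed to align. Missing times are linearly interpolated between the nearest known times on either side.

```python
def fill_missing_timestamps(words):
    """Return a copy of `words` with None start/end times replaced by estimates.

    `words` is assumed to be a list of [text, start, end] entries.
    """
    # Flatten into one timeline: start0, end0, start1, end1, ...
    times = []
    for _, start, end in words:
        times.extend([start, end])

    known = [i for i, t in enumerate(times) if t is not None]
    if not known:
        raise ValueError("no known timestamps to interpolate from")

    filled = list(times)
    for i, t in enumerate(times):
        if t is not None:
            continue
        # Nearest known timestamps before and after the gap, if any.
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            filled[i] = times[nxt]    # gap at the very beginning
        elif nxt is None:
            filled[i] = times[prev]   # gap at the very end
        else:
            # Spread the missing values evenly between the two known times.
            frac = (i - prev) / (nxt - prev)
            filled[i] = times[prev] + frac * (times[nxt] - times[prev])

    # Rebuild the [text, start, end] entries without mutating the input.
    return [[w[0], filled[2 * j], filled[2 * j + 1]] for j, w in enumerate(words)]


if __name__ == "__main__":
    tail = [["how", 4.5, 4.66], ["do", 4.66, None], ["I", None, 4.8]]
    for entry in fill_missing_timestamps(tail):
        print(entry)
```

For the sample in this issue, the two missing values come out at roughly 4.71 and 4.75, close to the hand-picked 4.7 and 4.75. Running this on words_with_timestamps before it reaches get_words_in_time_range should avoid the None comparison, although the estimated times are only approximations of the real word boundaries.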

teshima058 commented 2 years ago

Thank you for your reply. I will try to fix Gentle's errors manually.