I think it is wrong code, please confirm this

seohyeonShin commented 6 months ago

First, I ran dataset.py to prepare the dataset. Then I found that line 377 in dataset.py seems wrong: your code has no utils/utils module, so I think the import should be from utils.tools import to_device. Is this correct?

Also, I can't find the file opened here: open("./config/LJSpeech/preprocess.yaml", "r"). Where is it, and what is it?

GalaxyCong commented 6 months ago

Hello, dear author,

In train.py, the Dataset class is imported with "from dataset import Dataset". That import is correct, so you do not need to run the dataset.py file on its own anymore.

Thus, running dataset.py by itself is not a required step of data preparation, just as in FS2 (FastSpeech2).

You can ignore open("./config/LJSpeech/preprocess.yaml", "r") because we did not use the LJSpeech dataset.

That line comes from the original FastSpeech2 code. Source: https://github.com/ming024/FastSpeech2/blob/master/dataset.py
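Here is why importing Dataset from train.py never reaches the broken lines. This assumes, as in the FastSpeech2 dataset.py linked above, that the utils.utils import and the LJSpeech config path sit under an if __name__ == "__main__": guard. The sketch writes a hypothetical stand-in for dataset.py and loads it the way an import from train.py would. The guarded block does not execute, so neither the missing module nor the missing YAML file is touched.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for dataset.py (not the repository's real file):
# the problematic lines live only under the __main__ guard.
MODULE_SOURCE = '''
class Dataset:
    pass

if __name__ == "__main__":
    from utils.utils import to_device  # would fail: no utils/utils module
    open("./config/LJSpeech/preprocess.yaml", "r")  # LJSpeech config, unused
'''


def import_like_train_py():
    """Load the stand-in module under the name "dataset", as an import would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, "dataset.py")
        path.write_text(MODULE_SOURCE)
        spec = importlib.util.spec_from_file_location("dataset", path)
        module = importlib.util.module_from_spec(spec)
        # __name__ is "dataset", not "__main__", so the guarded block is skipped.
        spec.loader.exec_module(module)
        return module.Dataset


if __name__ == "__main__":
    print(import_like_train_py().__name__)  # prints "Dataset"
```

Running the stand-in directly (python dataset.py) would enter the guarded block and fail on the utils.utils import. That matches the error reported at the top of this issue.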

seohyeonShin commented 5 months ago

Thank you for your kind response. Unfortunately, because of my limited understanding, I am still struggling to prepare everything needed to run your code. If you don't mind, may I ask some additional questions? I tried the Lip2Wav setup you mentioned once. Also, Baidu is not accessible where I am, so I tried your Google Drive link instead, but I cannot open it because I do not have the required permissions. For now, I have downloaded the Chem data. My questions are below.

1. Is there a separate database for the corpus referenced in preprocess.yaml? How do I get the corpus? Do I just run MFA?

2. Is raw_path where the original mp4 data (the Chem YouTube videos) should go? Also, for preprocessed_path: "data/conggaoxiang/V2C/V2C_Code/example_Chem16_framelevel/chem", do I paste the folders created by running 1_get_your_frames.py and the following scripts in HPDubbing-how-to-get-face-and-lip into the preprocessed_path folder?

Overall, it is difficult to set up the required folder structure. I'm sorry to ask such basic questions, since this is my first time researching the speech field, but your answer would be a great help.

GalaxyCong commented 4 months ago

Hello, sorry for the delayed response. I'm glad to answer your questions:

Q1 (audio part): First, download the dataset. Second, run prepare_align.py. Third, use MFA to generate the *.TextGrid files, or directly download the ones we processed. Fourth, run preprocess.py; the preprocessed audio is then saved under the preprocessed_path you set.
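Before the fourth step, it can help to confirm that the third step produced a TextGrid for every utterance written by the second step. The following is a minimal sketch under an assumed layout. It uses the FastSpeech2 convention of raw_path/<speaker>/<utt>.wav, with TextGrids under preprocessed_path/TextGrid/<speaker>/<utt>.TextGrid. Whether this repository uses exactly the same layout is an assumption, and the function name missing_textgrids is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_textgrids(raw_path: str, textgrid_dir: str) -> List[str]:
    """Return "speaker/utt" for every .wav in raw_path without a TextGrid.

    Assumed (hypothetical) layout:
      raw_path/<speaker>/<utt>.wav
      textgrid_dir/<speaker>/<utt>.TextGrid
    """
    raw = Path(raw_path)
    tg = Path(textgrid_dir)
    missing = []
    for wav in sorted(raw.glob("*/*.wav")):
        speaker = wav.parent.name
        # An utterance is aligned if a TextGrid with the same stem exists.
        if not (tg / speaker / f"{wav.stem}.TextGrid").is_file():
            missing.append(f"{speaker}/{wav.stem}")
    return missing
```

If the list is empty, every utterance has an alignment and preprocess.py can be run. Otherwise, rerun MFA or check the downloaded TextGrid files for the listed utterances.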

Q2.1: raw_path holds the output of prepare_align.py applied to the original data. It contains *.lab (raw text) and *.wav (normalized audio) files.
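Because each utterance in raw_path should have both a .lab and a .wav file, a quick pairing check can catch an incomplete prepare_align.py run before MFA is started. This is a minimal sketch that assumes only that the paired files share the same relative path and stem. The function name check_raw_pairs is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def check_raw_pairs(raw_path: str) -> Dict[str, List[str]]:
    """Report utterances in raw_path that lack either the .lab or the .wav file."""
    raw = Path(raw_path)
    # Strip the suffix so "spk/utt.wav" and "spk/utt.lab" compare equal.
    wavs = {p.with_suffix("") for p in raw.rglob("*.wav")}
    labs = {p.with_suffix("") for p in raw.rglob("*.lab")}
    return {
        "missing_lab": sorted(p.relative_to(raw).as_posix() for p in wavs - labs),
        "missing_wav": sorted(p.relative_to(raw).as_posix() for p in labs - wavs),
    }
```

Both lists should be empty before running MFA on raw_path.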

Q2.2: No, you do not need to paste them. preprocessed_path is only related to audio processing, whereas "HPDubbing-how-to-get-face-and-lip" is related to video preprocessing. In "HPDubbing-how-to-get-face-and-lip", we provide some examples and code showing how to extract the lip and facial regions.

This processing takes some time, so we directly provide the features and release the extracted mouth and facial regions (.jpg) for both datasets, Chem and V2C. Thanks again for your reply; I will respond as soon as possible when I have time.

GalaxyCong / HPMDubbing, issue #8