Second, I intended to train the model; however, it showed: FileNotFoundError: [Errno 2] No such file or directory: 'outputs/trimodal_gen.pth.tar'
Is your data loaded correctly into the code, or are there any warnings there? Also, does your "outputs" folder exist? If it does not, the network may not be able to create the "trimodal_gen.pth.tar" file.
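(A quick sanity check along these lines, as a minimal sketch only; the exact paths and checkpoint layout in the repo may differ:)

```python
import os
import torch

# Hypothetical check: create the "outputs" folder if it is missing,
# and see whether the checkpoint file is already there.
ckpt_path = os.path.join('outputs', 'trimodal_gen.pth.tar')
os.makedirs('outputs', exist_ok=True)
if os.path.isfile(ckpt_path):
    ckpt = torch.load(ckpt_path, map_location='cpu')
    print('Checkpoint loaded; top-level keys:', list(ckpt.keys()))
else:
    print('Checkpoint not found; place the downloaded trimodal_gen.pth.tar in outputs/.')
```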
Following up on the "trimodal_gen.pth.tar" not found error, we do have the trained weights available. However, we do not own the code for generating these weights and the results should be verified with the original source.
Greetings, I still have this same problem of `self.num_total_samples = 0`, and the train, eval, and test sample counts are all zero as well. The only warning I get when trying to run the model so far is "Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a FastText object which is very similar." As far as I know, that is a deprecation warning and shouldn't cause any problems.
I have the ted_db data and the fastText 'crawl_300d_2M_subword.bin' in the right location, although I downloaded the fastText bin file and the 'NRC_VAD_Lexicon' from sources not provided in this repo.
So any help with this issue?
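(For reference, the warning above comes from the fastText Python bindings themselves. A minimal check that the embeddings load at all, with the path being an assumption about your layout, would be:)

```python
import fasttext

# Assumed path and filename; adjust to wherever your .bin file actually lives.
ft = fasttext.load_model('../data/fasttext/crawl-300d-2M-subword.bin')
print(ft.get_dimension())  # 300 for this model; the deprecation warning itself is harmless
```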
Are the folders `lmdb_train_s2ag_v2_cache_mfcc_14`, `lmdb_val_s2ag_v2_cache_mfcc_14`, and `lmdb_test_s2ag_v2_cache_mfcc_14` generated? Each of these folders should contain two files, `data.mdb` and `lock.mdb`. Each `lock.mdb` is 8 KB, and the `data.mdb` files are a few GB each (the one for 'train' is a few hundred GB). If these files are present but the sizes do not match, please delete them and rerun the code to regenerate them.
Hello, I did follow your suggestion; however, I got another error:

```
/home/zf223669/Mount/anaconda3/envs/s2ag/bin/python3.7 /home/zf223669/Mount/s2ag/s2ag/main_v2.py -c /home/zf223669/Mount/s2ag/s2ag/config/multimodal_context_v2.yml
../data/NRC-VAD-Lexicon-Aug2018Release/NRC-VAD-Lexicon.txt
../data
Reading data '../data/ted_db/lmdb_train'...
Found the cache ../data/ted_db/lmdb_train_s2ag_v2_cache_mfcc_14
Reading data '../data/ted_db/lmdb_val'...
Found the cache ../data/ted_db/lmdb_val_s2ag_v2_cache_mfcc_14
Reading data '../data/ted_db/lmdb_test'...
Found the cache ../data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14
building a language model...
loaded from ../data/ted_db/vocab_models_s2ag/vocab_cache.pkl
++++++++++++++0 +++0 +++0
Training s2ag with batch size: 512
Loading train cache took 1 seconds.
Loading eval cache took 1 seconds.
Traceback (most recent call last):
  File "/home/zf223669/Mount/s2ag/s2ag/main_v2.py", line 132, in <module>
    pr.train()
  File "/home/zf223669/Mount/s2ag/s2ag/processor_v2.py", line 979, in train
    self.trimodal_generator.load_state_dict(trimodal_checkpoint['trimodal_gen_dict'])
  File "/home/zf223669/Mount/anaconda3/envs/s2ag/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGeneratorTriModal:
	size mismatch for text_encoder.embedding.weight: copying a param with shape torch.Size([29460, 300]) from checkpoint, the shape in current model is torch.Size([26619, 300]).
```
Those cache folders are generated in the "ted_db" folder, but each is only around 16 KB in size (the whole folder, including data.mdb and lock.mdb). Granted, I only have 50 GB of free space on the hard drive; at the first run the code complained because I had only 40 GB, so I increased that to 50 GB and it ran properly after that, until I ran into this problem. So how much free space does the model need to run properly?
I was hoping to use the pre-trained model, as I don't need to retrain it and would like to run inference directly for output. However, there is no `--train` command-line argument in `main_v2.py`. There is a similar one, `--train-s2ag`, but setting that to False still requires the data to be present, and I end up with the same error as the one described before. So is it possible to use the model without retraining it, using the provided pre-trained model?
I have debugged some command-line argument issues to make sure the code does not require the full training data to be loaded if you just want to test the network. However, you still need to have the `lmdb_train`, `lmdb_val`, and `lmdb_test` folders with the `data.mdb` and `lock.mdb` files, as they contain relevant metadata. You do not need the additional cache or npz folders.
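(If it helps to confirm that those folders actually contain usable entries, here is a minimal sketch using the `lmdb` Python package, assuming the default data layout:)

```python
import lmdb

# Assumed paths; adjust to wherever your ted_db data lives.
for split in ['train', 'val', 'test']:
    env = lmdb.open(f'../data/ted_db/lmdb_{split}', readonly=True, lock=False)
    with env.begin() as txn:
        print(split, 'entries:', txn.stat()['entries'])  # should be non-zero
    env.close()
```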
I am actually unable to replicate this state_dict size-mismatch error, as my `trimodal_generator` matches all keys successfully. I have re-uploaded the file `trimodal_gen.pth.tar` from a different local copy. Maybe try downloading it again?
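(If the re-downloaded file still mismatches, a quick way to inspect what the checkpoint actually contains, assuming the path and key names from the traceback above, is:)

```python
import torch

# Path and key names taken from the error message above; adjust if yours differ.
ckpt = torch.load('outputs/trimodal_gen.pth.tar', map_location='cpu')
weight = ckpt['trimodal_gen_dict']['text_encoder.embedding.weight']
print(weight.shape)  # should match the vocabulary size the current model was built with
```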
Okay I will try the model again tomorrow, and see if I can get it to run the inference without having to retrain it. Will keep you updated with my results, and if there are still any issues.
May I suggest that you describe the installation, configuration, and running process in more detail, such as which directory the downloaded dataset should be placed in, how to configure the running parameters, etc.? Thank you!
The running parameters are straightforward and easily available from the command-line argument descriptors as well as the main paper. The readme already contains details on where each download should be placed; I will add more details based on the issues that are being raised and resolved. Meanwhile, let me know if your current issue is resolved.
Hi, I have downloaded the Trinity Gesture dataset; however, it contains many files. Which one should I load?
Hello, I re-cloned the whole project again, modified the base path, created the data folder, put the fasttext, GENEA_Challenge_2000_data_release, NRC_VAD_Lexicon_Aug2018Release, and ted_db datasets in it, and got ready to run main_v2.py.
I typed the command: python3 main_v2.py -c /home/zf223669/Mount/speech2affective_gestures/config/multimodal_context_v2.yml and ran it.
However, it showed an error about the MFCC, shown below:
```
/home/zf223669/Mount/anaconda3/envs/s2ag/bin/python3.7 /home/zf223669/Mount/speech2affective_gestures/main_v2.py -c /home/zf223669/Mount/speech2affective_gestures/config/multimodal_context_v2.yml
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_train'...
Creating the dataset cache...
Traceback (most recent call last):
  File "/home/zf223669/Mount/speech2affective_gestures/main_v2.py", line 121, in <module>
```
What's the problem? :(
I have tried many times; however, it keeps showing some weird problems. May I suggest that you clone your project in another place and try to reproduce and fix the bug?
I now get stuck at `lang_model = pickle.load(f)`, shown below:

```
/home/zf223669/Mount/anaconda3/envs/s2ag/bin/python3.7 /home/zf223669/Mount/speech2affective_gestures/main_v2.py -c /home/zf223669/Mount/speech2affective_gestures/config/multimodal_context_v2.yml
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_train'...
Found the cache /home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_train_s2ag_v2_cache_mfcc_14
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_val'...
Found the cache /home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_val_s2ag_v2_cache_mfcc_14
Reading data '/home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_test'...
Found the cache /home/zf223669/Mount/speech2affective_gestures/data/ted_db/lmdb_test_s2ag_v2_cache_mfcc_14
building a language model...
loaded from /home/zf223669/Mount/speech2affective_gestures/data/ted_db/vocab_models_s2ag/vocab_cache.pkl
Traceback (most recent call last):
  File "/home/zf223669/Mount/speech2affective_gestures/main_v2.py", line 121, in <module>
```
Let me look into these errors for you. These errors seem a bit unusual and I don't recall coming across them myself. Might be a case of version mismatches or some missing files, but I will try to replicate your errors and update the repo and readme accordingly. It might take some time though.
Thank you!! :)
Hi, it turns out that some of the packages have had breaking changes since we released our code. As a result, preprocessing the data and later running training/inference require different versions of basic packages such as numpy. Supporting this would require strict versioning and modularization of our codebase that are beyond our scope at the moment. To circumvent the issue, I am uploading the preprocessed dataset in a single folder that you can download and use directly for training/inference. Please keep the entire contents of the download in the folder `data/ted_db` and point the variable `data_path` in `main_v2.py` to the `data` folder. I have also revised the code to make data loading faster. Let me know if you still face issues.
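(For clarity, the change amounts to something like the following hypothetical snippet; the exact place where `data_path` is set inside main_v2.py may differ:)

```python
# In main_v2.py: point data_path at the folder that contains ted_db/
# with the preprocessed dataset downloaded above.
data_path = '../data'
```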
Also, I am closing this issue as it seems more general than the original title suggests. Please post any follow-up in issue #7.
Hi, in processor_v2.py I found that self.num_total_samples = 0. I tried to debug it, and it shows that the variable n_samples in self.data_loader['train_data_s2ag'], ['eval_data_s2ag'], and ['test_data_s2ag'] is zero. What's the problem? Thanks!