hytting closed this issue 9 months ago
Hi, I found the problem. When I ran pip install multimodal-transformers, version 0.11a0 was somehow installed instead of the latest one. Version 0.11a0 has a bug in load_data.py that is fixed in the newest version, which uses train_df=data_df.iloc[:len_train]. (The old version used df.loc[train_df.index].)
So I manually changed the .py file, and it works now.
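For anyone hitting the same thing, here is a minimal sketch of why label-based selection can return more rows than the train split contains. It is not the library's code. It assumes the three splits are concatenated while keeping their original index labels, and the tiny DataFrames and the column name x are hypothetical stand-ins for the real CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for three splits read from separate CSV files.
# Each one gets its own default index starting at 0.
train_df = pd.DataFrame({"x": range(8)})
val_df = pd.DataFrame({"x": range(100, 102)})
test_df = pd.DataFrame({"x": range(200, 202)})

# Assumption: the splits are concatenated without ignore_index=True,
# so labels 0 and 1 now each appear three times in data_df.
data_df = pd.concat([train_df, val_df, test_df], axis=0)
len_train = len(train_df)

# Label-based selection returns every row whose label is in train_df.index,
# including the val/test rows that share those labels.
by_label = data_df.loc[train_df.index]

# Position-based selection returns exactly the first len_train rows.
by_position = data_df.iloc[:len_train]

print(len(train_df), len(by_label), len(by_position))
```

In this sketch by_label has 12 rows while train_df and by_position have 8, which is the same kind of inflated length as in the original report.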
Hi there,
When I use load_data_from_folder from multimodal_transformers.data to load train/val/test CSV files that I split myself, the returned train_dataset/val_dataset/test_dataset have strange lengths that are unrelated to the number of rows in the CSV files. For example, train_df.shape = (105195, 25), while train_dataset.cat_feats.shape = (131495, 38).
To split the dataset, I tried train_test_split and np.split, but both gave me the same loading issue.
But if I follow the exact same code as in the notebook for splitting the dataset, load_data_from_folder works fine. However, if I then modify one column, such as mapping the numbers to words (from [0,2,0...] to [A,B,A...]), it again fails to load correctly.
Does anyone have any suggestions?