-
Hello, if it is convenient, could you please provide the dataset files used in the paper? Thank you very much.
-
Could you provide detailed preprocessing scripts for the Hollywood2 and ImageNet datasets? Thanks a lot.
-
### Introduction
The efficacy of a transformer model is significantly influenced by the quality of its training data. However, the original training dataset utilized by https://github.com/NetEase/P…
-
Hi @s-zanella @nilslukas , thanks for the amazing work and for releasing the code!
I found that the repo provides the configs for ECHR but not for Enron or Yelp-health. I want to run some experimen…
-
Add functions to the data preprocessing (which creates the TFRecord files streamed during training) to read and process the ERA5 input data and the CERRA target data.
As both datasets …
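To make the request concrete, here is a minimal sketch of the pairing step such a preprocessing function might perform, before any TFRecord serialization. Everything here is hypothetical: the function name `make_training_pairs`, the array layouts, and the assumption that ERA5 inputs and CERRA targets are matched by timestamp are illustrative, not the repo's actual API.

```python
import numpy as np

def make_training_pairs(era5, cerra, era5_times, cerra_times):
    """Pair each ERA5 input field with the CERRA target field at the
    same timestamp; timestamps present in only one dataset are skipped.

    era5  : (T_in, H_in, W_in)   coarse-resolution input fields
    cerra : (T_tgt, H_tgt, W_tgt) high-resolution target fields
    """
    cerra_index = {t: i for i, t in enumerate(cerra_times)}
    pairs = []
    for i, t in enumerate(era5_times):
        j = cerra_index.get(t)
        if j is None:
            continue  # no CERRA field for this timestamp
        pairs.append({"input": era5[i], "target": cerra[j], "time": t})
    return pairs
```

Each dict in `pairs` would then be serialized into one TFRecord example by the existing pipeline.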
-
Hello,
During data preprocessing in the pose_data.py file, there are separate methods for annotating the camera train and real train datasets. In the camera train dataset, you apply Umeyama alignment …
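For readers unfamiliar with the step being asked about: Umeyama alignment (Umeyama, 1991) is the closed-form least-squares similarity transform between two corresponding point sets. A self-contained numpy sketch is below; this is the generic algorithm, not the specific code in pose_data.py.

```python
import numpy as np

def umeyama(src, dst, with_scale=True):
    """Least-squares similarity transform mapping src onto dst:
    dst ≈ s * R @ src_i + t, for corresponding (N, 3) point sets."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)            # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = (S * [1.0, 1.0, d]).sum() / var_src if with_scale else 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

The reflection guard (`d`) keeps `R` a proper rotation even when the SVD would otherwise return a mirrored solution.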
-
**Is your feature request related to a problem? Please describe.**
I find myself relying more and more on datasets just to do all the preprocessing. One thing, however: for removing duplicated rows, I…
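In the meantime, deduplication can be done outside the library. A minimal sketch, assuming rows are represented as dicts with hashable values; the function name `drop_duplicate_rows` and the `columns` parameter are invented for illustration, not a `datasets` API.

```python
def drop_duplicate_rows(rows, columns=None):
    """Remove exact-duplicate rows, keeping the first occurrence.

    rows    : list of dicts, one dict per row
    columns : optional subset of keys to compare on (all keys by default)
    """
    seen = set()
    kept = []
    for row in rows:
        keys = columns if columns is not None else sorted(row)
        fingerprint = tuple((k, row[k]) for k in keys)  # hashable row key
        if fingerprint not in seen:
            seen.add(fingerprint)
            kept.append(row)
    return kept
```

The same fingerprint-and-filter idea can be applied to a streamed dataset one batch at a time, at the cost of keeping the `seen` set in memory.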
-
Hello!
Thank you for your work on MLLM.
I had a fine-tuning bug that I couldn't fix: when I ran the `stage2_sft.sh` script and trained with speech_conv_datasets only, the logger showed that the trai…
-
Since my training environment cannot connect to the internet, I downloaded the model and dataset and saved them to the local disk.
The arguments:
**model path**: ModelArguments(base_model_revision=N…
-
I would greatly appreciate it if you could elaborate on how to process the dataset.
In the Datasets section, it says that all datasets are processed as a sliding window view, and the format is comp…
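For what it's worth, "sliding window view" usually means overlapping fixed-length windows cut from each series. A hedged sketch of that step, assuming a `(T, C)` time-major array; the helper name `to_windows` and the stride handling are my own illustration, not necessarily the paper's exact format.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def to_windows(series, window, stride=1):
    """Cut a (T, C) series into overlapping samples of shape (window, C)."""
    views = sliding_window_view(series, window_shape=window, axis=0)
    # sliding_window_view appends the window axis last -> (N, C, window);
    # move it next to the batch axis so each sample is (window, C)
    return views[::stride].transpose(0, 2, 1)
```

With `stride=1` this yields `T - window + 1` samples; larger strides subsample the windows.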