Load the test dataset locally

Yanllan commented 5 months ago

Hello, thanks for your excellent work! I'd love to try your work.

However, I can't access Kaggle or Huggingface from my server because of network problems, so I can only download the datasets locally first. How do I load a dataset from a local path instead of downloading it from the network? Do I also need to process the dataset myself, for example to extract the answers in a certain format or to handle question equivalence?

If possible, could you describe a procedure that I can follow? Thank you very much for your help!

corentin-ryr commented 5 months ago

Hello,

If I understand correctly, you can manually copy files to your server but you can't download from the internet. MultiMedEval will skip downloading from the internet if the datasets are already present at the specified path. Here are suggestions for getting the files locally, for Kaggle and for Huggingface.

Manual download for Kaggle:

The Kaggle API that I use simply downloads and unzips the dataset. You can do the same manually by going to the page of the dataset and downloading the zip file. Once you have the zip file, you can copy it to your server and unzip it. You can then specify the correct path in the SetupParams.
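The manual Kaggle step above amounts to unpacking an archive into the folder that is later passed to SetupParams. Here is a minimal sketch using only the Python standard library. The function name `unpack_dataset` and the example file names are hypothetical, and the folder layout MultiMedEval expects for each dataset is not described in this thread, so it is worth comparing the result with what the automatic download produces.

```python
import zipfile
from pathlib import Path


def unpack_dataset(zip_path: str, target_dir: str) -> list[str]:
    """Extract a manually downloaded archive into target_dir.

    Returns the names of the extracted members so they can be inspected.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Hypothetical usage on the server, after copying the zip file over:
# unpack_dataset("archive.zip", "data/")
```

The returned member list is a quick way to check that the files ended up directly under the target folder rather than in an extra nested directory.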

Manual download for Huggingface:

For Huggingface, it is not as straightforward because the API does some processing on the data. I would suggest loading the datasets by running MultiMedEval locally, on a machine with internet access (with the following code for example), and then uploading the folder where the datasets are stored (in this case data) to your server. You can then specify the corresponding path on your server and HF will skip the network call.

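from multimedeval import MultiMedEval, SetupParams

# Run this on a machine with internet access.
engine = MultiMedEval()

# Download (or reuse) the MedQA dataset in the local "data/" folder.
setupParams = SetupParams(MedQA_dir="data/")
tasksReady = engine.setup(setupParams=setupParams)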

Note that this solution would also work for the Kaggle dataset.
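To move the downloaded folder to the server, one option is to pack it into a single archive, copy that file over, and unpack it there. The sketch below uses only the standard library. The helper names `pack_data_folder` and `unpack_data_folder` are hypothetical and not part of MultiMedEval, and how the archive is copied to the server is left to whatever transfer method is available.

```python
import shutil
from pathlib import Path


def pack_data_folder(data_dir: str, archive_base: str) -> str:
    """Create <archive_base>.tar.gz holding the contents of data_dir.

    Keep archive_base outside data_dir so the archive does not include itself.
    """
    return shutil.make_archive(archive_base, "gztar", root_dir=data_dir)


def unpack_data_folder(archive_path: str, data_dir: str) -> None:
    """Restore the archive contents into data_dir (run this on the server)."""
    Path(data_dir).mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive_path, data_dir)


# Hypothetical usage:
# local machine:  pack_data_folder("data", "multimedeval_data")
# server:         unpack_data_folder("multimedeval_data.tar.gz", "data")
```

After unpacking, the path given to SetupParams on the server should point at the restored folder.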

I hope this helps.

Yanllan commented 5 months ago

Thank you very much for your timely reply!

But I have one more question. Do you mean that I should set the local paths here, in the fields shown in my screenshot of SetupParams?

corentin-ryr commented 5 months ago

Yes, in SetupParams, you can specify the path where each dataset is downloaded. We chose to implement it this way because some of the datasets are very large and needed to be downloaded to a different location on our own system.

However, you can also specify the same path for all the datasets you want to use (e.g. data) and it will work fine. If you don't specify some of the paths, those datasets will be skipped during the setup.
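Because unspecified paths cause the corresponding datasets to be skipped, it can help to check on the server which folders were actually copied before building SetupParams. The following is a rough sketch under stated assumptions: `available_dataset_dirs` is a hypothetical helper, `MedQA_dir` is the only parameter name taken from this thread, and any other parameter name would need to be checked against the SetupParams definition.

```python
from pathlib import Path


def available_dataset_dirs(candidates: dict[str, str]) -> dict[str, str]:
    """Keep only the entries whose folder exists and is not empty."""
    ready = {}
    for param_name, folder in candidates.items():
        path = Path(folder)
        if path.is_dir() and any(path.iterdir()):
            ready[param_name] = folder
    return ready


# MedQA_dir is the parameter used earlier in this thread; other
# parameter names are not confirmed here and must be looked up.
candidates = {"MedQA_dir": "data/"}
kwargs = available_dataset_dirs(candidates)

# On a machine with MultiMedEval installed, the result could be passed on:
# setupParams = SetupParams(**kwargs)
```

An empty result means none of the listed folders were found, which is a sign that the copy to the server did not land where expected.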

Yanllan commented 5 months ago

Roger that! Thank you very much for your help!

Besides, I am working on multimodal medical treatment, and I hope to cooperate with you if I have the opportunity. A reliable, effective and universal evaluation framework is what I need most now!

Finally, congratulations on such an amazing job!

corentin-ryr commented 5 months ago

Thank you very much, I'm glad you appreciate it.

Don't hesitate to contact me via email :)

corentin-ryr / MultiMedEval

Load the test dataset locally #10
