Open gowitheflow-1998 opened 3 months ago
I believe we had this trust_remote_code issue a while ago when we wanted to turn files into parquet, and retrieval datasets weren't compatible. Just confirmed with @KennethEnevoldsen this hasn't been solved.
It is solved at the moment by setting trust_remote_code=True where required, but future datasets should not use this (tests will fail). It would be great if someone would fix the older datasets as well, but it is not strictly required.
I believe we had this trust_remote_code issue a while ago when we wanted to turn files into parquet, and retrieval datasets weren't compatible. Just confirmed with @KennethEnevoldsen this hasn't been solved.

Happened to find a solution here, where they turn corpus, queries and qrels separately into parquets. Can then load_dataset(dataset_name, "qrels"), load_dataset(dataset_name, "query"), and load_dataset(dataset_name, "corpus").

I had a go implementing i2t retrieval using this format here. Works smoothly. Will follow this solution when creating more image-text retrieval ones, and maybe for the main branch we can deal with it the same way!
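
For anyone following along, here is a rough sketch of what loading the three parts could look like. The config names "corpus", "query" and "qrels" are the ones mentioned above; the helper name load_retrieval_parts and the injected loader argument are hypothetical, only there so the snippet can be tried without downloading anything. In practice the loader would presumably be datasets.load_dataset.

```python
from typing import Any, Callable, Dict

# Config names taken from the post above; each one is stored as its own parquet config.
RETRIEVAL_CONFIGS = ("corpus", "query", "qrels")


def load_retrieval_parts(
    dataset_name: str,
    loader: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Load corpus, queries and qrels as three separate configs.

    `loader` is called as loader(dataset_name, config_name). It is a
    hypothetical injection point: pass `datasets.load_dataset` for real use,
    or a stub when testing offline.
    """
    return {config: loader(dataset_name, config) for config in RETRIEVAL_CONFIGS}


# Real use (needs the `datasets` package and network access; the repo name
# below is a placeholder, not an actual dataset):
#
#   from datasets import load_dataset
#   parts = load_retrieval_parts("some-org/some-retrieval-dataset", load_dataset)
#   corpus, queries, qrels = parts["corpus"], parts["query"], parts["qrels"]
```

Since each part is plain parquet under its own config, no loading script is involved, so trust_remote_code should not be needed.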