In `load_data.py`, the first lines of the main function are:
```python
ds_features = get_ds_infos(
    dataset_name,
    trust_remote_code=True)[dataset_config].features
assert text_feature in ds_features, \
    f"'{text_feature}' not in '{dataset_name}' features {ds_features}!"
```
`get_ds_infos` is a Hugging Face (HF) function that pulls dataset info from the HF Hub, so it will not work when running offline on the supercomputer. I cannot find any documentation on this function, so these lines are commented out for now. Additionally, I don't see much purpose to these lines: you should already be familiar with your data and its features before running a script like this. That said, the check doesn't hurt.
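If the sanity check is worth keeping, one offline-friendly option is to run it against the features of a dataset that is already loaded locally, instead of asking the Hub. The sketch below assumes `features` is any mapping of column name to feature type (for example, a single split's `features` attribute, such as `ds["train"].features`); `check_text_feature` is a hypothetical helper, not part of `load_data.py` or the HF libraries.

```python
from typing import Any, Mapping


def check_text_feature(features: Mapping[str, Any], text_feature: str, dataset_name: str) -> None:
    """Fail early if `text_feature` is not a column of a locally loaded dataset.

    `features` is any mapping of column name -> feature type, e.g. the
    `features` of a split that is already on disk, so no Hub request is made.
    """
    if text_feature not in features:
        # Raise instead of assert so the check still runs under `python -O`.
        raise ValueError(
            f"'{text_feature}' not in '{dataset_name}' features {list(features)}!"
        )


# Usage, with a plain dict standing in for a split's `features`:
features = {"text": "string", "label": "int64"}
check_text_feature(features, "text", "my_corpus")  # passes silently
```

Raising `ValueError` rather than using `assert` is a design choice: assertions are stripped when Python runs with optimizations enabled, which would silently disable the check.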