DRAGNLabs / 301r_retnet


get_ds_infos() cannot work offline when loading data #20

Closed JayOrten closed 5 months ago

JayOrten commented 5 months ago

In load_data.py, the main function begins with:

ds_features = get_ds_infos(
        dataset_name,
        trust_remote_code=True)[dataset_config].features
assert text_feature in ds_features, \
        f"'{text_feature}' not in '{dataset_name}' features {ds_features}!"

get_ds_infos is a HF function that pulls dataset info from the HF hub, so it will not work when running offline on the supercomputer. I cannot find any documentation on this function, so these lines are commented out for now. Additionally, I don't really see much purpose to these lines; you should be fairly familiar with your data and its features before running a script like this. Of course, the check doesn't hurt when a connection is available.

nprisbrey commented 5 months ago

Considering this issue resolved, seeing that this code was deleted in PR #21.