`DiskResource` has been rewritten to support downloading from a HuggingFace datasets repository. (To keep things simple I removed the Google Cloud logic entirely, but if you think it should stay we could merge the two.)
As it stands, it's hardcoded to download from this repo, but that can be changed by overriding `DB_HF_DATA` (see `disk_resource.py`). It would be good if you could test this branch with a sanitised `design_bench_data` folder to make sure everything downloads correctly.
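For anyone reviewing, here is a rough sketch of the download-if-missing idea, not the actual code in `disk_resource.py`. It assumes `hf_hub_download` from the `huggingface_hub` package; the function name `download_resource`, its `downloader` parameter, and the placeholder repo id are all made up for illustration.

```python
from pathlib import Path
from typing import Callable, Optional

# Placeholder only: the real default lives in disk_resource.py as DB_HF_DATA.
DB_HF_DATA = "your-username/your-design-bench-data"


def download_resource(
    filename: str,
    target_dir: str,
    repo_id: Optional[str] = None,
    downloader: Optional[Callable[..., str]] = None,
) -> Path:
    """Fetch one file from a HuggingFace datasets repo unless it is already on disk."""
    target = Path(target_dir) / filename
    if target.exists():
        return target  # already downloaded, nothing to do

    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download

    path = downloader(
        repo_id=repo_id or DB_HF_DATA,  # fall back to the module-level default
        filename=filename,
        repo_type="dataset",  # the data sits in a datasets repo, not a model repo
        local_dir=target_dir,
    )
    return Path(path)
```

Passing `repo_id` explicitly (or reassigning the module-level `DB_HF_DATA`) points the download at a different repository, which is roughly what overriding the default is meant to achieve.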
Sadly, most datasets are missing their pretrained oracle weights :(. This means most tasks take forever to import, since it will try to train an oracle instead. These are the only pretrained weights I have on hand:
If you are able to fill in some of these gaps that would be good.
Other changes:
The warning `Setting 'max_len_sentences_pair' is now deprecated. This value is automatically set up. Setting 'max_len_single_sentence' is now deprecated. This value is automatically set up.` has now been suppressed, since it spams the screen when you import.
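As an illustration of one way such a message can be silenced, here is a minimal sketch that assumes the warning is emitted through Python's standard `logging` module. The branch may well do it differently; the class and function names and the logger name passed in are hypothetical.

```python
import logging

# Fragments of the two messages quoted above.
DEPRECATION_SNIPPETS = (
    "Setting 'max_len_sentences_pair' is now deprecated",
    "Setting 'max_len_single_sentence' is now deprecated",
)


class DropDeprecationNoise(logging.Filter):
    """Reject log records whose message contains one of the snippets."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        return not any(snippet in message for snippet in DEPRECATION_SNIPPETS)


def silence_tokenizer_warnings(logger_name: str) -> logging.Logger:
    """Attach the filter to the named logger and return that logger."""
    logger = logging.getLogger(logger_name)
    logger.addFilter(DropDeprecationNoise())
    return logger
```

Note that a filter attached to a logger only applies to records logged directly on that logger, not to records propagated up from its children, so it has to go on the exact logger that emits the message (or on a handler instead).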
Thanks.