EpistasisLab / pmlb

PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms.
https://epistasislab.github.io/pmlb/
MIT License

AI Feynman datasets #184

Open aminravanbakhsh opened 1 week ago

aminravanbakhsh commented 1 week ago

I am trying to fetch a dataset from AI Feynman, but I receive the following error:

from pmlb import fetch_data

name = "feynman_III_12_43"
dataset = fetch_data(name)

ValueError: Dataset not found in PMLB.

gAldeia commented 1 week ago

Hi @aminravanbakhsh, which version of PMLB are you running? I managed to fetch this dataset without problems. I'm using python==3.8.19 and pmlb==1.0.2a.

Two possible solutions:

  1. Install pmlb from source. Clone this repo and run pip install . from its root. That's how I installed it here; I use a conda environment specifically for building PMLB at its latest version.
  2. Download the dataset folder from this repo (https://github.com/EpistasisLab/pmlb/tree/master/datasets/feynman_III_12_43), put it into a local folder, and use fetch_data(name, local_cache_dir='<path to the folder>'). It should work as long as the folder and the .tsv.gz file inside it have the same name. I tried creating a local copy manually and it worked:
    
    from pmlb import fetch_data

    name = "feynman_III_12_43_copy"
    dataset = fetch_data(name, local_cache_dir="./datasets/")
    dataset
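For reference, the folder layout this workaround depends on can be sketched as follows. This is only a minimal sketch: it assumes fetch_data looks for `<local_cache_dir>/<name>/<name>.tsv.gz`, which matches the note above about the folder and the file sharing a name. The helpers `cache_path`, `install_local_copy`, and `load_local` are hypothetical names for illustration and are not part of pmlb.

```python
import shutil
from pathlib import Path


def cache_path(local_cache_dir, name):
    """Assumed location of a local copy: <local_cache_dir>/<name>/<name>.tsv.gz."""
    return Path(local_cache_dir) / name / f"{name}.tsv.gz"


def install_local_copy(downloaded_file, local_cache_dir, name):
    """Copy a manually downloaded .tsv.gz file into the assumed layout."""
    target = cache_path(local_cache_dir, name)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(downloaded_file, target)
    return target


def load_local(name, local_cache_dir="./datasets/"):
    """Load a dataset from the local folder (requires pmlb to be installed)."""
    # Imported lazily so the two helpers above work even without pmlb.
    from pmlb import fetch_data

    expected = cache_path(local_cache_dir, name)
    if not expected.exists():
        raise FileNotFoundError(f"expected a local copy at {expected}")
    return fetch_data(name, local_cache_dir=local_cache_dir)
```

With the file in place, `load_local("feynman_III_12_43")` would be roughly equivalent to the fetch_data call above; the explicit existence check just makes a misnamed folder or file easier to spot.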

aminravanbakhsh commented 1 week ago

Hi @gAldeia, thank you for your reply. I am using:

pmlb==1.0.1.post3
Python 3.12.4

gAldeia commented 5 days ago

@aminravanbakhsh Did you try downloading the dataset locally and using local_cache_dir to load it? It seems that your version, 1.0.1.post3, was released on Sep 10, 2020, while the Feynman datasets were only added after July 2021, so that release does not know about them. Installing from source by cloning the repo and running pip install . should also solve your problem.

While this works as a workaround, ideally the PMLB package on PyPI should be updated to its latest version.
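To confirm whether an installed release predates the Feynman datasets, something like the following could be run. It is a rough sketch, assuming the installed pmlb exposes a `dataset_names` list as its README describes; `installed_version` and `knows_dataset` are hypothetical helper names, not pmlb functions.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pmlb"):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def knows_dataset(name):
    """Report whether the installed pmlb lists `name`; None if pmlb cannot be imported."""
    try:
        # Assumption: pmlb exports `dataset_names`, the list of bundled dataset names.
        from pmlb import dataset_names
    except ImportError:
        return None
    return name in dataset_names


if __name__ == "__main__":
    print("pmlb version:", installed_version())
    print("lists feynman_III_12_43:", knows_dataset("feynman_III_12_43"))
```

If the second line prints False, the installed release does not list the dataset, and either installing from source or using a local copy is needed.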

Right now I am trying to submit new datasets, and a GitHub Actions issue is keeping me from doing so. If the local cache works, I think we can close this issue and open a new one to update the PyPI package to its latest version.

aminravanbakhsh commented 4 days ago

Hi Guilherme, thank you for your email. I fixed the problem by downloading the data to my local computer. I think we can close the issue, as you suggested. Please let me know if anything else is needed.

Sincerely, Amin