HazyResearch / evaporate

This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes"

Missing data set /fda-ai-pmas/510k/ or data gen is missing something? #25

Closed brando90 closed 8 months ago

brando90 commented 1 year ago

I downloaded the data as instructed, but the directory /fda-ai-pmas/510k/ is missing:

(maf) brando9@ampere1:~/data/evaporate$ python ~/evaporate/evaporate/run_profiler.py \
>     --data_lake fda_510ks \
>     --do_end_to_end False \
>     --num_attr_to_cascade 50 \
>     --num_top_k_scripts 10 \
>     --train_size 10 \
>     --combiner_mode ws \
>     --use_dynamic_backoff True \
>     --KEYS ${keys}

Data lake
Traceback (most recent call last):
  File "/lfs/ampere1/0/brando9/evaporate/evaporate/run_profiler.py", line 476, in <module>
    main()
  File "/lfs/ampere1/0/brando9/evaporate/evaporate/run_profiler.py", line 472, in main
    run_experiment(profiler_args)
  File "/lfs/ampere1/0/brando9/evaporate/evaporate/run_profiler.py", line 235, in run_experiment
    _, _, _, _, args = get_structure(data_lake)
  File "/afs/cs.stanford.edu/u/brando9/evaporate/evaporate/utils.py", line 105, in get_structure
    files = get_all_files(args.data_dir)
  File "/afs/cs.stanford.edu/u/brando9/evaporate/evaporate/utils.py", line 49, in get_all_files
    for file in os.listdir(data_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/lfs/ampere1/0/brando9/data/evaporate/data/fda-ai-pmas/510k/'

But that directory is nowhere to be found:

(maf) brando9@ampere1:~/data/evaporate$ tree .
.
├── assets
│   └── banner.png
├── data
│   ├── enron
│   │   └── table.json
│   ├── fda_510ks
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── generative_indexes
│   │   └── fda_510ks
│   ├── swde_movie_allmovie
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_amctv
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_hollywood
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_iheartmovies
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_imdb
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_metacritic
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_rottentomatoes
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_movie_yahoo
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_university_collegeprowler
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_university_ecampustours
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_university_embark
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_university_matchcollege
│   │   ├── docs.tar.gz
│   │   └── table.json
│   ├── swde_university_usnews
│   │   ├── docs.tar.gz
│   │   └── table.json
│   └── wiki_nba_players
│       ├── docs.tar.gz
│       └── table.json
└── README.md

20 directories, 33 files

Is there a missing instruction or file, or is something else wrong?

hasalams commented 11 months ago

I'm facing a similar error. How can I resolve it?

FileNotFoundError: [Errno 2] No such file or directory: 'evaporate/data/fda-ai-pmas/510k/'

simran-arora commented 10 months ago

Please adjust the defaults in this config file depending on your directory names: https://github.com/HazyResearch/evaporate/blob/83204a54dd97fb0f51a01643b4fc16c97fc5e472/evaporate/configs.py#L60
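
To see which directory names are actually available before editing the config, a small check like the one below may help. It is only a sketch and not part of the repository: `list_data_lakes` is a hypothetical helper, and the root path is an assumption based on the directory tree posted earlier in this thread.

```python
import os


def list_data_lakes(data_root):
    """Return the sorted names of the subdirectories found under data_root."""
    return sorted(
        name
        for name in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, name))
    )


if __name__ == "__main__":
    # Assumed location of the downloaded data; change it to match your machine.
    data_root = os.path.expanduser("~/data/evaporate/data")
    if os.path.isdir(data_root):
        for name in list_data_lakes(data_root):
            print(name)
    else:
        print(f"{data_root} does not exist; check where the data was downloaded.")
```

Whatever names this prints are the ones the config defaults need to match.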

xiaohaoxing commented 9 months ago

The Hugging Face dataset contains a directory named fda_510ks, so I changed the data_dir config to os.path.join(BASE_DATA_DIR, "fda_510ks") and it works.
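
The change described above might look roughly like the sketch below. `BASE_DATA_DIR` and `data_dir` are the names used in this comment; the value assigned to `BASE_DATA_DIR` and the `resolve_data_dir` helper are assumptions for illustration, and the actual layout of `configs.py` may differ.

```python
import os

# Assumed download location; set this to wherever the Hugging Face data lives.
BASE_DATA_DIR = os.path.expanduser("~/data/evaporate/data")


def resolve_data_dir(base_dir, lake_name):
    """Join base_dir and lake_name, failing early if the directory is missing."""
    data_dir = os.path.join(base_dir, lake_name)
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"Data directory not found: {data_dir}")
    return data_dir


# In the config, data_dir would then point at the downloaded directory, e.g.:
# data_dir = resolve_data_dir(BASE_DATA_DIR, "fda_510ks")
```

Failing early with the full path makes a mismatch between the config and the downloaded directory names easier to spot than the traceback from `os.listdir`.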

simran-arora commented 8 months ago

Great!