brandontrabucco / design-bench

Benchmarks for Model-Based Optimization
MIT License

error while importing design-bench #11

Open erfanhamdi opened 1 year ago

erfanhamdi commented 1 year ago

Hello

I installed the design-bench package the way you mentioned that doesn't require MuJoCo, and everything goes fine until I try to import design-bench, at which point I get this error.

Exception has occurred: ValueError
Can't find a vocabulary file at path '/Users/venus/miniconda3/envs/mbo/lib/python3.11/site-packages/design_bench_data/smiles_vocab.txt'. To load the vocabulary from a Google pretrained model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`
  File "/Users/venus/miniconda3/envs/design_bench_conda/lib/python3.11/site-packages/design_bench_data/test.py", line 2, in <module>
    import design_bench
ValueError: Can't find a vocabulary file at path '/Users/venus/miniconda3/envs/mbo/lib/python3.11/site-packages/design_bench_data/smiles_vocab.txt'. To load the vocabulary from a Google pretrained model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`

Is there anything that I am missing? like downloading other files?

christopher-beckham commented 8 months ago

What version of transformers are you using? I'm guessing it's a version mismatch, since the error mentions BertTokenizer.

When I set up design-bench I try (as much as I can) to use the dependencies listed in design-baselines:

https://github.com/brandontrabucco/design-baselines/blob/master/requirements.txt

harish2sista commented 8 months ago

Hi! I'm facing the same issue. I installed exactly the package versions from requirements.txt, but the issue still persists. Does anyone have any other recommendations?

christopher-beckham commented 8 months ago

@harish2sista can you list the error that you get? Is it the same as the original poster's error message?

In my own experience setting this up, some of the packages have very specific version requirements, and installing from requirements.txt is not guaranteed to work out of the box. There were certain packages I had to install with pip install <module>==<version> --no-deps so that pip didn't pull in newer versions of other dependencies (i.e. the emphasis is on --no-deps).

You can check my own working requirements.txt here: https://gist.github.com/christopher-beckham/d1b319e84ae359e671b67e5fa7c663da

For example if you suspect the issue is with the transformers module you can try the version from my requirements.txt file:

pip install transformers==3.5.1 --no-deps

However you may also have to do the same thing for any extra dependencies of transformers.
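
One way to follow this advice systematically is to compare what is actually installed against the exact pins in a requirements file. The sketch below is mine, not part of design-bench or design-baselines; the function names and the example pin are illustrative:

```python
# Sketch: compare installed package versions against exact "name==version"
# pins from a requirements file. Names/versions here are examples only.
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' pins, skipping comments, blanks, and non-pins."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed-or-None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # not installed at all
        if have != wanted:
            mismatches[name] = (wanted, have)
    return mismatches
```

Running check_pins(parse_pins(open("requirements.txt"))) would list every package that is missing or at the wrong version, which narrows down which --no-deps installs are still needed.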

harish2sista commented 8 months ago

@christopher-beckham Thanks for the swift response. The error message I received is

File "/home/<user_name>/miniconda3/envs/design-baselines/lib/python3.7/site-packages/transformers/tokenization_bert.py", line 196, in __init__
    "model use 'tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)'".format(vocab_file)
ValueError: Can't find a vocabulary file at path '/home/<user_name>/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench_data/smiles_vocab.txt'. To load the vocabulary from a Google pretrained model use 'tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)'

And thank you for sending your requirements.txt file. Are you using Python >= 3.8? The conda env create -f design-baselines/environment.yml command installs Python 3.7, and I got a Python version incompatibility error with your file.

harish2sista commented 8 months ago

Also, after investigating, I found that the /home/<user_name>/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench_data directory is empty. I don't know if this has anything to do with the root cause.
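
A quick way to check whether a data directory is merely incomplete rather than empty is a helper like the one below. This is my own sketch, not part of design-bench, and the expected file list is illustrative:

```python
# Sketch: report which expected files are missing from a design_bench_data
# directory. The "expected" list is supplied by the caller; the entries used
# in this thread (smiles_vocab.txt, gfp/gfp-x-0.npy) are just examples.
import os


def missing_files(data_dir, expected):
    """Return the subset of `expected` relative paths absent from data_dir."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(data_dir, rel))
    ]
```

For example, missing_files(site_packages_dir + "/design_bench_data", ["smiles_vocab.txt"]) would immediately show whether the vocab file is present.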

christopher-beckham commented 8 months ago

Yes, I am using Python 3.8. It would be good if you could try with that.

I never tried Python 3.7, but my first attempt at the environment was with 3.9 and I couldn't get it to work. For 3.9, certain modules had wheels compiled for 3.8 but not 3.9, so I had to downgrade because pip complained it couldn't find anything. Again, this seems to be a consequence of Design Bench requiring some older library versions.

I am currently working on a fork which fixes some of these issues and is much more forgiving with dependencies. If you think it might help you, I could try to work on it a bit more this weekend and make it public. Thanks.

harish2sista commented 8 months ago

@christopher-beckham Thank you very much!! That would be really helpful! In the meantime, I will try the installation with Python 3.8. I'll let you know if I come across any new issues. I'm very excited about your repo! I'm looking forward to it.

harish2sista commented 8 months ago

Hi, which mujoco-py version are you using?

christopher-beckham commented 8 months ago

I'm using the latest version I got from GitHub. Here are some instructions I wrote for myself; your mileage may vary:

design-baselines appears to pin mujoco-py==2.0.2.3, but I had trouble installing that version. I could only install a slightly newer version by following one of the replies in https://github.com/openai/mujoco-py/issues/773.

This worked for me as long as I exported my LD_LIBRARY_PATH as follows (the compilation log will tell you if something is missing from this path that you need to add):

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:<path to mujoco>/mujoco210/bin:/usr/lib/nvidia

The last issue involved some missing libs that needed to be installed. While this is easy with root privileges and apt-get, you can also install them with conda by following https://github.com/openai/mujoco-py/issues/627#issuecomment-1007658905:

conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c menpo glfw3
export CPATH=$CONDA_PREFIX/include
pip install patchelf

seminumber commented 8 months ago

I'm facing the same issue. If I'm right, the file 'https://storage.googleapis.com/design-bench/smiles_vocab.txt' is missing from the server. It is referenced in morgan_fingerprint_features.py:67 and onward:

vocab_file = DiskResource(
    os.path.join(DATA_DIR, 'smiles_vocab.txt'),
    download_method="direct",
    download_target=f'{SERVER_URL}/smiles_vocab.txt')

This code caches a file from the web, but the link to the file now appears to be broken.
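
The cache-or-download behaviour can be pictured with a simplified stand-in. This is my own sketch, not the real DiskResource class; the fetch callable is injected so the failure mode (a broken server link) is easy to see:

```python
# Simplified stand-in for DiskResource's cache-or-download behaviour.
# `fetch` is any callable taking a URL and returning bytes; in design-bench
# the real download comes from SERVER_URL, which is what appears broken here.
import os


def ensure_cached(path, url, fetch):
    """Return `path`, calling fetch(url) to create it only if it is absent."""
    if not os.path.isfile(path):
        data = fetch(url)  # raises if the server is unreachable
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return path
```

Once the file exists locally (e.g. copied in by hand), the download path is never exercised, which is why dropping the data into design_bench_data works around the broken server.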

harish2sista commented 8 months ago

Hi @seminumber, thanks for bringing this up. I dug deeper following this lead, and it seems the entire design-bench server is down!

@brandontrabucco, could you please help us resolve this issue?

christopher-beckham commented 8 months ago

Hmm, this is concerning. I have smiles_vocab.txt locally; maybe I can upload it somewhere for you for now. I could also modify the code on my fork so that it pulls the file from my Google Drive.

(For context, I am working on my own fork that will not import everything at once. Dataset-specific imports should only happen when those datasets are created. Right now, in order to import design_bench you need to have satisfied every dataset's dependencies.)
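
The deferred-import idea can be sketched as a registry that only resolves a task's backing module when the task is actually constructed. This is a hypothetical illustration of mine, not code from the fork; the registry entry is a stand-in so the snippet runs:

```python
# Sketch of lazy, per-task imports: heavy dependencies are imported only
# when make_task() is called, not at package import time.
import importlib

# task name -> (module path, attribute); nothing here is imported up front.
# The entry below is a stand-in (math.sqrt) so the sketch is self-contained;
# a real registry would point at dataset/oracle classes.
_REGISTRY = {
    "toy_task": ("math", "sqrt"),
}


def make_task(name):
    """Resolve and import the backing module only on demand."""
    module_path, attr = _REGISTRY[name]
    module = importlib.import_module(module_path)
    return getattr(module, attr)
```

With this pattern, a missing TensorFlow or JAX would only fail the specific task that needs it, rather than breaking import design_bench entirely.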

harish2sista commented 8 months ago

@christopher-beckham, Thank you for the help! Could you please post the link to the smiles_vocab file after you upload it?

Looking forward to your fork; thank you!

christopher-beckham commented 7 months ago

You can get the data here: https://drive.google.com/drive/folders/1tmbFImzhkivZUjHeh434D7V7mrxTBu1H?usp=drive_link

I don't know if it has all the data, because I only imported certain things like the RL environments.

Extract the contents to <path to design bench repo>/design-bench/design_bench_data.

I could look at my fork and maybe specify an alternative which downloads it from my gdrive.

christopher-beckham commented 7 months ago

Brandon is currently on vacation but I did recently inform him about this so hopefully it will be an easy fix from his end. Unfortunately I am suffering from this issue myself since my design_bench_data directory doesn't contain everything, only the stuff for Ant/Kitty/Superconductor.

harish2sista commented 7 months ago

Hi @christopher-beckham, I almost got the package working with mujoco-py==2.1.2.14. The only dependencies I'm missing are tensorflow, torch_geometric, pytorch_lightning, and jax. Could you please tell me which versions of these packages I should install?

harish2sista commented 7 months ago

You can get the data here: https://drive.google.com/drive/folders/1tmbFImzhkivZUjHeh434D7V7mrxTBu1H?usp=drive_link

I don't know if it has all the data, because I only imported certain things like the RL environments.

Extract the contents to <path to design bench repo>/design-bench/design_bench_data.

I could look at my fork and maybe specify an alternative which downloads it from my gdrive.

Thank you for sharing the design_bench_data directory, adding this to the ~/miniconda/envs/<env_name>/lib/python<version>/site-packages/ resolved the missing vocab file error.

christopher-beckham commented 7 months ago

I don't know about TensorFlow GPU; personally I use PyTorch, so when import tensorflow complains about not finding a GPU (or about a missing lib) I just ignore it. I also had a JAX import warning and ignored it. It's not clear to me whether it's some extraneous dependency that TensorFlow needs (but which has nothing to do with Design Bench).

At this point, just make sure that you can import design_bench without it failing. If other issues persist, we can discuss them specifically. Thanks.

harish2sista commented 7 months ago

Actually, that is causing an issue. I am unable to run the [Reproducing Baseline Performance](https://github.com/brandontrabucco/design-bench?tab=readme-ov-file#reproducing-baseline-performance).

I am getting this error repeatedly:

2024-01-16 15:09:35.904815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/home/<user-name>/.mujoco/mujoco210/bin:/usr/lib/nvidia/lib64:/usr/lib/nvidia

christopher-beckham commented 7 months ago

There is not much I can do to help from memory (I don't use TF), but you can try to locate (or download) that library, add its directory to your LD_LIBRARY_PATH, and see if that helps.

christopher-beckham commented 7 months ago

For this error:

Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file

You need to have CUDA 11.0 installed and its lib directory on $LD_LIBRARY_PATH. I'm not sure which version you already have, since I see a mention of /usr/local/cuda/.
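
To check whether a given shared library is actually reachable from the directories in the LD_LIBRARY_PATH shown in the error, a small helper like the one below can be used. It is my own sketch (not part of any of these packages); the library name comes from the error above:

```python
# Sketch: scan a colon-separated search path (like LD_LIBRARY_PATH) for a
# shared library file, returning the first directory that contains it.
import os


def find_on_path(libname, search_path):
    """Return the first directory on `search_path` containing libname, or None."""
    for d in search_path.split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, libname)):
            return d
    return None
```

For example, find_on_path("libcudart.so.11.0", os.environ.get("LD_LIBRARY_PATH", "")) returning None confirms that TensorFlow has no chance of loading the CUDA runtime from those directories.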

harish2sista commented 7 months ago

Hi, @christopher-beckham. I am able to get design-bench installed with GPU support in a conda env using Python 3.7. This is my package list: https://gist.github.com/harish2sista/715ace794e2ff2f590d3cc7790b4fcf5

I have installed cudatoolkit and cudnn using the nvidia channel through conda.

I have used the design_bench_data directory you have shared before. There is a new error of missing files with this directory; could you please help me with this?

FileNotFoundError: [Errno 2] No such file or directory: '/home/harish2sista/miniconda3/envs/MBO/lib/python3.7/site-packages/design_bench_data/gfp/gfp-x-0.npy'

christopher-beckham commented 7 months ago

Hi,

As I said, the design_bench_data folder I uploaded is incomplete. I would wait until Brandon gets those links fixed.


christopher-beckham commented 7 months ago

This is a much more complete version of design_bench_data, but FYI it is still not everything:

https://drive.google.com/file/d/1RImr1Fw5ImKXX66uOGFGO2jwJQvFG90v/view?usp=sharing

It contains data for the following tasks:

ant_morphology
chembl-Potency-CHEMBL1794345
cifar_nas
dkitty_morphology
hopper_controller
nas_bench
superconductor
tf_bind_10-pho4
tf_bind_8-SIX6_REF_R1
toy_continuous
toy_discrete
utr

harish2sista commented 7 months ago

Hi @christopher-beckham thanks for the update, I am able to use design-bench on my CPU.

Btw, this directory (https://drive.google.com/file/d/1RImr1Fw5ImKXX66uOGFGO2jwJQvFG90v/view?usp=sharing) does not include the smiles_vocab.txt file; I copied it separately from https://drive.google.com/drive/folders/1tmbFImzhkivZUjHeh434D7V7mrxTBu1H?usp=drive_link

harish2sista commented 7 months ago

Also, another question I have: does design-bench only work with torch==1.7.1? Can I use any newer versions?

christopher-beckham commented 7 months ago

Can I use any newer versions?

The short answer is probably, because the PyTorch API is pretty stable at this point. But it depends on what you're doing... almost all approximating oracles in this library use Tensorflow. I see one oracle using PyTorch and I doubt that a newer version of PyTorch is going to break it.

To be clear, Design Bench itself is framework "agnostic", but that gets violated as soon as you need to use a specific oracle which was trained on a specific library.

When I finish my fork of the library (which I might just publish as a separate "development" branch on this repository), it should make this clearer. Right now, import design_bench tries to import everything at once, and it isn't clear what exactly is needed in which case.

brandontrabucco commented 7 months ago

Hi @seminumber, Thanks for bringing this up. I did more deep research following this solution. It seems the entire design-bench server is not working!

@brandontrabucco, could you please help us resolve this issue?

Thanks, everyone, for diagnosing this, locating the broken server, and sharing your own cached data while I work on fixing the repo! Special thanks to christopher-beckham for the data, and for answering so many of the questions that have arisen because of the broken server!

I'm migrating design-bench away from GCP, and I'll add a section to the README with instructions on how the data for each task in design-bench was generated. I'm also working on fixes for bugs the community has found.

brandontrabucco commented 7 months ago

This google drive folder contains additional raw data for design-bench tasks, including the smiles_vocab.txt that was missing: https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp?usp=drive_link

brandontrabucco commented 7 months ago

Hi, @christopher-beckham. I am able to get design-bench installed with GPU support on conda env using Python-3.7. This is my packages list https://gist.github.com/harish2sista/715ace794e2ff2f590d3cc7790b4fcf5

I have installed cudatoolkit and cudnn using the nvidia channel through conda.

I have used the design_bench_data directory you have shared before. There is a new error of missing files with this directory; could you please help me with this?

FileNotFoundError: [Errno 2] No such file or directory: '/home/harish2sista/miniconda3/envs/MBO/lib/python3.7/site-packages/design_bench_data/gfp/gfp-x-0.npy'

I'm working on tracking down a copy of design_bench_data/gfp and I'll update this thread when I locate it.

harish2sista commented 7 months ago

This google drive folder contains additional raw data for design-bench tasks, including the smiles_vocab.txt that was missing: https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp?usp=drive_link

Hi, @brandontrabucco, thanks for sharing the design_bench_data folder. I have tried testing it; it seems all the datasets are missing some files. The only dataset I got working is the superconductor dataset, which I copied from https://drive.google.com/file/d/1RImr1Fw5ImKXX66uOGFGO2jwJQvFG90v/view?usp=sharing

Also, I tried running some tasks, and it seems none of them work because of the missing files. Could you please look into this too?

harish2sista commented 7 months ago

This google drive folder contains additional raw data for design-bench tasks, including the smiles_vocab.txt that was missing: https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp?usp=drive_link

Also, downloading this directory from Google Drive as a .zip will split it into multiple archives. I downloaded the contents directly into my local directory instead, using this guide: Fast and Easy ways to Download Large Google Drive Files or Folders.

Try downloading using:

pip install gdown
mkdir design_bench_data
gdown "https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp?usp=drive_link" -O design_bench_data --folder --remaining-ok

christopher-beckham commented 4 months ago

I have a branch with a temporary fix, chris/fixes; basically, when you import design_bench it runs the following code:

import os

from huggingface_hub import snapshot_download

DB_HF_DATASET = os.environ.get("DB_HF_REPO", "beckhamc/design_bench_data")
DB_DATA_DIR = os.path.join(
    os.path.dirname(os.path.dirname(__file__)),
    "design_bench_data"
)
snapshot_download(DB_HF_DATASET, repo_type="dataset", local_dir=DB_DATA_DIR)

i.e. it mass-downloads the entire dataset into the data directory. It does require huggingface_hub; I think this comes with installing HF's datasets via pip install datasets.

This is what I suggest you do for now, until I refactor the code so that DiskResource is replaced with an HF-friendly equivalent.
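
Since the repo id is read from the DB_HF_REPO environment variable, pointing the download at a different mirror would look like this (the repo id below is hypothetical, not an existing HF dataset):

```python
# Usage sketch: override the HF dataset repo before importing design_bench.
# "your-username/design_bench_data_mirror" is a placeholder for your own
# mirror of the data on the Hugging Face Hub.
import os

os.environ["DB_HF_REPO"] = "your-username/design_bench_data_mirror"
# import design_bench  # would now snapshot-download from the mirror
```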