Open erfanhamdi opened 1 year ago
What version of transformers are you using? I'm assuming maybe it's a version mismatch since the error mentions BertTokenizer.
When I set up design bench I try (as much as I can) to use the dependencies listed in design-baselines:
https://github.com/brandontrabucco/design-baselines/blob/master/requirements.txt
Hi!
I'm facing the same issue. I have strictly installed all the package versions from requirements.txt.
But the issue still persists. Does anyone have any other recommendations?
@harish2sista can you list the error that you get? Is it the same as the original poster's error message?
In my own experience setting this up there are some very specific version requirements for some of the packages, and it's not guaranteed to work out of the box when you install from requirements.txt. There were certain packages that I had to install with pip install <module>==version --no-deps so that it didn't pull in newer version numbers for other dependencies (i.e. emphasis on --no-deps).
You can check my own working requirements.txt here: https://gist.github.com/christopher-beckham/d1b319e84ae359e671b67e5fa7c663da
For example if you suspect the issue is with the transformers module you can try the version from my requirements.txt file:
pip install transformers==3.5.1 --no-deps
However you may also have to do the same thing for any extra dependencies of transformers.
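To see which pinned versions actually differ from what is installed, a small script along these lines may help. It is only a sketch: the helper names are hypothetical, the only pin taken from this thread is transformers==3.5.1, and on Python 3.7 it assumes the importlib-metadata backport is installed.

```python
try:  # importlib.metadata is in the standard library from Python 3.8
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: assumes `pip install importlib-metadata`
    from importlib_metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (pinned, installed) pair."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    # Only the transformers pin comes from this thread; add other pins as needed.
    for name, (pinned, found) in find_mismatches({"transformers": "3.5.1"}).items():
        print(f"{name}: pinned {pinned}, installed {found}")
```

Any package it reports is a candidate for reinstalling with pip install <module>==version --no-deps.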
@christopher-beckham Thanks for the swift response. The error message I received is
File "/home/<user_name>/miniconda3/envs/design-baselines/lib/python3.7/site-packages/transformers/tokenization_bert.py", line 196, in __init__
    "model use 'tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)'".format(vocab_file)
ValueError: Can't find a vocabulary file at path '/home/<user_name>/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench_data/smiles_vocab.txt'. To load the vocabulary from a Google pretrained model use 'tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)'
//replaced my
And thank you for sending your requirements.txt file. Are you using Python >= 3.8? I ask because conda env create -f design-baselines/environment.yml installs Python 3.7, and I got a Python version incompatibility error with your file.
Also, after investigation I found that the /home/<user_name>/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench_data directory is empty. I don't know if this has anything to do with the root library.
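A quick way to confirm whether the data directory is populated, and which expected files are absent, is sketched below. The helper names are hypothetical; the only file name taken from this thread is smiles_vocab.txt, and the directory path is whatever appears in your own error message.

```python
import os


def describe_data_dir(data_dir):
    """Classify a directory as 'missing', 'empty' or 'populated'."""
    if not os.path.isdir(data_dir):
        return "missing"
    return "populated" if os.listdir(data_dir) else "empty"


def missing_data_files(data_dir, expected):
    """Return the entries of `expected` (paths relative to data_dir) that are not on disk."""
    return [rel for rel in expected if not os.path.exists(os.path.join(data_dir, rel))]
```

For example, describe_data_dir("<path to site-packages>/design_bench_data") returning "empty" matches the situation described above, and missing_data_files(..., ["smiles_vocab.txt"]) would then list the vocabulary file.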
I am using Python 3.8 yes. If you could try with that it would be good.
I never tried Python 3.7 but my first attempt at the environment was with 3.9 and I didn't get it to work. For 3.9 there were certain modules (wheels) which were compiled for 3.8 but not 3.9, and so I had to downgrade because of pip complaining it couldn't find anything. Again, this seems to be a consequence of the fact that Design Bench requires some older library versions.
I am currently working on a fork which fixes some of these issues and is much more forgiving with dependencies. If you think it might help you, I could try to work on it a bit more this weekend and make it public. Thanks.
@christopher-beckham Thank you very much!! That would be really helpful! In the meantime, I will try the installation with Python 3.8. I'll let you know if I come across any new issues. I'm very excited about your repo! I'm looking forward to it.
Hi, which Mujoco-py version are you using?
I'm using the latest version I got from the Github. Here are some instructions I wrote for myself, your mileage may vary:
design-baselines appears to be using version mujoco-py==2.0.2.3, but I had trouble trying to install this. I could only install a slightly newer version by following one of the replies in https://github.com/openai/mujoco-py/issues/773 which suggests to do the following:
1. Add compiler_directives={'legacy_implicit_noexcept': True} to line 239 of mujoco_py/builder.py
2. Upgrade Cython to the latest version (use pip install / upgrade or whatever)
3. Run pip install . inside the repo to install.
This worked for me so long as I exported my LD_LIBRARY_PATH to be the following (note that the compilation log will tell you if something is missing in this path which you need to add):
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:<path to mujoco>/mujoco210/bin:/usr/lib/nvidia
The last issue involved something to do with some missing libs which need to be installed. While this can be easily done with root privileges and apt-get, you can also install them with conda by following https://github.com/openai/mujoco-py/issues/627#issuecomment-1007658905
conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c menpo glfw3
export CPATH=$CONDA_PREFIX/include
pip install patchelf
I'm facing the same issue. If I'm correct, it seems that the file on the web 'https://storage.googleapis.com/design-bench/smiles_vocab.txt' is missing. It is defined in morgan_fingerprint_features.py:67 and onward:
vocab_file = DiskResource(
    os.path.join(DATA_DIR, 'smiles_vocab.txt'),
    download_method="direct",
    download_target=f'{SERVER_URL}/smiles_vocab.txt')
This code seems to cache a file from the web, but the link to that file appears to be broken now.
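To check whether a download target like this one still resolves, something like the following could be used. It is a minimal sketch using only the standard library: url_is_reachable and its opener parameter are hypothetical names (the parameter exists so the check can be exercised without network access), and it is not part of design-bench.

```python
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors (e.g. 404), DNS failures and timeouts all end up here.
        return False
```

Calling url_is_reachable("https://storage.googleapis.com/design-bench/smiles_vocab.txt") would indicate whether the file is still being served; a False result is consistent with the broken link described above.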
Hi @seminumber, Thanks for bringing this up. I did more deep research following this solution. It seems the entire design-bench server is not working!
@brandontrabucco, could you please help us resolve this issue?
Hmm this is concerning. I have smiles_vocab locally, maybe I can upload it somewhere for now for you. I could maybe modify the code on my fork so that it will just pull the file from my Google Drive.
(For context, I am working on my own fork which will not import everything at once. Dataset specific imports should only happen when those datasets get created. Right now in order to import design_bench you need to have satisfied every dataset's dependencies.)
@christopher-beckham, Thank you for the help! Could you please post the link to the smiles_vocab file after you upload it?
Looking forward to your fork; thank you!
You can get the data here: https://drive.google.com/drive/folders/1tmbFImzhkivZUjHeh434D7V7mrxTBu1H?usp=drive_link
I don't know if it has all the data, because I only imported certain things like the RL environments.
Extract the contents to <path to design bench repo>/design-bench/design_bench_data.
I could look at my fork and maybe specify an alternative which downloads it from my gdrive.
Brandon is currently on vacation but I did recently inform him about this so hopefully it will be an easy fix from his end. Unfortunately I am suffering from this issue myself since my design_bench_data directory doesn't contain everything, only the stuff for Ant/Kitty/Superconductor.
Hi @christopher-beckham, I almost got the package to work with Mujoco_py==2.1.2.14. The only dependencies I'm missing are tensorflow, torch_geometric, pytorch_lightning, and Jax. Could you please tell me which versions of these packages I should install?
Thank you for sharing the design_bench_data directory, adding this to the ~/miniconda/envs/<env_name>/lib/python<version>/site-packages/ resolved the missing vocab file error.
I don't know about Tensorflow GPU, personally I use PyTorch and so when import tensorflow complains about not finding a GPU (or missing a lib) I just ignore it. I also had a Jax import warning and I just ignored it. It's not clear to me whether it's some extraneous dependency that Tensorflow needs (but which has nothing to do with Design Bench).
At this point just make sure that you can import design_bench without it failing. If other issues persist we can talk about them specifically. Thanks.
Actually, that is causing an issue. I am unable to run the [Reproducing Baseline Performance](https://github.com/brandontrabucco/design-bench?tab=readme-ov-file#reproducing-baseline-performance) steps.
I keep getting this error:
2024-01-16 15:09:35.904815: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/home/<user-name>/.mujoco/mujoco210/bin:/usr/lib/nvidia/lib64:/usr/lib/nvidia
There is not much I can do to help from memory (I don't use TF), but you can try to find it (or download it) and insert it into your LD_LIBRARY_PATH and see if that helps.
For this error:
Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file
You need to have CUDA 11.0 installed and its directory set in $LD_LIBRARY_PATH. I'm not sure what version you have already since I see that there is mention of /usr/local/cuda/.
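To find out whether any directory on LD_LIBRARY_PATH actually contains the library named in the warning, a short script like this can be used. It is a sketch only: find_shared_library is a hypothetical helper, and it looks only at the directories listed in the variable, not at the other locations the dynamic loader may search.

```python
import os


def find_shared_library(name, search_path):
    """Return the directories in a path-separator-delimited search path that contain `name`."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_shared_library("libcudart.so.11.0", path) or "not found on LD_LIBRARY_PATH")
```

If nothing is found, the directory holding libcudart.so.11.0 still needs to be appended to LD_LIBRARY_PATH.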
Hi, @christopher-beckham. I am able to get design-bench installed with GPU support in a conda env using Python 3.7. This is my packages list https://gist.github.com/harish2sista/715ace794e2ff2f590d3cc7790b4fcf5
I have installed cudatoolkit and cudnn using the nvidia channel through conda.
I have used the design_bench_data directory you have shared before. There is a new error of missing files with this directory; could you please help me with this?
FileNotFoundError: [Errno 2] No such file or directory: '/home/harish2sista/miniconda3/envs/MBO/lib/python3.7/site-packages/design_bench_data/gfp/gfp-x-0.npy'
Hi,
As I said, my design_bench_data folder I uploaded is incomplete. I would wait until Brandon gets those links fixed.
This is a much more complete version of design_bench_data but FYI it is not everything:
https://drive.google.com/file/d/1RImr1Fw5ImKXX66uOGFGO2jwJQvFG90v/view?usp=sharing
It contains the following data for:
ant_morphology
chembl-Potency-CHEMBL1794345
cifar_nas
dkitty_morphology
hopper_controller
nas_bench
superconductor
tf_bind_10-pho4
tf_bind_8-SIX6_REF_R1
toy_continuous
toy_discrete
utr
Hi @christopher-beckham thanks for the update, I am able to use design-bench on my CPU.
Btw, this directory https://drive.google.com/file/d/1RImr1Fw5ImKXX66uOGFGO2jwJQvFG90v/view?usp=sharing does not have the smiles_vocab.txt file. I copied it separately from https://drive.google.com/drive/folders/1tmbFImzhkivZUjHeh434D7V7mrxTBu1H?usp=drive_link
Also, another question I have is does design-bench only work with torch==1.7.1? Can I use any newer versions?
The short answer is probably, because the PyTorch API is pretty stable at this point. But it depends on what you're doing... almost all approximating oracles in this library use Tensorflow. I see one oracle using PyTorch and I doubt that a newer version of PyTorch is going to break it.
To be clear, Design Bench itself is framework "agnostic", but that gets violated as soon as you need to use a specific oracle which was trained on a specific library.
When I finish my fork of the library (which I might just put as a separate "development" branch on this repository) it should be more clear that this is the case. Right now when you run import design_bench it tries to import everything at once and it doesn't make it clear what exactly is needed in what case.
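The general idea of importing dataset-specific dependencies only on demand can be sketched as follows. This is not the fork's actual code: the registry, its keys and load_dataset_module are hypothetical, and standard-library modules stand in for the real dataset modules so the example runs anywhere.

```python
import importlib

# Hypothetical registry: dataset name -> module that provides it.
# Stdlib modules are placeholders for the real, heavier dataset modules.
_DATASET_MODULES = {
    "toy_json": "json",
    "toy_csv": "csv",
}


def load_dataset_module(name):
    """Import the module backing `name` only when that dataset is requested."""
    try:
        module_name = _DATASET_MODULES[name]
    except KeyError:
        raise KeyError(f"unknown dataset: {name!r}") from None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The error names the one dependency this dataset needs.
        raise ImportError(
            f"dataset {name!r} needs the optional module {module_name!r}"
        ) from exc
```

With this pattern a missing framework only causes an error for the datasets or oracles that need it, and the message says which one.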
Thanks all for diagnosing this, locating the broken server, and for sharing your own cached data while I work on fixing the repo! Especially christopher-beckham for the data, and for answering so many of the questions that have arisen because of the broken server!
I'm migrating design-bench away from GCP, and I'll add a section to the README with instructions on how the data for each task in design-bench was generated. I'm also working on fixes for bugs the community has found.
This google drive folder contains additional raw data for design-bench tasks, including the smiles_vocab.txt that was missing: https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp?usp=drive_link
I'm working on tracking down a copy of design_bench_data/gfp and I'll update this thread when I locate it.
Hi, @brandontrabucco, thanks for sharing the design_bench_data folder. I have tried testing it; it seems like all the datasets are missing some files. The only dataset I got working is the superconductor dataset, which I copied from https://drive.google.com/file/d/1RImr1Fw5ImKXX66uOGFGO2jwJQvFG90v/view?usp=sharing
Also, I tried testing some tasks, and it seems like none of them are working because of the missing files. Could you please look into this too?
Also, downloading this directory from Google Drive as a .zip will split it into multiple directories. I downloaded the contents directly into my local directory using the solution in "Fast and Easy ways to Download Large Google Drive Files or Folders".
Try downloading using:
pip install gdown
mkdir design_bench_data
gdown "https://drive.google.com/drive/folders/1FDoM9wWBm7ziWOSyY5V7eE1bx0mXQYwp?usp=drive_link" -O ./design_bench_data --folder --remaining-ok
I have a branch which is a temporary solution in chris/fixes. Basically, when you import design_bench it will run the following code:
import os
from huggingface_hub import snapshot_download

# Dataset repo on the Hugging Face Hub; override it with the DB_HF_REPO environment variable.
DB_HF_DATASET = os.environ.get("DB_HF_REPO", "beckhamc/design_bench_data")
# design_bench_data sits one level above the package directory.
DB_DATA_DIR = os.path.join(
    os.path.dirname(os.path.dirname(__file__)),
    "design_bench_data"
)
# Download the whole dataset repo into the data directory.
snapshot_download(DB_HF_DATASET, repo_type="dataset", local_dir=DB_DATA_DIR)
i.e. it will mass-download all of the data into the data directory. It does require huggingface_hub; I think this comes with just installing HF's datasets via pip install datasets.
This is what I suggest you do for now, until I refactor the code so that DiskResource gets replaced with an HF-friendly equivalent.
Hello
I installed the design-bench package the way you mentioned, which didn't require MuJoCo, and everything goes fine until I try to import design-bench, when I get this error.
Is there anything that I am missing, like downloading other files?