ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Scaffold-Morphing #940

Closed Inyrkz closed 7 months ago

Inyrkz commented 10 months ago

Model Name

Scaffold Morphing

Model Description

The context discusses a novel notation system called Sequential Attachment-based Fragment Embedding (SAFE) that improves upon traditional molecular string representations like SMILES. SAFE reframes SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. This streamlines complex molecular design tasks by facilitating autoregressive generation under various constraints. The effectiveness of SAFE is demonstrated by training a GPT2-like model on a dataset of 1.1 billion SAFE representations that exhibited versatile and robust optimization performance for molecular design.

In scaffold morphing, we want to replace the scaffold of a molecule with a different one. As input, the user must provide either the side chains or the input molecule together with its core.

Slug

safe-scaffold-morphing

Tag

Compound Generation

Publication

https://arxiv.org/pdf/2310.10773.pdf

Source Code

https://github.com/datamol-io/safe/tree/main

License

CC BY 4.0

Inyrkz commented 9 months ago

@GemmaTuron, @miquelduranfrigola

The results are promising. Filtering by the molecular weight will help a lot. It seems the range of 70-80 is good.

The remaining question is whether we want only one core per molecule or whether we pass up to three cores per molecule.

GemmaTuron commented 9 months ago

Hi @Inyrkz !

Great, I suggest being a bit less restrictive: molecular weight 60-100. Then maybe we can select the core with more heteroatoms (fewer carbons), so that it is a more interesting scaffold. With this, we could proceed. By the way, the model output should also include the core that was used.

Inyrkz commented 9 months ago

This result is promising.

One last step. We need to decide which scaffold we want to use, probably based on the number of attachment points. Or we could work with all of them.

miquelduranfrigola commented 9 months ago

Awesome stuff, @Inyrkz . Results are indeed very promising.

A few considerations:

Thanks again!

Inyrkz commented 9 months ago

@miquelduranfrigola, I have added another condition as a fallback strategy: if no scaffold falls within the 60-100 range, take the scaffold closest to that range. The generic scaffolds have been excluded.

I'll be bringing all the functions together to do the scaffold morphing.
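
The selection rule discussed above (prefer cores within the 60-100 molecular weight range, otherwise fall back to the closest one) could be sketched roughly as follows. This is only an illustration, not the code from the notebooks: the function name `select_cores` and the dictionary input are assumptions, and the weights are expected to be computed beforehand (for example with RDKit's `Descriptors.MolWt`).

```python
def select_cores(core_weights, low=60.0, high=100.0):
    """Pick candidate cores by molecular weight.

    core_weights: dict mapping a core SMILES to its molecular weight
    (hypothetical input format, computed elsewhere).
    Returns every core inside [low, high]; if none qualifies, returns
    the single core whose weight is closest to that range.
    """
    if not core_weights:
        return []

    in_range = [core for core, mw in core_weights.items() if low <= mw <= high]
    if in_range:
        return in_range

    def distance_to_range(mw):
        # Distance from the weight to the nearest bound of the range.
        return low - mw if mw < low else mw - high

    closest = min(core_weights, key=lambda core: distance_to_range(core_weights[core]))
    return [closest]
```

Returning all in-range cores leaves the later choice (for example by number of attachment points) open.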

Inyrkz commented 9 months ago

This notebook shows what running the main.py file will look like with these as the input SMILES

smiles
CC(C)(C)c1nc(c(s1)-c1ccnc(N)n1)-c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F
CN(C)c1cccc(c1)C(=O)Nc1ccc(C)c(NC(=O)c2ccc(O)cc2)c1
Cc1ccc(Cl)c(Nc2ccccc2C(O)=O)c1Cl
N[C@@H](Cc1ccc(O)c(O)c1)C(O)=O

I'll create another notebook to show the result of only one SMILES as input.

miquelduranfrigola commented 9 months ago

OK @Inyrkz this is going in the right direction. Could you add a method to keep, for each input molecule, only the unique generated molecules? I see repeated molecules in your previous notebook. Getting unique molecules can be done by (1) indexing them with InChIKeys and (2) taking the unique set.
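
A minimal sketch of that deduplication idea, assuming the generated molecules are SMILES strings. The function names are made up for illustration; `Chem.MolFromSmiles` and `Chem.MolToInchiKey` are RDKit calls, imported lazily so the generic helper can also be used with another key function.

```python
def inchikey_from_smiles(smiles):
    """Return the InChIKey of a SMILES string, or None if it cannot be parsed."""
    from rdkit import Chem  # lazy import: only needed when this key function is used

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToInchiKey(mol)


def unique_molecules(smiles_list, key_fn=inchikey_from_smiles):
    """Keep the first SMILES seen for each key, preserving the input order."""
    seen = {}
    for smiles in smiles_list:
        key = key_fn(smiles)
        if not key:
            # Skip molecules for which no key could be computed.
            continue
        seen.setdefault(key, smiles)
    return list(seen.values())
```

Keying on the InChIKey rather than on the raw string means two different SMILES spellings of the same molecule are treated as duplicates.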

Inyrkz commented 9 months ago

@miquelduranfrigola, Thanks for catching the repetition. I'll address it.

Inyrkz commented 9 months ago

I've removed the duplicates. Now we only have 7 outputs for the 4 input SMILES.

The model fails on some SMILES inputs. In this notebook, I use only one SMILES as input, and the model couldn't generate any new molecule for any of the four core structures extracted.

I even tried converting the SMILES to a SAFE before passing it to the model. It didn't work.

I'm not sure why.

miquelduranfrigola commented 9 months ago

Thanks @Inyrkz - I will investigate it in detail in preparation for our meeting tomorrow. Meanwhile, have you already prepared the Dockerfile, run.sh etc for this model?

Inyrkz commented 9 months ago

I haven't prepared the Dockerfile yet. I'll work on that.

Inyrkz commented 9 months ago

I’ve adjusted the get_side_chain_pairs function. It gives different pairs of side chains. I tested the code on this molecule CC(C)(C)c1nc(c(s1)-c1ccnc(N)n1)-c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F (one of the molecules that wasn’t working before.)

Not all side-chain combinations work; only a few do. This approach is better because it lets us find at least one side-chain combination that works.

Here is the notebook.

It takes about 7 minutes to generate new molecules for one SMILES.

I want to try all four SMILES.

miquelduranfrigola commented 9 months ago

Thanks @Inyrkz

The bit of information saying that "not all side chains combo works" is relevant. Actually, we might even want to try this with a single side chain. So, in general, instead of generating only pairs, we might want to generate all combinations of, let's say, up to 3 elements.

As an example, consider the following, where "a", "b", "c", "d" would be four side chains.

Here are the combinations for the example list ['a', 'b', 'c', 'd']:

Combinations of 1 element:

('a',) ('b',) ('c',) ('d',)

Combinations of 2 elements:

('a', 'b') ('a', 'c') ('a', 'd') ('b', 'c') ('b', 'd') ('c', 'd')

Combinations of 3 elements:

('a', 'b', 'c') ('a', 'b', 'd') ('a', 'c', 'd') ('b', 'c', 'd')

This demonstrates how to generate all possible combinations of 1, 2, and 3 elements from a given list. You can use the same approach for any list of elements.

This code would generate all combinations

from itertools import combinations

# Example list
example_list = ['a', 'b', 'c', 'd']

# Generating all combinations of 1, 2, and 3 elements
combinations_1 = list(combinations(example_list, 1))
combinations_2 = list(combinations(example_list, 2))
combinations_3 = list(combinations(example_list, 3))

all_combinations = combinations_1 + combinations_2 + combinations_3

What do you think?

Inyrkz commented 9 months ago

This is great. Thanks for the code sample.

I can try this. The only problem is it may take longer for the model to run.

Inyrkz commented 9 months ago

@miquelduranfrigola, I've adjusted the get_side_chain_pairs() function to use all combinations.

The problem is that the scaffold-morphing model gives this error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[53], line 1
----> 1 generated_smiles = designer.scaffold_morphing(
      2     side_chains=side_chain_pairs[0],
      3     n_samples_per_trial=12,
      4     n_trials=1,
      5     sanitize=True,
      6     do_not_fragment_further=False,
      7     random_seed=100,
      8     )
     10 print(generated_smiles)

File ~/anaconda3/envs/safe/lib/python3.9/site-packages/safe/sample.py:172, in SAFEDesign.scaffold_morphing(self, side_chains, mol, core, n_samples_per_trial, n_trials, sanitize, do_not_fragment_further, random_seed, **kwargs)
    137 def scaffold_morphing(
    138     self,
    139     side_chains: Optional[Union[dm.Mol, str, List[Union[str, dm.Mol]]]] = None,
   (...)
    147     **kwargs,
    148 ):
    149     """Perform scaffold morphing decoration using the pretrained SAFE model
    150 
    151     For scaffold morphing, we try to replace the core by a new one. If the side_chains are provided, we use them.
   (...)
    169         kwargs: any argument to provide to the underlying generation function
...
--> 316     raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
    318 # Non-unit step argument supplied.
    319 istep = int(step)

ValueError: empty range for randrange() (1, 1, 0)
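
The last frame of the traceback shows `random.randrange` being called with an empty range (start and stop are both 1). One way to keep a run going when some side-chain combinations trigger this error is to catch it per combination. The sketch below is an assumption about how that could look, not the code in the notebook: `generate_for_all` is a made-up helper, and `fake_generate` is a hypothetical stand-in for a call such as `designer.scaffold_morphing(side_chains=combo, ...)` that fails on single fragments in the same way.

```python
import random


def generate_for_all(combos, generate_fn):
    """Call generate_fn on each side-chain combination, skipping the ones that fail.

    Returns (generated, failed), where failed holds (combination, error message)
    pairs so the failing inputs can be inspected afterwards.
    """
    generated, failed = [], []
    for combo in combos:
        try:
            generated.extend(generate_fn(combo))
        except ValueError as err:
            failed.append((combo, str(err)))
    return generated, failed


def fake_generate(side_chains):
    """Hypothetical stand-in for the model call, for illustration only."""
    n_fragments = len(side_chains.split("."))
    # randrange(1, 1) has an empty range and raises ValueError,
    # which is the error class seen in the traceback above.
    random.randrange(1, n_fragments)
    return [f"generated-from:{side_chains}"]
```

Collecting the failures instead of silently dropping them makes it easier to see which combinations the model cannot handle.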
Inyrkz commented 9 months ago

@miquelduranfrigola ,

This is the input of the get_side_chain_pairs() function.

[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C.[3*]c1ccnc(N)n1

This is the output.

['[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F', '[1*]C(C)(C)C', '[1*]c1ccnc(N)n1', '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C', '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]c1ccnc(N)n1', '[1*]C(C)(C)C.[2*]c1ccnc(N)n1', '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C.[3*]c1ccnc(N)n1']

I've noticed that the model crashes for single side chains like '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F', '[1*]C(C)(C)C', and '[1*]c1ccnc(N)n1'.

The input has to be at least a pair.
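
Restricting the output to pairs and larger combinations, with the attachment points renumbered from 1 as in the output above, could be sketched like this. It is not the actual get_side_chain_pairs() implementation: the function name and parameters are illustrative, and the sketch assumes each fragment carries exactly one attachment point written as `[n*]`.

```python
import re
from itertools import combinations


def side_chain_combinations(side_chains, min_size=2, max_size=3):
    """Build dot-joined combinations of side-chain fragments.

    side_chains: dot-separated fragments, e.g. "[1*]A.[2*]B.[3*]C".
    Attachment points are renumbered 1..k inside every combination.
    """
    fragments = side_chains.split(".")
    results = []
    for size in range(min_size, min(max_size, len(fragments)) + 1):
        for combo in combinations(fragments, size):
            renumbered = [
                # Replace the original attachment label with its position in the combo.
                re.sub(r"\[\d+\*\]", f"[{index}*]", fragment)
                for index, fragment in enumerate(combo, start=1)
            ]
            results.append(".".join(renumbered))
    return results
```

With `min_size=2`, the single-fragment inputs that crash the model are never produced; setting `min_size=1` would reproduce the full list shown above.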

Inyrkz commented 9 months ago

It takes 4 minutes for the model to run predictions for 4 input SMILES on Google Colab (GPU)

miquelduranfrigola commented 9 months ago

OK @Inyrkz thanks - GPU is a bit faster then. Noted. Let's then do pairs and triplets. One question: does the n_trials parameter help in generating more molecules? Another question: does reducing the number of molecules per trial increase speed?

Inyrkz commented 9 months ago

Okay, pairs and triplets. So I just do

all_combinations = combinations_2 + combinations_3

I'll experiment to get answers to the questions.

Inyrkz commented 9 months ago

Here's what I've observed so far (I'll update this list).

I hope this helps.

miquelduranfrigola commented 9 months ago

Thanks @Inyrkz

This is useful. We will have to accept that this is a slow model. Let's do n_trials=10 and n_samples_per_trial=10, if you agree? Can we try this config for a few molecules, keeping an eye on:

Almost there!

Inyrkz commented 9 months ago

Alright,

I'll try n_trials=10 and n_samples_per_trial=10.

Yup, almost there!

Inyrkz commented 9 months ago

This table shows how long it takes to execute the original code.

| Task | Execution time on my system | Execution time on Colab with GPU |
| --- | --- | --- |
| Side chain 1 | 3m46s | 1m46s |
| Side chain 2 | 1m56s | 1m34s |
| Side chain 3 | 2m2s | 1m23s |
| Side chain 4 | 3m33s | 1m29s |
| Total time | 24 mins (using a loop) | 6 mins (using a loop) |
| Average generation time per side chain | 2m49s | 1m33s |
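
For reference, the per-side-chain averages in the table can be reproduced from the four individual timings. The helper names below are made up for this sketch; the durations are copied from the table.

```python
import re


def parse_duration(text):
    """Convert a duration such as '3m46s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def format_duration(seconds):
    """Convert seconds back to the 'XmYs' notation, dropping fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs}s"


def average_duration(durations):
    total = sum(parse_duration(d) for d in durations)
    return format_duration(total / len(durations))


local_times = ["3m46s", "1m56s", "2m2s", "3m33s"]
colab_times = ["1m46s", "1m34s", "1m23s", "1m29s"]

print(average_duration(local_times))  # 2m49s
print(average_duration(colab_times))  # 1m33s
```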
Inyrkz commented 9 months ago

Here is the notebook

Inyrkz commented 9 months ago

| Input SMILES | No. of core structures generated | No. of side chain pairs | Execution time on Colab with GPU (sec) | No. of outputs generated | Notebook |
| --- | --- | --- | --- | --- | --- |
| CC(C)(C)c1nc(c(s1)-c1ccnc(N)n1)-c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F | 4 | 13 | 1296 | 224 | link |
| CN(C)c1cccc(c1)C(=O)Nc1ccc(C)c(NC(=O)c2ccc(O)cc2)c1 | 2 | 5 | 563 | 200 | link |

miquelduranfrigola commented 9 months ago

Thanks, @Inyrkz - these numbers look very reasonable; ~200 molecules generated is sufficient. Also, on a quick look, I like the molecules generated according to the notebooks. In my opinion, we are ready to wrap up - @GemmaTuron , what do you think?

Inyrkz commented 9 months ago

@GemmaTuron

I've made an initial update to the main.py script. These are the two files main.py and mol_gen.py

GemmaTuron commented 9 months ago

Hi @Inyrkz !

Thanks for this, it looks good. Does it work fine with Ersilia? I cannot take a deep dive into it this afternoon, but I will tomorrow. Thanks for the work.

Inyrkz commented 9 months ago

@GemmaTuron, You're welcome.

I'm yet to test it with Ersilia. I want to make sure the code is okay before using it in my presentation tomorrow.

It took over 30 mins to run it on my system.

Inyrkz commented 9 months ago

The run.sh file works.

I've been trying to test it with Ersilia but I keep getting a PingError. My internet connection is bad. I'll try again at night to see if it gets better.

Git LFS initialized.
17:02:22 | DEBUG    | Git LFS has been activated
17:02:47 | ERROR    | Ersilia exception class:
PingError

Detailed error:
No internet connection. Internet connection is required for downloading models from GitHub repositories.

Hints:
Make sure that your computer is connected to the internet and try again. 

🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

Ersilia exception class:
PingError

Detailed error:
No internet connection. Internet connection is required for downloading models from GitHub repositories.

Hints:
Make sure that your computer is connected to the internet and try again. 

If this error message is not helpful, open an issue at:
 - https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
 - hello[at]ersilia.io

If you haven't, try to run your command in verbose mode (-v in the CLI)
 - You will find the console log file in: /home/affiah/eos/current.log
miquelduranfrigola commented 9 months ago

Thanks @Inyrkz let us know when you have better connection.

GemmaTuron commented 9 months ago

@Inyrkz

I've had a look at the code and it seems fine. To be able to run it within Ersilia, you'll need a few edits: 1) add the metadata information, and 2) solve the issue with the file paths. I would not hardcode these now that we are already at the implementation stage (I am talking about the input and output files in main.py).

GemmaTuron commented 9 months ago

Also, I spotted a typo on line 96 of mol_gen.py: core_structures = self.extract_core_structure(i) should be core_structures = self._extract_core_structure(i). The same applies to other functions as well!

Inyrkz commented 9 months ago

Thanks for the update @GemmaTuron & @miquelduranfrigola

I corrected the typos in the mol_gen.py script.

I'm a bit confused about the path issue. The code will only require the input and output files in the main.py. It won't require any other file.

input_file = "data/my_molecules (copy).csv"
output_file = "data/results.csv"

The lines above were just for testing. This is what the main code will look like.

input_file = sys.argv[1]
output_file = sys.argv[2]

Is there any path issue to modify here?
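
One way to avoid path problems when main.py is launched from a different working directory is to resolve both arguments to absolute paths up front. This is a sketch under that assumption, not the actual main.py; `resolve_io_paths` is a made-up helper name.

```python
import os
import sys


def resolve_io_paths(argv):
    """Return absolute input and output paths taken from the command line.

    argv is expected to look like sys.argv: [script, input_file, output_file].
    """
    if len(argv) != 3:
        script = argv[0] if argv else "main.py"
        raise SystemExit(f"usage: python {script} <input.csv> <output.csv>")
    input_file = os.path.abspath(argv[1])
    output_file = os.path.abspath(argv[2])
    return input_file, output_file


# Typical use at the top of main.py:
#     input_file, output_file = resolve_io_paths(sys.argv)
```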

Inyrkz commented 9 months ago

Also, my system freezes for hours when testing with ersilia. This usually happens when it gets to the Attempting to delete BentoML part.

I'm trying to set up ersilia on another system, so I can do the testing with the system.

GemmaTuron commented 9 months ago

> Thanks for the update @GemmaTuron & @miquelduranfrigola
>
> I corrected the typos in the mol_gen.py script.
>
> I'm a bit confused about the path issue. The code will only require the input and output files in the main.py. It won't require any other file.
>
> input_file = "data/my_molecules (copy).csv"
> output_file = "data/results.csv"
>
> The lines above were just for testing. This is what the main code will look like.
>
> input_file = sys.argv[1]
> output_file = sys.argv[2]
>
> Is there any path issue to modify here?

When I was testing with the hardcoded files instead of the files passed through argv, I ran into issues with the paths, that's all!

Inyrkz commented 9 months ago

I'm stuck here

16:57:29 | DEBUG    | Activation done
16:57:29 | DEBUG    | Previous command successfully run inside eos8bhe conda environment
16:57:29 | DEBUG    | Now trying to establish symlinks
16:57:29 | DEBUG    | BentoML location is /Users/ini-abasiaffiah/bentoml/repository/eos8bhe/20240212165727_4E41AB
16:57:29 | DEBUG    | Ersilia Bento location is /Users/ini-abasiaffiah/eos/repository/eos8bhe/20240212165727_4E41AB
16:57:29 | DEBUG    | Building symlinks between /Users/ini-abasiaffiah/eos/repository/eos8bhe/20240212165727_4E41AB and /Users/ini-abasiaffiah/bentoml/repository/eos8bhe/20240212165727_4E41AB
16:57:29 | DEBUG    | Creating model symlink bundle artifacts > dest
16:57:29 | DEBUG    | Creating model_install_commands.sh symlink dest <> bundle
16:57:29 | INFO     | Could not create symbolic link from /Users/ini-abasiaffiah/eos/dest/eos8bhe/data.h5 to /Users/ini-abasiaffiah/eos/isaura/lake/eos8bhe_public.h5
16:57:29 | DEBUG    | Run file found in framework: /Users/ini-abasiaffiah/eos/repository/eos8bhe/20240212165727_4E41AB/eos8bhe/artifacts/framework/run.sh
16:57:29 | DEBUG    | Run commandlines on eos8bhe
16:57:29 | DEBUG    | which python > /var/folders/md/k05hbtgj6zs3jprxfsjbs1540000gn/T/ersilia-i4oy4n5b/tmp.txt
16:57:30 | DEBUG    | Activating base environment
16:57:30 | DEBUG    | Current working directory: /Users/ini-abasiaffiah/ersilia
16:57:30 | DEBUG    | Running bash /var/folders/md/k05hbtgj6zs3jprxfsjbs1540000gn/T/ersilia-fl8xyer9/script.sh 2>&1 | tee -a /var/folders/md/k05hbtgj6zs3jprxfsjbs1540000gn/T/ersilia-w754m208/command_outputs.log
# conda environments:
#
base                     /Users/ini-abasiaffiah/anaconda3
eos8bhe               *  /Users/ini-abasiaffiah/anaconda3/envs/eos8bhe
eosbase-bentoml-0.11.0-py310     /Users/ini-abasiaffiah/anaconda3/envs/eosbase-bentoml-0.11.0-py310
ersilia                  /Users/ini-abasiaffiah/anaconda3/envs/ersilia

16:57:30 | DEBUG    | # conda environments:
#
base                     /Users/ini-abasiaffiah/anaconda3
eos8bhe               *  /Users/ini-abasiaffiah/anaconda3/envs/eos8bhe
eosbase-bentoml-0.11.0-py310     /Users/ini-abasiaffiah/anaconda3/envs/eosbase-bentoml-0.11.0-py310
ersilia                  /Users/ini-abasiaffiah/anaconda3/envs/ersilia

16:57:30 | DEBUG    | Activation done
16:57:30 | DEBUG    | Python executable: /Users/ini-abasiaffiah/anaconda3/envs/eos8bhe/bin/python
16:57:30 | DEBUG    | Conda is needed
16:57:30 | DEBUG    | Checking if model needs to be integrated to a tool
Traceback (most recent call last):
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 468, in _make_request
    self._validate_conn(conn)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1097, in _validate_conn
    conn.connect()
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connection.py", line 642, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connection.py", line 783, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 471, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 515, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/ssl.py", line 1104, in _create
    self.do_handshake()
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/ssl.py", line 1375, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: [Errno 60] Operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 492, in _make_request
    raise new_e
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 470, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 371, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/bin/ersilia", line 8, in <module>
    sys.exit(cli())
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/cli/commands/__init__.py", line 22, in wrapper
    return func(*args, **kwargs)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/cli/commands/fetch.py", line 89, in fetch
    _fetch(mf, model_id)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/cli/commands/fetch.py", line 12, in _fetch
    mf.fetch(model_id)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 228, in fetch
    self._fetch(model_id)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 225, in _fetch
    self._fetch_not_from_dockerhub(model_id=model_id)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 137, in _fetch_not_from_dockerhub
    self._content()
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 106, in _content
    cg = CardGetter(self.model_id, self.config_json)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/actions/content.py", line 14, in __init__
    self.mc = ModelCard(config_json=config_json)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/content/card.py", line 738, in __init__
    self.ac = AirtableCard(config_json=config_json)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/content/card.py", line 676, in __init__
    AirtableInterface.__init__(self, config_json=config_json)
  File "/Users/ini-abasiaffiah/ersilia/ersilia/db/hubdata/interfaces.py", line 13, in __init__
    self.api_key = self._get_read_only_airtable_api_key()
  File "/Users/ini-abasiaffiah/ersilia/ersilia/db/hubdata/interfaces.py", line 24, in _get_read_only_airtable_api_key
    r = requests.get(url)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=None)
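
The traceback ends with a read timeout against raw.githubusercontent.com, and the request was made without a timeout (read timeout=None). Before starting a long fetch, a quick reachability check with an explicit timeout can tell whether the connection is the problem. This sketch uses only the standard library and is not part of Ersilia; `can_reach` and its `opener` parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def can_reach(url="https://raw.githubusercontent.com", timeout=10,
              opener=urllib.request.urlopen):
    """Return True if the host answers within `timeout` seconds."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an HTTP error status, so it is reachable.
        return True
    except (urllib.error.URLError, TimeoutError, OSError):
        # DNS failure, refused connection, or timeout.
        return False
```

The `opener` argument exists only so the function can be exercised without a network connection.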
Inyrkz commented 9 months ago

Is there a way to test with ersilia on codespace?

GemmaTuron commented 9 months ago

Hi @Inyrkz

You can build a codespace from your repo and try run.sh, but I assume you already did this and it worked?

Inyrkz commented 9 months ago

Yes, the run.sh file works well.

GemmaTuron commented 9 months ago

OK, I'm running it and I'll let you know the outcome.

Inyrkz commented 9 months ago

Alright

GemmaTuron commented 9 months ago

@Inyrkz

Please:

Then, to solve the wandb dependency issue, install safe-mol without dependencies and manually install each package. Specify the versions you got when installing safe outside ersilia

Inyrkz commented 9 months ago

@GemmaTuron, this is what the metadata.json file looks like. Is this okay?

{
    "Identifier": "eos8bhe",
    "Slug": "scaffold-morphing",
    "Status": "In progress",
    "Title": "safe",
    "Description": "The context discusses a novel notation system called Sequential Attachment-based Fragment Embedding (SAFE) that improves upon traditional molecular string representations like SMILES. SAFE reframes SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. This streamlines complex molecular design tasks by facilitating autoregressive generation under various constraints. The effectiveness of SAFE is demonstrated by training a GPT2-like model on a dataset of 1.1 billion SAFE representations that exhibited versatile and robust optimization performance for molecular design.",
    "Mode": "Pretrained",
    "Task": ["Generation"],
    "Input": ["Compound"],
    "Input Shape": "Single",
    "Output": ["Compound"],
    "Output Type": ["String"],
    "Output Shape": "List",
    "Interpretation": "Model generates new molecules from input molecule by replacing core structures of input molecule.",
    "Tag": "Compound Generation",
    "Publication": "https://arxiv.org/pdf/2310.10773.pdf",
    "Source Code": "https://github.com/datamol-io/safe/tree/main",
    "License": "CC BY 4.0"
}
Inyrkz commented 9 months ago

I want to install safe-mol==0.1.4 without any dependency

These are the other packages from the pyproject.toml file

keywords = ["safe", "smiles", "de novo", "design", "molecules"]
dependencies = [
    "tqdm",
    "loguru",
    "typer",
    "universal_pathlib",
    "datamol",
    "numpy",
    "torch>=2.0",
    "transformers",
    "datasets",
    "tokenizers",
    "accelerate",
    "evaluate",
    "wandb",
    "huggingface-hub",
    "rdkit"
]
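
To pin each dependency to the version already installed in a working environment, as suggested above, the versions can be read with `importlib.metadata`. The helper below is a sketch: `pinned_requirements`, `version_of`, and `skip` are illustrative names, and which packages to skip (if any) is left to the caller.

```python
import re
from importlib import metadata


def pinned_requirements(specs, version_of=metadata.version, skip=()):
    """Turn dependency specs such as 'torch>=2.0' into 'name==installed_version' lines."""
    lines = []
    for spec in specs:
        # Keep only the package name, dropping any version specifier.
        name = re.split(r"[<>=!~\s]", spec, maxsplit=1)[0]
        if name in skip:
            continue
        try:
            lines.append(f"{name}=={version_of(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines
```

Run inside the environment where safe-mol works, the resulting lines can be installed one by one after `pip install safe-mol==0.1.4 --no-deps`.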
GemmaTuron commented 8 months ago

Hi @Inyrkz !

Great that it works now. Can you open a PR?

Inyrkz commented 8 months ago

@GemmaTuron,

I've opened a pull request.

GemmaTuron commented 8 months ago

@Inyrkz can you check why the docker upload is failing currently?

Inyrkz commented 8 months ago

Okay, how do I check that?