Closed Inyrkz closed 7 months ago
@GemmaTuron, @miquelduranfrigola
The results are promising. Filtering by the molecular weight will help a lot. It seems the range of 70-80 is good.
Now, the question remains whether we want only 1 core per molecule or whether we would pass up to three cores per molecule.
Hi @Inyrkz !
Great! I suggest being a bit less restrictive: molecular weight 60-100. Then, maybe we can select the core with fewer heteroatoms (fewer carbons) so that it is a more interesting scaffold. With this, we could proceed. By the way, the model output should also include the core that was used.
This result is promising.
One last step. We need to decide which scaffold we want to use, probably based on the number of attachment points. Or we could work with all of them.
Awesome stuff, @Inyrkz . Results are indeed very promising.
A few considerations:
Thanks again!
@miquelduranfrigola, I have added another condition to get the scaffold that is closer to the range of 60 - 100 as a fallback strategy. The generic scaffolds have been excluded.
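The selection rule described above (prefer cores with a molecular weight in the 60-100 range, otherwise fall back to the core closest to that range) could be sketched roughly as follows. This is only an illustration, not the code in the repository: `select_cores` and `distance_to_range` are hypothetical names, and the molecular weights are assumed to be precomputed (for example with RDKit's `Descriptors.MolWt`).

```python
def distance_to_range(mw, low=60.0, high=100.0):
    """Return 0 if mw lies inside [low, high], else the gap to the nearest bound."""
    if mw < low:
        return low - mw
    if mw > high:
        return mw - high
    return 0.0


def select_cores(cores_with_mw, low=60.0, high=100.0):
    """Keep cores whose weight is in range; otherwise fall back to the closest one.

    cores_with_mw is a list of (core_smiles, molecular_weight) tuples.
    """
    if not cores_with_mw:
        return []
    in_range = [core for core, mw in cores_with_mw if low <= mw <= high]
    if in_range:
        return in_range
    # Fallback: no core is in range, so take the one nearest to the range.
    closest = min(cores_with_mw, key=lambda item: distance_to_range(item[1], low, high))
    return [closest[0]]


# Illustrative, rounded weights (benzene, pyridine, naphthalene, methane).
example = [("c1ccccc1", 78.1), ("c1ccncc1", 79.1), ("c1ccc2ccccc2c1", 128.2)]
print(select_cores(example))
print(select_cores([("c1ccc2ccccc2c1", 128.2), ("C", 16.0)]))
```

Returning every in-range core keeps the "one core or up to three cores" decision open; a caller that wants a single core can take the first element.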
I'll be bringing all the functions together to do the scaffold morphing.
This notebook shows what running the main.py file will look like with these as the input SMILES:
smiles
CC(C)(C)c1nc(c(s1)-c1ccnc(N)n1)-c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F
CN(C)c1cccc(c1)C(=O)Nc1ccc(C)c(NC(=O)c2ccc(O)cc2)c1
Cc1ccc(Cl)c(Nc2ccccc2C(O)=O)c1Cl
N[C@@H](Cc1ccc(O)c(O)c1)C(O)=O
I'll create another notebook to show the result of only one SMILES as input.
OK @Inyrkz this is going in the right direction. Could you add a method to keep, for each molecule, only unique molecules? I see repeated molecules in your previous notebook. Getting unique molecules can be done via 1. indexing them with InChIKeys and 2. getting the unique set.
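The two steps suggested above (index by InChIKey, then keep the unique set) might look like the sketch below. The helper names `unique_by_key` and `inchikey` are hypothetical and not taken from the repository; the only library calls assumed are RDKit's `Chem.MolFromSmiles` and `Chem.MolToInchiKey`.

```python
def unique_by_key(smiles_list, key_fn):
    """Return the input SMILES with duplicates removed, keeping first occurrences.

    key_fn maps a SMILES string to a hashable identity key (or None to skip it).
    """
    seen = set()
    unique = []
    for smiles in smiles_list:
        key = key_fn(smiles)
        if key is None or key in seen:
            continue
        seen.add(key)
        unique.append(smiles)
    return unique


def inchikey(smiles):
    """InChIKey of a SMILES string, or None if RDKit cannot parse it."""
    from rdkit import Chem  # imported lazily so unique_by_key has no hard dependency

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToInchiKey(mol)


# With RDKit installed: unique_by_key(generated_smiles, inchikey)
```

Keying on the InChIKey rather than on the raw string means that two different SMILES spellings of the same molecule collapse into one entry.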
@miquelduranfrigola, Thanks for catching the repetition. I'll address it.
I've removed the duplicates. Now we only have 7 outputs for the 4 input SMILES.
The model fails on some SMILES inputs. In this notebook, I only use one SMILES as input; the model couldn't generate any new molecule for any of the four core structures extracted.
I even tried converting the SMILES to SAFE before passing it to the model. It didn't work.
I'm not sure why.
Thanks @Inyrkz - I will investigate it in detail in preparation for our meeting tomorrow.
Meanwhile, have you already prepared the Dockerfile, run.sh, etc. for this model?
I haven't prepared the Dockerfile yet. I'll work on that.
I've adjusted the get_side_chain_pairs function. It gives different pairs of side chains.
I tested the code on this molecule: CC(C)(C)c1nc(c(s1)-c1ccnc(N)n1)-c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F (one of the molecules that wasn't working before).
Not all side-chain combos work; only a few do. This approach is better because trying several combos means we get at least one that works.
Here is the notebook.
It takes about 7 minutes to generate new molecules for one SMILES.
I want to try all four SMILES.
Thanks @Inyrkz
The bit of information saying that "not all side chains combo works" is relevant. Actually, we might even want to try this with 1 side chain. So, in general, instead of generating pairs, we might want to generate all combinations of, let's say, up to 3 elements.
As an example, consider the following, where "a", "b", "c", "d" would be four side chains.
Here are the combinations for the example list ['a', 'b', 'c', 'd']:
Combinations of 1 element:
('a',) ('b',) ('c',) ('d',)
Combinations of 2 elements:
('a', 'b') ('a', 'c') ('a', 'd') ('b', 'c') ('b', 'd') ('c', 'd')
Combinations of 3 elements:
('a', 'b', 'c') ('a', 'b', 'd') ('a', 'c', 'd') ('b', 'c', 'd')
This demonstrates how to generate all possible combinations of 1, 2, and 3 elements from a given list. You can use the same approach for any list of elements.
This code would generate all combinations
from itertools import combinations
# Example list
example_list = ['a', 'b', 'c', 'd']
# Generating all combinations of 1, 2, and 3 elements
combinations_1 = list(combinations(example_list, 1))
combinations_2 = list(combinations(example_list, 2))
combinations_3 = list(combinations(example_list, 3))
all_combinations = combinations_1 + combinations_2 + combinations_3
What do you think?
This is great. Thanks for the code sample.
I can try this. The only problem is it may take longer for the model to run.
@miquelduranfrigola, I've adjusted the get_side_chain_pairs() function to use all combinations.
The problem is that the scaffold-morphing model gives this error.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[53], line 1
----> 1 generated_smiles = designer.scaffold_morphing(
      2     side_chains=side_chain_pairs[0],
      3     n_samples_per_trial=12,
      4     n_trials=1,
      5     sanitize=True,
      6     do_not_fragment_further=False,
      7     random_seed=100,
      8 )
     10 print(generated_smiles)
File ~/anaconda3/envs/safe/lib/python3.9/site-packages/safe/sample.py:172, in SAFEDesign.scaffold_morphing(self, side_chains, mol, core, n_samples_per_trial, n_trials, sanitize, do_not_fragment_further, random_seed, **kwargs)
    137 def scaffold_morphing(
    138     self,
    139     side_chains: Optional[Union[dm.Mol, str, List[Union[str, dm.Mol]]]] = None,
   (...)
    147     **kwargs,
    148 ):
    149     """Perform scaffold morphing decoration using the pretrained SAFE model
    150
    151     For scaffold morphing, we try to replace the core by a new one. If the side_chains are provided, we use them.
   (...)
    169     kwargs: any argument to provide to the underlying generation function
...
--> 316 raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
    318 # Non-unit step argument supplied.
    319 istep = int(step)
ValueError: empty range for randrange() (1, 1, 0)
@miquelduranfrigola, this is the input of the get_side_chain_pairs() function.
[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C.[3*]c1ccnc(N)n1
This is the output.
['[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F', '[1*]C(C)(C)C', '[1*]c1ccnc(N)n1', '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C', '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]c1ccnc(N)n1', '[1*]C(C)(C)C.[2*]c1ccnc(N)n1', '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C.[3*]c1ccnc(N)n1']
I've noticed that the model crashes when it is given a single side chain, like these: '[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F', '[1*]C(C)(C)C', '[1*]c1ccnc(N)n1'.
The input has to be at least a pair.
It takes 4 minutes for the model to run predictions for 4 input SMILES on Google Colab (GPU).
OK @Inyrkz thanks - GPU is a bit faster then. Noted. Let's then do pairs and triplets. One question: does the n_trials parameter help in generating more molecules? Another question: does reducing the number of molecules per trial increase speed?
Okay, pairs and triplets. So I just do
all_combinations = combinations_2 + combinations_3
I'll experiment to get answers to the questions.
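Restricting the combinations to pairs and triplets, with the attachment points renumbered as in the output shown earlier, could be sketched like this. It is an assumption-laden illustration rather than the actual get_side_chain_pairs() code: `side_chain_combinations` is a hypothetical name, the input is assumed to be a dot-separated string of fragments, and each fragment is assumed to carry exactly one `[n*]` attachment label.

```python
import re
from itertools import combinations

# Matches an attachment-point label such as [1*] or [12*].
ATTACHMENT = re.compile(r"\[\d+\*\]")


def side_chain_combinations(side_chains, sizes=(2, 3)):
    """Build dot-joined side-chain combinations with renumbered attachment points."""
    fragments = side_chains.split(".")
    results = []
    for size in sizes:
        for combo in combinations(fragments, size):
            # Relabel attachment points as [1*], [2*], ... within each combination.
            relabelled = [
                ATTACHMENT.sub(f"[{i}*]", fragment, count=1)
                for i, fragment in enumerate(combo, start=1)
            ]
            results.append(".".join(relabelled))
    return results


side_chains = "[1*]c1cccc(NS(=O)(=O)c2c(F)cccc2F)c1F.[2*]C(C)(C)C.[3*]c1ccnc(N)n1"
for combo in side_chain_combinations(side_chains):
    print(combo)
```

Leaving size 1 out of `sizes` avoids the single-side-chain inputs that triggered the `randrange()` error above.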
Here's what I've observed so far (I'll update this list).
- n_samples_per_trial=12 will generate 12 sample outputs when n_trials is set to 1. It may give an empty list for some side chains, maybe because n_trials is 1.
- If n_trials is set to 5, for example, we could get up to 60 sample outputs for each set of side chains passed to the model. This takes longer to run. Also, for side chains that returned an empty list as output (using the previous parameters), we get a non-empty list when we adjust the n_trials hyperparameter: instead of an empty list, we get about 6 elements.
- Increasing n_trials increases the running time.
- [...] n_trials will still give an empty list.
- [...] n_samples_per_trial. But this mostly depends on n_trials.

So it will be a trade-off between the n_trials and n_samples_per_trial parameters to get more samples or reduce generation time. It will take longer to generate more samples. I hope this helps.
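To compare parameter settings across several side-chain sets without a single failure stopping the run, a small wrapper along these lines may help. It is a sketch under assumptions: `generate_for_side_chains` is a hypothetical name, `designer` is assumed to be the SAFEDesign object used in the traceback above, and the keyword arguments are copied from that call. At most n_trials * n_samples_per_trial molecules are expected per set (for example 5 * 12 = 60).

```python
def generate_for_side_chains(designer, side_chain_sets, n_trials=1,
                             n_samples_per_trial=12, random_seed=100):
    """Run scaffold morphing for each side-chain set and collect the outputs.

    Returns a dict mapping each side-chain string to its list of generated
    SMILES (an empty list when generation fails or yields nothing).
    """
    results = {}
    for side_chains in side_chain_sets:
        try:
            generated = designer.scaffold_morphing(
                side_chains=side_chains,
                n_samples_per_trial=n_samples_per_trial,
                n_trials=n_trials,
                sanitize=True,
                do_not_fragment_further=False,
                random_seed=random_seed,
            )
        except ValueError:
            # e.g. "empty range for randrange()" seen with a single side chain
            generated = []
        results[side_chains] = list(generated)
    return results
```

Counting the non-empty entries of the returned dict for a few (n_trials, n_samples_per_trial) pairs gives a quick view of the yield versus run-time trade-off.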
Thanks @Inyrkz
This is useful. We will have to accept that this is a slow model.
Let's do n_trials=10 and n_samples_per_trial=10, if you agree?
Can we try this config for a few molecules, keeping an eye on:
Almost there!
Alright, I'll try n_trials=10 and n_samples_per_trial=10.
Yup, almost there!
This table shows how long it takes to execute the original code.
Tasks | Execution time on my system | Execution time on Colab with GPU |
---|---|---|
Side chain 1 | 3m46s | 1m46s |
Side chain 2 | 1m56s | 1m34s |
Side chain 3 | 2m2s | 1m23s |
Side chain 4 | 3m33s | 1m29s |
Total time | 24 mins (using a loop) | 6 mins (using a loop) |
Average generation time per side chain | 2m49s | 1m33s |
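The per-side-chain averages in the table can be reproduced from the four individual timings with a few lines of Python. This is just a convenience sketch; `to_seconds` and `mean_duration` are hypothetical helper names.

```python
import re


def to_seconds(duration):
    """Convert a string such as '3m46s' to a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def mean_duration(durations):
    """Mean of several durations, formatted back as 'XmYs' (rounded down)."""
    mean = sum(to_seconds(d) for d in durations) // len(durations)
    return f"{mean // 60}m{mean % 60}s"


print(mean_duration(["3m46s", "1m56s", "2m2s", "3m33s"]))   # local runs
print(mean_duration(["1m46s", "1m34s", "1m23s", "1m29s"]))  # Colab GPU runs
```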
Thanks, @Inyrkz - these numbers look very reasonable; ~200 molecules generated is sufficient. Also, on a quick look, I like the molecules generated according to the notebooks. In my opinion, we are ready to wrap up - @GemmaTuron , what do you think?
@GemmaTuron
I've made an initial update to the main.py script.
These are the two files: main.py and mol_gen.py.
Hi @Inyrkz !
Thanks for this, it looks good. Does it work fine with Ersilia? I cannot do a deep dive into it this afternoon but will do so tomorrow. Thanks for the work.
@GemmaTuron, You're welcome.
I'm yet to test it with Ersilia. I want to make sure the code is okay before using it in my presentation tomorrow.
It took over 30 mins to run it on my system.
The run.sh file works.
I've been trying to test it with Ersilia but I keep getting a PingError. My internet connection is bad. I'll try again at night to see if it gets better.
Git LFS initialized.
17:02:22 | DEBUG | Git LFS has been activated
17:02:47 | ERROR | Ersilia exception class:
PingError
Detailed error:
No internet connection. Internet connection is required for downloading models from GitHub repositories.
Hints:
Make sure that your computer is connected to the internet and try again.
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
Error message:
Ersilia exception class:
PingError
Detailed error:
No internet connection. Internet connection is required for downloading models from GitHub repositories.
Hints:
Make sure that your computer is connected to the internet and try again.
If this error message is not helpful, open an issue at:
- https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
- hello[at]ersilia.io
If you haven't, try to run your command in verbose mode (-v in the CLI)
- You will find the console log file in: /home/affiah/eos/current.log
Thanks @Inyrkz let us know when you have better connection.
@Inyrkz
I've had a look at the code and it seems fine. To be able to run it within Ersilia you'll need a few edits: 1) add the metadata information; 2) solve the issue with the file paths. I would not hardcode these now that we are already at the implementation stage (I am talking about the input and output files in main.py).
Also, I spotted a typo on line 96 of mol_gen:
core_structures = self.extract_core_structure(i)
should be core_structures = self._extract_core_structure(i)
The same applies to other functions as well!
Thanks for the update @GemmaTuron & @miquelduranfrigola
I corrected the typos in the mol_gen.py script.
I'm a bit confused about the path issue. The code will only require the input and output files in main.py. It won't require any other file.
input_file = "data/my_molecules (copy).csv"
output_file = "data/results.csv"
The lines above were just for testing. This is what the main code will look like.
input_file = sys.argv[1]
output_file = sys.argv[2]
Is there any path issue to modify here?
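For reference, a command-line entry point that takes both paths from argv, so that nothing is hardcoded, could be structured roughly as below. This is a hypothetical sketch, not the actual main.py: the CSV layout (one header row, SMILES in the first column), the output column names, and the placeholder where generation would happen are all assumptions.

```python
import csv
import sys


def read_smiles(input_file):
    """Read SMILES from a CSV file with a header row and SMILES in the first column."""
    with open(input_file, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def write_results(output_file, rows, header=("input_smiles", "generated_smiles")):
    """Write (input, output) rows to a CSV file under an assumed header."""
    with open(output_file, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def main(argv):
    """Entry point: argv is expected to be [script, input.csv, output.csv]."""
    if len(argv) != 3:
        raise SystemExit("usage: python main.py <input.csv> <output.csv>")
    input_file, output_file = argv[1], argv[2]
    smiles_list = read_smiles(input_file)
    # Placeholder: the real script would call the generation code here.
    rows = [(smiles, "") for smiles in smiles_list]
    write_results(output_file, rows)


# In a script, this would be invoked as: main(sys.argv)
```

Because both paths come from the caller, the same script works unchanged whether it is started by run.sh or from a notebook with test files.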
Also, my system freezes for hours when testing with Ersilia. This usually happens when it gets to the "Attempting to delete BentoML" part.
I'm trying to set up Ersilia on another system, so I can do the testing there.
When I was testing with the hardcoded files instead of the files passed via argv, it was having issues with the paths, that's all!
I'm stuck here
16:57:29 | DEBUG | Activation done
16:57:29 | DEBUG | Previous command successfully run inside eos8bhe conda environment
16:57:29 | DEBUG | Now trying to establish symlinks
16:57:29 | DEBUG | BentoML location is /Users/ini-abasiaffiah/bentoml/repository/eos8bhe/20240212165727_4E41AB
16:57:29 | DEBUG | Ersilia Bento location is /Users/ini-abasiaffiah/eos/repository/eos8bhe/20240212165727_4E41AB
16:57:29 | DEBUG | Building symlinks between /Users/ini-abasiaffiah/eos/repository/eos8bhe/20240212165727_4E41AB and /Users/ini-abasiaffiah/bentoml/repository/eos8bhe/20240212165727_4E41AB
16:57:29 | DEBUG | Creating model symlink bundle artifacts > dest
16:57:29 | DEBUG | Creating model_install_commands.sh symlink dest <> bundle
16:57:29 | INFO | Could not create symbolic link from /Users/ini-abasiaffiah/eos/dest/eos8bhe/data.h5 to /Users/ini-abasiaffiah/eos/isaura/lake/eos8bhe_public.h5
16:57:29 | DEBUG | Run file found in framework: /Users/ini-abasiaffiah/eos/repository/eos8bhe/20240212165727_4E41AB/eos8bhe/artifacts/framework/run.sh
16:57:29 | DEBUG | Run commandlines on eos8bhe
16:57:29 | DEBUG | which python > /var/folders/md/k05hbtgj6zs3jprxfsjbs1540000gn/T/ersilia-i4oy4n5b/tmp.txt
16:57:30 | DEBUG | Activating base environment
16:57:30 | DEBUG | Current working directory: /Users/ini-abasiaffiah/ersilia
16:57:30 | DEBUG | Running bash /var/folders/md/k05hbtgj6zs3jprxfsjbs1540000gn/T/ersilia-fl8xyer9/script.sh 2>&1 | tee -a /var/folders/md/k05hbtgj6zs3jprxfsjbs1540000gn/T/ersilia-w754m208/command_outputs.log
# conda environments:
#
base /Users/ini-abasiaffiah/anaconda3
eos8bhe * /Users/ini-abasiaffiah/anaconda3/envs/eos8bhe
eosbase-bentoml-0.11.0-py310 /Users/ini-abasiaffiah/anaconda3/envs/eosbase-bentoml-0.11.0-py310
ersilia /Users/ini-abasiaffiah/anaconda3/envs/ersilia
16:57:30 | DEBUG | # conda environments:
#
base /Users/ini-abasiaffiah/anaconda3
eos8bhe * /Users/ini-abasiaffiah/anaconda3/envs/eos8bhe
eosbase-bentoml-0.11.0-py310 /Users/ini-abasiaffiah/anaconda3/envs/eosbase-bentoml-0.11.0-py310
ersilia /Users/ini-abasiaffiah/anaconda3/envs/ersilia
16:57:30 | DEBUG | Activation done
16:57:30 | DEBUG | Python executable: /Users/ini-abasiaffiah/anaconda3/envs/eos8bhe/bin/python
16:57:30 | DEBUG | Conda is needed
16:57:30 | DEBUG | Checking if model needs to be integrated to a tool
Traceback (most recent call last):
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 468, in _make_request
self._validate_conn(conn)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1097, in _validate_conn
conn.connect()
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connection.py", line 642, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connection.py", line 783, in _ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 471, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/ssl_.py", line 515, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/ssl.py", line 513, in wrap_socket
return self.sslsocket_class._create(
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/ssl.py", line 1104, in _create
self.do_handshake()
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/ssl.py", line 1375, in do_handshake
self._sslobj.do_handshake()
TimeoutError: [Errno 60] Operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 845, in urlopen
retries = retries.increment(
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/retry.py", line 470, in increment
raise reraise(type(error), error, _stacktrace)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
raise value
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 791, in urlopen
response = self._make_request(
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 492, in _make_request
raise new_e
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 470, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/urllib3/connectionpool.py", line 371, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/bin/ersilia", line 8, in <module>
sys.exit(cli())
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/Users/ini-abasiaffiah/ersilia/ersilia/cli/commands/__init__.py", line 22, in wrapper
return func(*args, **kwargs)
File "/Users/ini-abasiaffiah/ersilia/ersilia/cli/commands/fetch.py", line 89, in fetch
_fetch(mf, model_id)
File "/Users/ini-abasiaffiah/ersilia/ersilia/cli/commands/fetch.py", line 12, in _fetch
mf.fetch(model_id)
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 228, in fetch
self._fetch(model_id)
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 225, in _fetch
self._fetch_not_from_dockerhub(model_id=model_id)
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 137, in _fetch_not_from_dockerhub
self._content()
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/fetch.py", line 106, in _content
cg = CardGetter(self.model_id, self.config_json)
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/fetch/actions/content.py", line 14, in __init__
self.mc = ModelCard(config_json=config_json)
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/content/card.py", line 738, in __init__
self.ac = AirtableCard(config_json=config_json)
File "/Users/ini-abasiaffiah/ersilia/ersilia/hub/content/card.py", line 676, in __init__
AirtableInterface.__init__(self, config_json=config_json)
File "/Users/ini-abasiaffiah/ersilia/ersilia/db/hubdata/interfaces.py", line 13, in __init__
self.api_key = self._get_read_only_airtable_api_key()
File "/Users/ini-abasiaffiah/ersilia/ersilia/db/hubdata/interfaces.py", line 24, in _get_read_only_airtable_api_key
r = requests.get(url)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/Users/ini-abasiaffiah/anaconda3/envs/ersilia/lib/python3.10/site-packages/requests/adapters.py", line 532, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=None)
Is there a way to test with ersilia on codespace?
Hi @Inyrkz
You can build a codespace from your repo and try run.sh, but I assume you already did this and it worked?
Yes, the run.sh file works well.
OK, I'm running it and I'll let you know the outcome.
Alright
@Inyrkz
Please:
Then, to solve the wandb dependency issue, install safe-mol without dependencies and manually install each package. Specify the versions you got when installing safe outside ersilia
@GemmaTuron, this is what the metadata.json file looks like. Is this okay?
{
"Identifier": "eos8bhe",
"Slug": "scaffold-morphing",
"Status": "In progress",
"Title": "safe",
"Description": "The context discusses a novel notation system called Sequential Attachment-based Fragment Embedding (SAFE) that improves upon traditional molecular string representations like SMILES. SAFE reframes SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. This streamlines complex molecular design tasks by facilitating autoregressive generation under various constraints. The effectiveness of SAFE is demonstrated by training a GPT2-like model on a dataset of 1.1 billion SAFE representations that exhibited versatile and robust optimization performance for molecular design.",
"Mode": "Pretrained",
"Task": ["Generation"],
"Input": ["Compound"],
"Input Shape": "Single",
"Output": ["Compound"],
"Output Type": ["String"],
"Output Shape": "List",
"Interpretation": "Model generates new molecules from input molecule by replacing core structures of input molecule.",
"Tag": "Compound Generation",
"Publication": "https://arxiv.org/pdf/2310.10773.pdf",
"Source Code": "https://github.com/datamol-io/safe/tree/main",
"License": "CC BY 4.0"
}
I want to install safe-mol==0.1.4 without any dependencies.
These are the other packages from the pyproject.toml file:
keywords = ["safe", "smiles", "de novo", "design", "molecules"]
dependencies = [
"tqdm",
"loguru",
"typer",
"universal_pathlib",
"datamol",
"numpy",
"torch>=2.0",
"transformers",
"datasets",
"tokenizers",
"accelerate",
"evaluate",
"wandb",
"huggingface-hub",
"rdkit"
]
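To pin each dependency to the version that was resolved when safe-mol was installed outside Ersilia, the installed versions can be read from that working environment. The sketch below uses only the standard library's `importlib.metadata`; `pinned_requirements` is a hypothetical helper name, and the package list is copied from the pyproject.toml excerpt above (with the `>=2.0` constraint dropped from torch).

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, plus the missing names."""
    pinned, missing = [], []
    for name in packages:
        try:
            pinned.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pinned, missing


dependencies = [
    "tqdm", "loguru", "typer", "universal_pathlib", "datamol", "numpy",
    "torch", "transformers", "datasets", "tokenizers", "accelerate",
    "evaluate", "wandb", "huggingface-hub", "rdkit",
]
pinned, missing = pinned_requirements(dependencies)
print("\n".join(pinned))
print("not installed:", missing)
```

Run inside the environment where safe-mol already works, the printed lines can be copied into the install instructions one package at a time.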
Hi @Inyrkz !
Great that it works now. Can you open a PR?
@GemmaTuron,
I've opened a pull request.
@Inyrkz can you check why the docker upload is failing currently?
Okay, how do I check that?
Model Name
Scaffold Morphing
Model Description
The context discusses a novel notation system called Sequential Attachment-based Fragment Embedding (SAFE) that improves upon traditional molecular string representations like SMILES. SAFE reframes SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. This streamlines complex molecular design tasks by facilitating autoregressive generation under various constraints. The effectiveness of SAFE is demonstrated by training a GPT2-like model on a dataset of 1.1 billion SAFE representations that exhibited versatile and robust optimization performance for molecular design.
In scaffold morphing, we wish to replace a scaffold by another one in a molecule. The process requires as input that the user provides either the side chains or the input molecules and the core
Slug
safe-scaffold-morphing
Tag
Compound Generation
Publication
https://arxiv.org/pdf/2310.10773.pdf
Source Code
https://github.com/datamol-io/safe/tree/main
License
CC BY 4.0