Open alat-rights opened 3 years ago
@alat-rights I'll defer to @ncfrey but I think that you might possibly have an old version of selfies? I think multiple_selfies_to_hot
was a new utility method @ncfrey added in the most recent version of selfies
Interesting. I ran the import code given by Nathan’s tutorial in Colab. I’ll take a closer look and report back.
EDIT Hey! So I looked over the code and made sure that I was importing the latest version of SELFIES, selfies-1.0.2. I think the issue is something else.
EDIT Hey, update: my friend thinks that the issue might be that 'multiple_selfies_to_hot' isn't actually a documented fn. (I found it in a buried file called data_loader.py in SELFIES). Could it be that the function can't be called because it's not exported? In that case, how did the tutorial run it?
Hey @seyonechithrananda could you take a look here? I could only find multiple_selfies_to_hot in two data_loader.py files, one in examples and one in code from original code. Outside, there is a batch_selfies_to_flat_hot and I tried to call that instead but it behaves differently, apparently.
I was wondering if you could help give me some insight? How should I go about calling multiple_selfies_to_hot? Have there been changes to it in SELFIES since Sept. when @ncfrey wrote the tutorial?
Thank you so much in advance!
EDIT I tried rolling back to v1.0.1 from v1.0.2 and it did not resolve or change the issue.
Hi @alat-rights, thanks for this catch! It looks like they have renamed this function to batch_selfies_to_flat_hot
in selfies.utils
, so the tutorial needs to be updated. If you're running the latest version of selfies
you should be able to modify the function name to get it to work.
https://github.com/aspuru-guzik-group/selfies/blob/64b144e1fc864c1246b8c0c2c7c36ff4696057c2/selfies/utils.py#L198-L202
Hi @ncfrey ! Thanks for the timely reply. I tried to call the new function with the same argvs but I'm getting a TypeError?
EDIT I suspect this to be an issue with batch_selfies_to_flat_hot taking different params.
Here are the params taken by batch_selfies_to_flat_hot:
:param selfies_batch: a list of SELFIES to be converted.
:param vocab_stoi: a dictionary that maps SELFIES symbols (the keys)
to a non-negative index. The indices of the dictionary
must contiguous, starting from 0.
:param pad_to_len: the length that each SELFIES is be padded to.
If ``pad_to_len`` is less than or equal to the symbol
length of the SELFIES, then no padding is added. Defaults to ``-1``.
:return: the flattened one-hot encoded representations of the SELFIES
from the batch. This is a 2D list of size
``(batch_size, N * len(vocab_stoi))``.
The params passed in right now are selfies_list, largest_selfie_len, selfies_alphabet
.
Do you have any ideas for an easy fix? If I'm write the code before it needs to be basically rewritten to fit those new params?
It seems the order of the params switched around, so selfies_list
-> selfies_batch
, largest_selfie_len
-> pad_to_len
, and selfies_alphabet
-> vocab_stoi
. I think the types are the same so hopefully you can reorder the inputs and it should work without actually changing them.
Hi @ncfrey !
It seems like selfies_alphabet and vocab_stoi behave differently; it appears that vocab_stoi takes in single character representations of atoms while selfies_alphabet is a little more complicated (and is also a list as-is), I was wondering if you had any ideas as to how to approach?
Thanks.
Hi! Sorry for not clarifying the above image.
That was the error I receieved after passing dict(zip(selfies_alphabet, range(len(selfies_alphabet)))) as the vocab_stoi parameter to fix the type mismatch issue.
I was wondering if you anyone has insight here? @ncfrey
There is a new example in the SELFIES repo here: https://github.com/aspuru-guzik-group/selfies#label-integer-encoding-selfies that hopefully will be of some help.
It seems that there's an extra step now, after constructing the alphabet you have to do
vocab_stoi = {s: i for i, s in enumerate(alphabet)}
to get vocab_stoi
. Then I think you'll have to specify enc_type='one_hot'
to get one hot encodings used in the rest of the tutorial, rather than integer labels.
My apologies I haven't had a chance to test this out myself, but thanks for taking this on! If you can implement a fix and submit a PR, that would be great. It's probably a good idea to pin the selfies version with pip install selfies==1.0.2
. I think that the version of selfies I was using at the time wasn't a major release, so that's what's causing the dependency issues.
Ok! Should the final call look something like onehots = sf.batch_selfies_to_flat_hot(selfies_list, symbol_to_idx, largest_selfie_len, enc_type='one_hot')
or sholud it actually be a call to selfies_to_encoding
?
enc_type='one_hot'
seems superfluous in a call to batch_selfies_to_flat_hot
, right?
It looks like calling selfies_to_encoding
was the fix! Will update once I get this running and hopefully put it into a PR soon.
I don't feel the need to make a new issue for this, but I've ran into another bug down the line on the tutorial:
There was a list index out of range error here:
I'm having a bit of trouble reading the code. Do you have any idea as to what's happening?
@ncfrey
Hm, fixed that. WIll be in the PR :))
Hey @ncfrey I've been able to resolve all of the errors, but there are still some sticking points and weirder problems that I'm trying to iron out.
Would it be possible for us to meet online sometime soon? My availability looks quite similar, and afternoons/evenings eastern time generally work great for me.
Yeah for sure! I'll send a calendar invite and feel free to adjust the time if needed.
That PR failed horribly, accidentally checked in a Conda installer, but I WILL MAKE ANOTHER ONE, I already have all the necessary changes made on a Colab doc, this issue will be closed soon y'all!
Hey everyone,
I’m modifying @ncfrey ’s Normalizing Flow tutorial to work on the MolNet BBBP dataset, and I’m facing an issue.
The code block that throws the error is:
onehots = sf.multiple_selfies_to_hot(selfies_list, largest_selfie_len, selfies_alphabet)
The error thrown is:
module 'selfies' has no attribute 'multiple_selfies_to_hot'
Was wondering if someone knew why that could be the case?