miquelduranfrigola closed this issue 2 years ago
Hi @miquelduranfrigola,
We currently don’t support multitask optimization in molpal, so setting the number of tasks to anything other than 1 will break the code. How are you currently using the code?
Hi @davidegraff,
Thanks for the fast reply! I was interested in the fact that you are using PyTorch Lightning, so my plan was to use the MPNN class of MolPal as a drop-in replacement for my ChemProp multi-task regression models, which are typically slow. I hope this makes sense.
Thanks! M
Yeah, so there is currently a bug in `molpal.models.mpnmodels.py#L191`. This block:
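```python
from typing import Iterable, Sequence, Tuple

# Excerpt of MPNN.make_datasets (the rest of the method is elided).
# MoleculeDataset and MoleculeDatapoint are chemprop data classes.
def make_datasets(
    self, xs: Iterable[str], ys: Sequence[float]
) -> Tuple[MoleculeDataset, MoleculeDataset]:
    """Split xs and ys into train and validation datasets"""
    data = MoleculeDataset([
        # targets=[y] assumes y is a scalar; a row of an n x m array
        # becomes one list-valued target instead of m separate targets
        MoleculeDatapoint(smiles=[x], targets=[y])
        for x, y in zip(xs, ys)
    ])
    ...
```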
assumes that `ys` is an array of single-task target values, i.e., a vector of length `n` rather than an array of shape `n x 1`. If you iterate through an `n x m` array and wrap each row in a list (as in `targets=[y]`), then each point gets a single `target` that is itself an array of `m` floats, rather than `m` separate targets.
I just fixed this by checking the target shape in `MPNN.train()` (#10). Let me know if the problem persists after the latest commit.
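One way to normalize targets along these lines is sketched below. This is not the code from the commit; `to_target_lists` is a hypothetical helper that reshapes a length-`n` vector to `n x 1` and then turns each row into a plain list of `m` floats:

```python
from typing import List, Sequence

import numpy as np


def to_target_lists(ys: Sequence) -> List[List[float]]:
    """Hypothetical helper: return one list of m floats per data point.

    Accepts a length-n vector (single task) or an n x m array (m tasks).
    """
    arr = np.asarray(ys, dtype=float)
    if arr.ndim == 1:
        # Single task: treat the length-n vector as an n x 1 array
        arr = arr.reshape(-1, 1)
    elif arr.ndim != 2:
        raise ValueError(f"expected a 1D or 2D target array, got shape {arr.shape}")
    # Each row becomes m separate targets rather than one nested array
    return [row.tolist() for row in arr]


single = to_target_lists([0.5, 1.2, -0.3])
multi = to_target_lists(np.array([[0.5, 1.0], [1.2, 2.0], [-0.3, 0.7]]))
print(single[0])  # [0.5]
print(multi[0])   # [0.5, 1.0]
```

Under this assumption, each inner list would be passed directly as `targets=t` in `MoleculeDatapoint(smiles=[x], targets=t)` instead of being wrapped again as `targets=[y]`, so single-task and multitask inputs go through the same code path.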
Hi @davidegraff, it works nicely now with the latest commit! Many thanks for your help!
Hi! Thanks for a wonderful repository.
I am trying to train a multitask regression model using your MPNN class:
```python
model = MPNN(ncpu=12, num_tasks=2)
```
I am testing it with a target NumPy array of shape `(10000, 2)`. When I run
```python
model.train(smis, targets)
```
I get the following warning, and then a corresponding error:
Is there anything I am doing wrong?
Many thanks!