BorgwardtLab / proteinshake

Protein structure datasets for machine learning.
https://proteinshake.ai
BSD 3-Clause "New" or "Revised" License
101 stars, 9 forks

Make targets easily accessible for scaling #125

Closed: timkucera closed this 1 year ago

timkucera commented 1 year ago

The targets should also be easy to access for fitting label binarizers etc. I think this would be best put into the task classes.

cgoliver commented 1 year ago

@timkucera do you have more detail on what you mean by this?

timkucera commented 1 year ago

That was from a discussion with Dexiong when he wanted to fit a scaler on the data. For now what you have to do is:

import numpy as np

ds = AlphaFoldDataset(...)
proteins, size = ds.proteins()
# Compute the target of every protein, then keep only the training split.
y = np.array([task.target(protein_dict) for protein_dict in proteins])[task.train_ind]

to get all the target values. This works but is rather awkward. It would be good if the targets were easily accessible from the task, i.e. I don't have to call task.target myself but can use something like a task.y_train property which executes the above code. The same goes for fitting binarizers in classification tasks.
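For illustration, the proposed property could look roughly like the sketch below. TaskSketch, the "label" key, and the toy protein dicts are hypothetical stand-ins, not the proteinshake API; only target, train_ind, and y_train are names taken from this comment.

```python
import numpy as np


class TaskSketch:
    """Hypothetical stand-in for a task class; not the proteinshake API."""

    def __init__(self, proteins, train_ind):
        self.proteins = list(proteins)          # protein dicts, as yielded by the dataset
        self.train_ind = np.asarray(train_ind)  # indices of the training split

    def target(self, protein_dict):
        # Placeholder: a real task defines how its label is read from the dict.
        return protein_dict["label"]

    @property
    def y_train(self):
        # Same computation as the manual snippet, wrapped in a property.
        y = np.array([self.target(p) for p in self.proteins])
        return y[self.train_ind]


# Toy data standing in for the output of ds.proteins()
proteins = [{"label": v} for v in (1.0, 2.0, 3.0, 10.0)]
task = TaskSketch(proteins, train_ind=[0, 1, 2])

# Fit a standard scaler by hand, using the training targets only.
mean, std = task.y_train.mean(), task.y_train.std()
y_train_scaled = (task.y_train - mean) / std
```

With such a property, a scikit-learn scaler could be fit with StandardScaler().fit(task.y_train.reshape(-1, 1)), and a classification task could pass task.y_train to LabelBinarizer().fit(...) in the same way.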

cgoliver commented 1 year ago

Thanks for the clarification. I'll take care of that.

timkucera commented 1 year ago

done