apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Is there a way to do multidimensional regression? #2556

Open lunardog opened 5 years ago

lunardog commented 5 years ago

I have a dataset of image embeddings, and I'm trying to train a model that will transform 'VisionFeaturePrint_Scene' features into the embeddings in my dataset. What model should I use?

I extracted the VisionFeaturePrint_Scene features and tried this:

import turicreate as tc

model = tc.regression.create(dataset,
    target='embedding',
    features=features,
    validation_set=None)

but the error message is:

ToolkitError: Column type of target 'embedding' must be int or float.

How can I do multidimensional regression with Turicreate?

TobyRoseman commented 5 years ago

TuriCreate does not support multidimensional regression.

It's a good feature request; I'll leave this issue open.

One (poor) workaround would be to train a separate model for each entry in the dataset['embedding'] space.
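
A minimal sketch of that per-dimension workaround, written in plain Python so it runs without TuriCreate: the target vector is split into its scalar components and one scalar regressor is fitted per component. `fit_scalar` and `fit_per_dimension` are hypothetical helper names, and the one-feature least-squares fit is only a stand-in for the real scalar regression call (in TuriCreate that would be one `tc.regression.create` call per component column).

```python
def fit_scalar(xs, ys):
    """Stand-in scalar regressor: ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_per_dimension(xs, targets):
    """Fit one independent scalar model per entry of the target vector."""
    dims = len(targets[0])
    models = [fit_scalar(xs, [t[d] for t in targets]) for d in range(dims)]
    # The combined predictor stitches the scalar predictions back into a vector.
    return lambda x: [m(x) for m in models]


# Toy data (made up for illustration): each target is the vector [2x + 1, -x].
xs = [0.0, 1.0, 2.0, 3.0]
targets = [[2 * x + 1, -x] for x in xs]

predict = fit_per_dimension(xs, targets)
prediction = predict(4.0)  # one scalar prediction per target dimension
```

The cost is one training run and one stored model per output dimension, which adds up quickly for a long embedding.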

lunardog commented 5 years ago

Thank you for explaining and for making it a feature request, @TobyRoseman.

carlosdelamora commented 4 years ago

I am a bit confused by this question, since regression is already multidimensional. If I understand correctly, you have a feature that is a vector, and a target whose entries are vectors as well? If that is the case, you could treat each entry of the vector as a new feature, and your target will increase in size as well.

lunardog commented 4 years ago

What I was trying to do is vector input -> vector output, @carlosdelamora. I assumed TuriCreate had that feature, but I found that it only does vector -> scalar regression at this point.

carlosdelamora commented 4 years ago

The only way you will get a scalar as output is if you have only one feature. The output of linear regression is the vector x for which ||Ax - y|| is minimal, where A is the matrix given by the dataset, with one column per feature, and y is the ground truth from the dataset. So x needs to have as many entries as A has columns, that is, as many as your dataset has features. It can only be a scalar if A has a single feature. Is this your case? Do you have a single feature?

If you have a feature whose entries are vectors of dimension n, then you can expand it into n features and do regression on those n features. For example, say you have a single feature given by the two-dimensional vectors [v1,u1], [v2,u2], ..., [vm,um]. Then, instead of one feature, you can consider two: the first given by v1, v2, ..., vm and the second by u1, u2, ..., um, and do regression as usual. It does not matter that it is two-dimensional; you can do the same with any number of dimensions.
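
As a rough illustration of that expansion, here is a plain-Python sketch that turns one vector-valued column into one scalar column per entry. The helper name `expand_vector_column`, the `feat.0` / `feat.1` naming scheme, and the toy rows are all made up for the example; none of this is TuriCreate's API.

```python
def expand_vector_column(rows, column):
    """Replace one vector-valued column with one scalar column per entry."""
    expanded = []
    for row in rows:
        # Copy every column except the vector-valued one.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one scalar column per vector entry: column.0, column.1, ...
        for i, value in enumerate(row[column]):
            new_row[f"{column}.{i}"] = value
        expanded.append(new_row)
    return expanded


# Two rows with a single two-dimensional feature [v, u] and a scalar target y.
rows = [
    {"feat": [1.0, 10.0], "y": 0.5},
    {"feat": [2.0, 20.0], "y": 0.7},
]
flat = expand_vector_column(rows, "feat")
```

Note that this only flattens the input side. The target `y` is still a single scalar per row.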

I hope I understood your problem correctly and that I have explained myself well too.

lunardog commented 4 years ago

I have multiple features. Your answer is mathematically sound, but my question was about TuriCreate as a framework, which, at the time I tried, only allowed scalar outputs when doing regression. Hence @TobyRoseman's answer that it's a good feature request.

Can you put your answer into TuriCreate code? I couldn't, so I used Keras and converted my model to CoreML after training.

carlosdelamora commented 4 years ago

Yes, I think the TuriCreate function you need is this one: https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.unpack.html

lunardog commented 4 years ago

I'm afraid that even if I pack a vector of values into one column, TuriCreate expects a scalar. Note the error message from my original issue:

ToolkitError: Column type of target 'embedding' must be int or float.

TuriCreate does different things depending on int or float. If it's an int, it creates a multi-class classifier. If float, it creates a single-scalar-output regression model. If I pass anything else, it gets rejected here.