VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org
Other
8.49k stars 1.93k forks source link

Classification Multivariate time series : values both categorical and continues #4628

Open Sandy4321 opened 1 year ago

Sandy4321 commented 1 year ago

can not find

Short description

Classification Multivariate time series : values both categorical and continues

features color | weight | gender |height |age | target data samples examples sequence 1 target is YES time 1| black 56 m 160 34 time 2| white 77 f 170 54 time 3| yellow 87 m 167 43 time 4| white 55 m 198 72 time 5| white 88 f 176 32 YES

sequence 2 target is NO time 1| yellow 86 f 120 14 time 2| white 27 m 150 44 time 3| yellow 58 f 156 22 NO

sequence 3 target is YES time 1| yellow 8 f 177 23 time 2| white 15 m 138 82 time 3| yellow 8 f 177 23 time 4| white 15 m 138 82 YES

etc... for example , can it be done by https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/examples/search_sequence.html#

How this suggestion will help you/others

seems to be there are no such a capacities in web , but such a task is important to have solution for practice then you will be the first

Possible solution/implementation details

do not know, do you know github repos with code to do this ?

Example/links if any

none

cheng-tan commented 1 year ago

You can try search sequence. If the sequence is short, then add a namespace for each timestep and a long context to predict the binary label. For weight, height, age you can experiment different encodings and break them into groups like the example in this post: https://stackoverflow.com/questions/28640837/vowpal-wabbit-how-to-represent-categorical-features

Sandy4321 commented 1 year ago

great thanks do you have code example with some data as input pls

cheng-tan commented 1 year ago

Since the sequence is not long, maybe try classification first before sequence search? Each sequence will be one example for VW.

An example input would be(input format wiki):

1 |a time:time_1 black weight_bucket_1 m height_bucket_3 age_bucket_3 |b time:time_2 white weight_bucket_2 f height_bucket_3 age_bucket_5 .... 0 |a time:time_1 yellow weight_bucket_3 f height_bucket_1 age_bucket_1....

Sandy4321 commented 1 year ago

it is only example to develop mutual language in real live sequences ar long also each sequence has different number of time stamps

sequence 1 -> 5 time stamps sequence 2 -> 3 time stamps sequence 3 -> 4 time stamps

JohnLangford commented 1 year ago

The easiest reasonable solution seems to be creating a custom learning-to-search application. To do this, you would modify and/or create a new task (like those here: https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/vowpalwabbit/core/src/reductions/search/search_sequencetask.cc or in the parent directory). You would want to have a potentially large number of dummy labels associated with the nonterminal in order to allow for the transmission of information about the sequence to the terminal, and then you would have only the prediction at the end affect the declared loss.

Alternatively, (and with more work), you could implement an RNN and work off of that.