apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Support short sessions in Activity Classifier #813

Closed vmalyi closed 5 years ago

vmalyi commented 6 years ago

Hi everyone,

After reading the introduction to the Activity Classifier, I can't understand the reason for using sessions, both in the data for the activity classifier and during the process of creating/training it.

The following official explanation is clear to me: Each set of consecutive samples produced from a single recording of a subject is called a session. A session can contain demonstrations of multiple activities. However, sessions aren't required to contain all activities or be of the same length. The input data to the activity classifier must contain a column to uniquely assign each sample to a session...

However, I don't understand why the Activity Classifier needs to strictly associate samples with specific sessions.

PS: this question arose after trying to use the Activity Classifier for my multi-class human activity recognition problem. Unfortunately, the performance of the Activity Classifier available in Turi Create was much lower than that of the classifier I already use in my project, and I'm trying to identify potential issues that could affect the accuracy of Turi's Activity Classifier, such as the usage of sessions mentioned above.

Thanks!

igiloh commented 6 years ago

Hi @vmalyi,

Thank you for trying out the Activity Classifier, and for the elaborate feedback!

As you can imagine, the session id is indeed stripped in the pre-processing stage. Before training the DNN, the sessions are cropped into non-overlapping, constant-length time sequences. These constant-length sequences (let's call them "chunks") are the actual training samples fed to the network.

The main motivation behind the session id requirement is to make sure that samples from two separate users don't end up in the same chunk (instead, the last chunk of each session is padded and then weighted out by the loss function).
Another advantage of using sessions is that they provide a single identifier for consecutive data from a specific user. If we had used timestamps only, data taken at the same time from two different users might have gotten intermixed. Alternatively, using a user_id only, two different recordings taken at completely different times might have been fused into the same chunk.
Lastly, using sessions improves the train-test split mechanism. Splitting whole sessions increases the test set's independence by making sure that correlated chunks coming from the same user don't end up in both the training and the test set.
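
For illustration, here is a minimal sketch of the chunking mechanism described above (an assumed reconstruction, not Turi Create's actual pre-processing code):

import numpy as np

def chunk_session(session_samples, chunk_len):
    # Crop one session (a numpy array of shape (n, d)) into
    # non-overlapping, constant-length chunks. Padded samples get
    # weight 0 so the loss function can ignore them.
    chunks, weights = [], []
    for start in range(0, len(session_samples), chunk_len):
        chunk = session_samples[start:start + chunk_len]
        weight = np.ones(len(chunk))
        pad = chunk_len - len(chunk)
        if pad > 0:  # last chunk of the session: pad it and weight it out
            chunk = np.concatenate([chunk, np.zeros((pad,) + chunk.shape[1:])])
            weight = np.concatenate([weight, np.zeros(pad)])
        chunks.append(chunk)
        weights.append(weight)
    # Because chunking is done per session, samples from two different
    # sessions (e.g. two users) can never share a chunk.
    return np.stack(chunks), np.stack(weights)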

Can you please share some more information about your human activity recognition problem, and the algorithms you've been comparing with?
The AC has been tested with several similar datasets (multi-class human activities, such as running, walking, etc.). We have tried to design the AC to generalize well to a wide variety of datasets.
Any additional information would of course be welcome, as we try to improve the AC's performance even further.

vmalyi commented 6 years ago

Hi @igiloh, and thanks for such a thorough answer!

Now the logic behind using session IDs is clear to me.

I work on http://autoworkout.app/ which currently recognizes 11 activities using 3 deep feedforward NNs, each fed with accelerometer data from the X, Y, and Z axes respectively, each reaching an accuracy of ~0.91 ± 0.05 and an F1-score of 0.92 ± 0.01. All 3 NNs work in an ensemble, and a custom algorithm makes the final prediction. Predictions are made every ~1.8 seconds.
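
(For illustration only, a minimal sketch of the per-axis ensemble idea described above; the actual AutoWorkout models and the custom final-prediction algorithm are not shown in this thread, so every detail here is assumed:)

import numpy as np

def ensemble_predict(models_xyz, window_xyz):
    # Illustrative sketch: each "model" is any callable mapping a window
    # of single-axis accelerometer samples to 11 class probabilities.
    # window_xyz: array of shape (n_samples, 3) -- accelerometer X, Y, Z.
    probs = [model(window_xyz[:, axis]) for axis, model in enumerate(models_xyz)]
    # Averaging stands in for the custom combination step described above.
    return int(np.argmax(np.mean(probs, axis=0)))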

Even with the accuracy stated above I still feel a need to improve it, because it's critical from the user's perspective. I gave Turi Create a quick try and was able to reach only 45-51% accuracy (trained on 1120 sessions), even after extensively searching for optimal hyperparameters. Turi's activity classifier was trained on the same data that AutoWorkout's classifier had been trained on.

And here is a small excerpt of training data from a single recording session, containing a single activity:

[screenshot: training_data excerpt]

While looking for ways to improve Turi's classifier's performance, I thought that the data format could be an issue in my case: the whole dataset consists of recording sessions 10 seconds long, each containing a single actor and a single activity. But according to your explanation such a format is exactly what Turi expects, so that should not be an issue.

Another problem that comes to mind is the absence of gyroscope samples in the training data (in the case of the AutoWorkout classifier, gyroscope data didn't contribute much to accuracy).

Is there anything else I should look at to improve Turi's classifier's performance?

Thanks!

igiloh commented 6 years ago

Hi @vmalyi ,

Thanks for the elaborate reply!

It is indeed peculiar that you see such a big difference in performance (91% vs 45%); sounds like something is off. I would suspect some incorrect usage of the AC.
I do have a few ideas off the top of my head (will elaborate soon), but it would be best to investigate with the actual data.

Would it be possible for you to share a subset of your data with us in a secure manner? And also the code you have used to train it with the AC?

Thanks!

vmalyi commented 6 years ago

@igiloh, I invited you as a collaborator to the repo containing the *.sframe I used to train the classifier and the Jupyter notebook containing the classifier itself.

Let me know your thoughts on what could possibly go wrong with my approach.

igiloh commented 6 years ago

Thank you very much!

We will analyze the dataset and get back to you soon.

vmalyi commented 6 years ago

hey @igiloh,

did you have a chance to look into this?

igiloh commented 6 years ago

Hi @vmalyi - Yeah, I'm just summarizing my experiments and pushing an updated notebook to your repo.
May I publish it here as well, for future reference?

igiloh commented 6 years ago

@vmalyi - would you mind sharing with me the timestamps column for the data in your SFrame?

From the screenshot you attached, it seems that the intended sampling rate was 20Hz (i.e. 50 msec between samples).
However, there seems to be a big variation in the sampling rate: some samples are only 10 msec apart, while others are 80 msec apart.
I'm not sure whether this is a real problem. However, as the AC was designed for sensor readings at a constant rate, I wonder if that affects the performance as well. I would like to try interpolating the data at a constant rate of 20Hz and see if the performance improves.
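
For reference, one way to do such an interpolation with pandas (a sketch; the 'timestamp' and 'acc_x'/'acc_y'/'acc_z' column names are assumptions, not the actual columns of the shared SFrame):

import pandas as pd

def resample_to_20hz(session_df):
    # session_df: one session's samples, with a millisecond 'timestamp'
    # column and (assumed) 'acc_x'/'acc_y'/'acc_z' sensor columns.
    session_df = session_df.copy()
    session_df.index = pd.to_datetime(session_df['timestamp'], unit='ms')
    # Resample onto a fixed 50 msec (20Hz) grid, interpolating linearly
    # between the unevenly spaced original readings.
    return (session_df[['acc_x', 'acc_y', 'acc_z']]
            .resample('50ms')
            .mean()
            .interpolate(method='linear')
            .reset_index(drop=True))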

vmalyi commented 6 years ago

@igiloh, you may publish the notebook contents here, but the training data must not be included/published in any form.

Thanks!

igiloh commented 6 years ago

Hi @vmalyi,

Please see the attached notebook with my analyses.

I'm afraid my main conclusion is that I couldn't find any set-up in which the AC performs significantly better on this dataset.
The main reason, I believe, is the short sessions: with sessions of only 10 sec each and a prediction window of 2 seconds, every session contains about 5 windows.
Since the RNN in the AC learns across these windows, such short sequences (only 5 windows long) make it hard for the RNN to gain much knowledge.
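
The arithmetic, spelled out (assuming the ~20Hz sampling rate discussed above):

sampling_rate = 20                      # Hz, as estimated from the timestamps
session_len_sec = 10                    # each recording session is 10 seconds
prediction_window = 2 * sampling_rate   # a 2-second window = 40 samples

samples_per_session = session_len_sec * sampling_rate    # 200 samples
windows_per_session = samples_per_session // prediction_window
print(windows_per_session)              # 5 -- a very short sequence for the RNN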

I'm leaving this issue open, and changing it to a feature request.
We will definitely take this use case into consideration in future development of the Activity Classifier. There is no reason why datasets like yours shouldn't be supported.

znation commented 6 years ago

@igiloh Can you please re-title this issue to reflect the feature being requested? Thanks!

vmalyi commented 6 years ago

@igiloh, thanks for your analysis!

@znation I've renamed the issue to sound more appropriate.

znation commented 6 years ago

Thanks @vmalyi!

tetreault commented 6 years ago

Total machine learning noob here, but I came across Turi because I have a task that requires somewhat reliably detecting a transition from standing to kneeling posture (and ideally vice versa). I was following along with the documentation and made sure my preprocessed CSV mirrors the format used in the table in the "advanced usage" section.

I'm running the following python script to generate a model:

import turicreate as tc
dataSFrame = tc.SFrame('motion_data_all.csv')

# Train/test split by recording sessions
train, test = tc.activity_classifier.util.random_split_by_session(dataSFrame, session_id='exp_id', fraction=0.8)

# Create an activity classifier
motion_model = tc.activity_classifier.create(train, session_id='exp_id', target='motion', prediction_window=50)

metrics = motion_model.evaluate(test)
print(metrics['accuracy'])

# Save the model for later use in Turi Create
motion_model.save('motion_model.model')

# Export for use in Core ML
motion_model.export_coreml('MotionActivityClassifier.mlmodel')

# try out the model 
model_test_data = dataSFrame[(dataSFrame['motion'] == 'sit') & (dataSFrame['exp_id'] == 1)][500:1000]
motion_model.predict(model_test_data, output_frequency='per_window')

My training data is extremely small, as far as ML training data goes: approx 3.5k rows of standing/sitting/kneeling, each accurately labeled. In the docs you use the HAPT data and then process it into a format with an exp_id and the activity label; my data set mirrors that end result.

When I run the following lines in Spyder:

metrics = motion_model.evaluate(test)
print(metrics['accuracy'])

I get the following error: ToolkitError: Size of prediction probability vector(3) != number of classes(1).

Just to show I'm not completely messing up from the start, I've included some screenshots of the file I load with dataSFrame = tc.SFrame('motion_data_all.csv').

[screenshots: contents of motion_data_all.csv]

When I run the line: motion_model = tc.activity_classifier.create(train, session_id='exp_id', target='motion', prediction_window=50) I get the following output:

[screenshot: training output]

Every project is unique, so I don't expect anyone to debug this for me, but any advice on what to read to understand/resolve this error (ToolkitError: Size of prediction probability vector(3) != number of classes(1)) would be great.

vmalyi commented 6 years ago

Hey @tetreault, would you mind moving your post to a separate thread?

igiloh commented 6 years ago

Thanks @vmalyi, I was just about to suggest the same.
@tetreault You seem to be experiencing a different problem than the one @vmalyi has reported. I would suggest starting your own thread so we can better track it.

tetreault commented 6 years ago

made a new thread here: https://github.com/apple/turicreate/issues/964

srikris commented 5 years ago

@vmalyi After reproducing the issue, we have verified that a simple augmentation fix (using a rolling window instead of a tumbling window) significantly increases the performance. Until this makes it into the next release, you can use this code as a workaround to improve the accuracy.

# Assumed example values -- tune them for your data: prediction_window is
# the number of samples per window, overlapping_window_size the shift step.
prediction_window = 40
overlapping_window_size = 10

sessions = train['session_id'].unique()

# Now create a simple rolling-window augmentation: keep the original data,
# then append shifted copies of each session so that window boundaries
# fall at new offsets.
train_augmented = train

for s in sessions:
    session_data = train[train['session_id'] == s]
    for i in range(1, prediction_window, overlapping_window_size):
        augmented_session = session_data[i:]  # drop the first i samples
        # Give each shifted copy a unique session id
        augmented_session['session_id'] = augmented_session['session_id'] + 100000 * i
        train_augmented = train_augmented.append(augmented_session)

Now you can train on the augmented data; on this particular dataset, the accuracy is 30% better than with the original data.

shreyajain17 commented 5 years ago

On further investigation, we found that the dataset provided was sorted by labels, and hence each session's samples were split up across the multiple labels, while the activity classifier in Turi Create expects the samples associated with each session id to be contiguous and in ascending order by time. Do you by any chance have the data with the session order preserved?
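
If the original timestamps were preserved, re-sorting should restore the expected layout. A sketch (the 'timestamp' column name and the SFrame path are assumptions):

import turicreate as tc

data = tc.SFrame('activity_data.sframe')  # hypothetical path to the saved SFrame
# Group each session's samples together, in ascending time order,
# as the activity classifier expects.
data = data.sort(['session_id', 'timestamp'])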

shreyajain17 commented 5 years ago

The data that was provided when this issue was opened showed short sessions because it was sorted by label: each session was effectively spread throughout the data, which made it look as if the sessions were very short. Closing the issue for now, since we have not yet come across a dataset in the right format that exhibits this issue.