Can we save the scaler and make sure to use that also for future instances of chest compression?
> Can we save the scaler and make sure to use that also for future instances of chest compression?
Good idea. I just added saving the scaler after fitting it. When needed, it can be loaded again with `scaler = joblib.load('models/scaler.pkl')`.
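For reference, a minimal sketch of that save-and-reuse pattern (assuming a scikit-learn `MinMaxScaler`; substitute whichever scaler the training script actually fits, and note the placeholder data):

```python
import os

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

os.makedirs('models', exist_ok=True)

# fit on the full training matrix, then persist the scaler next to the models
train_data = np.random.rand(100, 52)      # placeholder for the real training data
scaler = MinMaxScaler().fit(train_data)
joblib.dump(scaler, 'models/scaler.pkl')

# later, e.g. during online classification: load and reuse the SAME scaler
scaler = joblib.load('models/scaler.pkl')
new_sample = np.random.rand(17, 52)       # placeholder for an incoming interval
scaled = scaler.transform(new_sample)     # transform only, never re-fit
```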
Thanks, added the scaler also to the TensorFlow implementation. Editing like this, to avoid overwriting:

```python
dataset = folder.split("/")[1]
joblib.dump(scaler, "models/scaler_" + dataset + ".pkl")
```
Added an example in #13.
This does not work properly yet, because I don't know exactly how it works with the TCP connection.
Thanks, that's cool. I am not able to simulate the TCP connection yet.
You should consider the INPUT data as one session file (zip file) containing only ONE sample, i.e. one chest compression.
The expected output that has to be returned is a dictionary with the target classes and the classification result:
```json
{
  "classRate": 2,
  "classDepth": 0,
  "classRelease": 1
}
```
Some questions:
I am only working on the PyTorch version, because PyTorch's documentation and support are way better than TensorFlow's. I also prefer the coding style of PyTorch, since it is more Pythonic (TensorFlow 2.0 is trying to copy PyTorch by now, but it is still worse).
Performance-wise they are basically the same. PyTorch is more widely used in research right now and is getting increasingly popular, which helps in troubleshooting. The PyTorch developers are also quite active on the forums, which helps in finding solutions to your (mostly quite specific) problems.
> You should consider the INPUT data as one session file (zip file) containing only ONE sample, i.e. one chest compression.
As far as I understand, I do that.
> The expected output that has to be returned is a dictionary with the target classes and the classification result:
> `{ "classRate": 2, "classDepth": 0, "classRelease": 1 }`
Alright, but then it is less general. We could add a rounding option after getting the model's prediction.
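As a sketch, such a rounding step could map the model's continuous outputs back to integer class labels without hardcoding anything beyond `target_classes` (the raw output values here are made up):

```python
import numpy as np

target_classes = ["classRate", "classDepth", "classRelease"]

def round_prediction(raw_output):
    # round each continuous model output to the nearest integer class label
    rounded = np.rint(np.asarray(raw_output)).astype(int)
    return dict(zip(target_classes, rounded.tolist()))

print(round_prediction([1.8, 0.2, 1.1]))
# {'classRate': 2, 'classDepth': 0, 'classRelease': 1}
```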
> I am only working on the PyTorch version, because PyTorch's documentation and support are way better than TensorFlow's. I also prefer the coding style of PyTorch, since it is more Pythonic (TensorFlow 2.0 is trying to copy PyTorch by now, but it is still worse).
>
> Performance-wise they are basically the same. PyTorch is more widely used in research right now and is getting increasingly popular, which helps in troubleshooting. The PyTorch developers are also quite active on the forums, which helps in finding solutions to your (mostly quite specific) problems.
Alright, but I was asking what performance you get on the TableTennis and the CPR_experiment datasets with PyTorch and TF. From a first test, TF got higher prediction accuracy on these datasets.
> Alright, but then it is less general. We could add a rounding option after getting the model's prediction.
Why less general? Because it writes the `target_classes`, or because it gives answers without confidence scores?
> Added an example in #13.
> This does not work properly yet, because I don't know exactly how it works with the TCP connection.
In any case, if I manage, I will try this afternoon whether it works with PyTorch!
> Alright, but I was asking what performance you get on the TableTennis and the CPR_experiment datasets with PyTorch and TF. From a first test, TF got higher prediction accuracy on these datasets.
Aah, that's what you meant. Sorry, I misunderstood.
I got the same performance as you posted in #9.
> Why less general?
Because we would need to hardcode the class names in the dictionary.
> In any case, if I manage, I will try this afternoon whether it works with PyTorch!
Do that! It's really straightforward!
> Because we would need to hardcode the class names in the dictionary.
But those are already hardcoded in `target_classes = ["classRate", "classDepth", "classRelease"]`.
You are right! Fixed it
I am rewriting `main.py`, which implements the TCP server. We need to change the `online_classification` function accordingly and check which parameters we need. The function in `main.py` works more or less like this:
```python
def handle_client_connection(client_socket, port):
    request = client_socket.recv(10000000)
    # the chest compression arrives encoded as JSON
    json_string = json.loads(request.decode('ascii'))
    # we need to change this function to take the right parameters
    result_dict = online_classification(json_string)
    # the server replies with the classification results
    client_socket.send(json.dumps(result_dict).encode())
    client_socket.close()
```
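Since simulating the connection came up above, a minimal client sketch for testing this handler locally could look like this (host, port, and the payload file are assumptions, not the project's actual configuration):

```python
import json
import socket

HOST, PORT = 'localhost', 9999          # assumed; use the server's real address

with open('example_request.txt') as f:  # one session with a single chest compression
    payload = f.read()

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(payload.encode('ascii'))
    reply = sock.recv(10000)

print(json.loads(reply.decode()))       # e.g. {"classRate": 2, "classDepth": 0, "classRelease": 1}
```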
Please note: in this case the input data will be in JSON format. I also wrote a `json_to_df` function:
```python
import re

import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize on older pandas

# load the JSON-parsed data into a dataframe indexed by frame timestamp
def json_to_df(data):
    df = pd.concat([pd.DataFrame(data),
                    json_normalize(data['Frames'])],
                   axis=1).drop('Frames', axis=1)
    df.columns = df.columns.str.replace("_", "")
    if not df.empty:
        df['frameStamp'] = pd.to_timedelta(df['frameStamp'])  # + start_script
        df.columns = df.columns.str.replace("frameAttributes", df["ApplicationName"].all())
        df = df.set_index('frameStamp').iloc[:, 2:]
        df = df[~df.index.duplicated(keep='first')]
        df = df.apply(lambda x: pd.to_numeric(x, errors='ignore'))
        df = df.select_dtypes(include=['float64', 'int64'])
        df = df.loc[:, (df.sum(axis=0) != 0)]
        # KINECT fix
        df.rename(columns=lambda x: re.sub(r'KinectReader.\d', 'KinectReader.', x), inplace=True)
        df.rename(columns=lambda x: re.sub(r'Kinect.\d', 'Kinect.', x), inplace=True)
        # Exclude irrelevant attributes (to_exclude is defined at module level)
        for el in to_exclude:
            df = df[[col for col in df.columns if el not in col]]
        df = df.apply(pd.to_numeric).fillna(method='bfill')
    else:
        print('Empty data frame. Did you wear Myo?')
    return df
```
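For a quick check, the function could be exercised on a parsed request roughly like this (a sketch; `example_request.txt` is the sample request used later with `main.py`, and the shape is taken from the logs further down):

```python
import json

with open('example_request.txt') as f:
    data = json.load(f)

df = json_to_df(data)   # one interval, indexed by frameStamp
print(df.shape)         # e.g. (143, 52) before resampling
```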
> Thanks, added the scaler also to the TensorFlow implementation. Editing like this, to avoid overwriting:
> `dataset = folder.split("/")[1]`
> `joblib.dump(scaler, "models/scaler_" + dataset + ".pkl")`
Can you please do the same in `model_training_pytorch.py`?
I prefer to add `_scaler` to the model path instead of putting it at the front. This keeps it more general (for example, when the path to a model is in a subfolder) and keeps the scaler next to the model it belongs to. When you put `scaler` in front, all the scalers will be grouped together.
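A small sketch of that naming convention (paths are illustrative):

```python
import joblib
from sklearn.preprocessing import MinMaxScaler

def scaler_path_for(model_path):
    # keeps the scaler next to the model it belongs to, even in subfolders:
    # 'models/subfolder/lstm' -> 'models/subfolder/lstm_scaler.pkl'
    return f"{model_path}_scaler.pkl"

scaler = MinMaxScaler()   # stands in for the scaler fitted at training time
joblib.dump(scaler, scaler_path_for('models/lstm'))
```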
Makes sense
Hey @HansBambel, I need to make the online classification work by the end of this week. Do you think you can take a look at how to change the `online_classification` function so that it can take one learning sample as input? Thank you!
@dimstudio I'll look into it!
Done in PR #16.
Online classification now takes a path to the trained model and an input sample. This may be very slow, because PyTorch is started for every sample. Maybe the loop should be done in `online_classification()` once the model is loaded.
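One way to avoid that overhead, sketched here with illustrative names (the load calls mirror the loading snippet further down in this thread): load the model once, then classify every incoming sample in a loop:

```python
import torch

def serve_samples(path_to_model, sample_iter):
    # load the trained model once ...
    loaded = torch.load(f'{path_to_model}.pt')
    model = loaded['model']
    model.load_state_dict(loaded['state_dict'])
    model.eval()
    # ... then classify arbitrarily many samples without restarting PyTorch
    results = []
    with torch.no_grad():
        for sample in sample_iter:   # each sample: tensor of shape (1, steps, features)
            results.append(model(sample))
    return results
```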
I was assuming that the TCP server gives me a correct input. I do not know how you do this, though...
Another thing I have seen in `main.py` is that the function `tensor_transform` is there. Is this the same as in `data_helper.py`? It looks smaller.
> Done in PR #16

Terrific, thanks!
> Another thing I have seen in `main.py` is that the function `tensor_transform` is there. Is this the same as in `data_helper.py`? It looks smaller.
It should be the same transformation, yes. It looks smaller because it operates on just one interval, so it does not need to cut the data into intervals and do the preprocessing for the entire dataset.
By the way, I am testing it right now and need to adapt it a bit. I'll keep you posted!
I have a problem with loading the model in the `online_classification` function in `main.py`. `torch.load` gives me a dictionary with a `state_dict` value. I have to initialize the model variable to something; this was my attempt, but I cannot initialize `MyLSTM` without arguments:
```python
model = model_training_pytorch.MyLSTM()    # fails: MyLSTM needs constructor arguments
loaded = torch.load(f'{path_to_model}.pt')
model = model.load_state_dict(loaded['state_dict'])
model.eval()
```
This is what I have for loading there:

```python
loaded = torch.load(f'{path_to_model}.pt')
model = loaded['model']
model.load_state_dict(loaded['state_dict'])
model.eval()
```
When you train with `model_training_pytorch.py`, it saves the model with the classes and the parameters as well (at least it should), so that you don't need to worry about that when loading again:

```python
torch.save(dict(model=model, state_dict=model.state_dict()), f'{save_model_to}.pt')
```
I solved that issue; I had to retrain with the latest code. I am going to upload an example of an online sample to classify.
You can have a look at `main.py`, which works with `example_request.txt`.
Did I fail to convert it into a tensor? The `batch` variable in `process_data()` is a dataframe and it contains a lot of NaNs. I think there is something wrong.
The model expects a three-dimensional tensor, the first dimension being the batch size, so in our case only 1. That makes the input shape something like 1x79x52. (At training time it is something like 64x17x52.)
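A minimal sketch of adding that batch dimension, with the shapes taken from the numbers above:

```python
import numpy as np
import torch

interval = np.zeros((17, 52), dtype=np.float32)  # one preprocessed interval
batch = np.expand_dims(interval, axis=0)         # -> (1, 17, 52): batch of size 1
tensor = torch.from_numpy(batch)                 # the 3D input the LSTM expects
print(tensor.shape)                              # torch.Size([1, 17, 52])
```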
What about now? I have dimensions of 17x52. `np.stack` didn't work, so I substituted it (in #17) with `expand_dims`. But there is still the issue with the NaNs.
Are you sure? I do not have any empty values in the batch:

```
Before resampling: (143, 52)
Shape of the interval is (17, 52)
Shape of the batch is (1, 17, 52)
Batch is containing nulls? False
Traceback (most recent call last):
  File "C:/Users/Daniele-WIN10/Documents/GitHub/SharpFlow/main.py", line 184, in <module>
    exampleData()
  File "C:/Users/Daniele-WIN10/Documents/GitHub/SharpFlow/main.py", line 135, in exampleData
    return_dict = process_data()
  File "C:/Users/Daniele-WIN10/Documents/GitHub/SharpFlow/main.py", line 159, in process_data
    result = online_classification("models/lstm", batch)
  File "C:/Users/Daniele-WIN10/Documents/GitHub/SharpFlow/main.py", line 171, in online_classification
    scaled_data = scaler.transform(input_sample)
  File "C:\Users\Daniele-WIN10\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py", line 387, in transform
    force_all_finite="allow-nan")
  File "C:\Users\Daniele-WIN10\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 539, in check_array
    % (array.ndim, estimator_name))
ValueError: Found array with dim 3. Estimator expected <= 2.

Process finished with exit code 1
```
To me, it looks like the problem is more with how the scaler is called.
Yes, that is true. I now apply `expand_dims` after the scaling and force the data to be a tensor (in #18), but there is still an error from the NaNs. So now it is only complaining about the actual input values.
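The resulting order of operations, as a sketch (the scaler only accepts 2D arrays, so scaling has to happen before the batch dimension is added; the scaler path assumes the `_scaler` convention agreed above):

```python
import joblib
import numpy as np
import torch

scaler = joblib.load('models/lstm_scaler.pkl')  # assumed path, fitted at training time
interval = np.random.rand(17, 52)               # placeholder for one real interval
scaled = scaler.transform(interval)             # scale first, while the data is still 2D
batch = np.expand_dims(scaled, axis=0)          # then add the batch dimension -> (1, 17, 52)
tensor = torch.tensor(batch, dtype=torch.float32)
```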
Nice! It's working. Thanks @HansBambel!

```
Before resampling: (143, 52)
Shape of the interval is (17, 52)
Shape of the batch is (1, 17, 52)
{'classRelease': 1, 'classDepth': 1, 'classRate': 1, 'armsLocked': 1, 'bodyWeight': 1}

Process finished with exit code 0
```
In `main.py`, single intervals (e.g. chest compressions or strokes) will be flowing in one at a time via a TCP connection. This means that we will get smaller data files. We need to make sure the transformation on this data (in terms of rescaling, resampling, min-max normalization) is exactly the same as the one in `model_training.py`, since the processed intervals will be classified with the learned models.
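A sketch of what keeping the two paths in sync could look like (everything here is illustrative: the resampling is a stand-in for the real one in `model_training.py`, and the scaler path assumes the `_scaler` convention from above):

```python
import joblib
import numpy as np
import pandas as pd

def preprocess_interval(interval_df, scaler, n_steps=17):
    # resample the interval to a fixed number of steps (stand-in for the real
    # resampling used at training time), then apply the min-max scaler that
    # was fitted in model_training.py -- transform only, never re-fit online
    idx = np.linspace(0, len(interval_df) - 1, n_steps).astype(int)
    resampled = interval_df.to_numpy()[idx]
    return scaler.transform(resampled)

scaler = joblib.load('models/lstm_scaler.pkl')    # assumed path
interval = pd.DataFrame(np.random.rand(143, 52))  # e.g. one raw chest compression
batch = np.expand_dims(preprocess_interval(interval, scaler), axis=0)  # (1, 17, 52)
```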