awslabs / predictive-maintenance-using-machine-learning

Set up end-to-end demo architecture for predictive maintenance issues with Machine Learning using Amazon SageMaker
Apache License 2.0

simplified how RUL is calculated using transform method with GroupBy #8

Open kylejones200 opened 4 years ago

kylejones200 commented 4 years ago

Issue #, if available:

Description of changes: The original version uses a complicated approach to find the maximum number of cycles for each id. Using the transform method on the result of pd.DataFrame.groupby, we can compute the max cycle for each id and assign it directly to the proper column, because transform returns a result aligned with the original rows. This avoids making extra copies of the DataFrame and then merging those slices.

Original:

for i, df in enumerate(train_df):
    rul = pd.DataFrame(df.groupby('id')['cycle'].max()).reset_index()
    rul.columns = ['id', 'max']
    df = df.merge(rul, on=['id'], how='left')
    df['RUL'] = df['max'] - df['cycle']
    df.drop('max', axis=1, inplace=True)
    train_df[i]=df

Revised:

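# Passing 'max' as a string selects the built-in groupby reduction; the result is aligned to df's index
df['max'] = df.groupby(['id'])['cycle'].transform('max')
df['RUL'] = df['max'] - df['cycle']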
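
As a quick sanity check, here is a minimal sketch comparing the two approaches on a small made-up DataFrame. The toy data and the function names `rul_via_merge` and `rul_via_transform` are hypothetical and not part of the repository; the sketch only assumes the `id` and `cycle` columns used above.

```python
import pandas as pd

# Hypothetical toy data: two units ("id") with 3 and 2 recorded cycles.
toy = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
})

def rul_via_merge(df):
    """Original approach: aggregate, rename, merge back, subtract, drop."""
    rul = pd.DataFrame(df.groupby("id")["cycle"].max()).reset_index()
    rul.columns = ["id", "max"]
    out = df.merge(rul, on=["id"], how="left")
    out["RUL"] = out["max"] - out["cycle"]
    return out.drop("max", axis=1)

def rul_via_transform(df):
    """Revised approach: transform broadcasts the per-id max to every row."""
    out = df.copy()
    out["RUL"] = out.groupby("id")["cycle"].transform("max") - out["cycle"]
    return out

merged = rul_via_merge(toy)
transformed = rul_via_transform(toy)
print(transformed)
```

In this sketch the transform variant subtracts in one expression, so there is no helper `max` column to drop afterwards.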

The data loading could be simplified further by using the "names" argument to assign the labels to the columns when the files are read. I didn't make this change because the way the columns list is used for the test datasets causes issues. That said, the process for reading in the test data is also needlessly complex.
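
For illustration only, a minimal sketch of that idea, assuming "names" refers to the `names` parameter of `pandas.read_csv` and that the input is a headerless, whitespace-separated file. The in-memory data and the column labels below are hypothetical stand-ins, not the notebook's actual files or column list.

```python
import io
import pandas as pd

# Hypothetical stand-in for a headerless, whitespace-separated data file.
raw = io.StringIO(
    "1 1 0.5\n"
    "1 2 0.6\n"
    "2 1 0.4\n"
)

# Hypothetical column labels; the real list is defined in the notebook.
columns = ["id", "cycle", "sensor1"]

# header=None: the file has no header row; names= labels the columns at read time,
# so no separate "df.columns = ..." assignment is needed afterwards.
df = pd.read_csv(raw, sep=r"\s+", header=None, names=columns)
print(df.columns.tolist())
```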

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.