Description of changes: The original version uses a complicated approach to find the max number of cycles for each id. Using `groupby` with `transform`, we can find the max value for each id and assign it directly to the proper column. This avoids making extra copies of the DataFrame and then merging those slices.
Original:
for i, df in enumerate(train_df):
    rul = pd.DataFrame(df.groupby('id')['cycle'].max()).reset_index()
    rul.columns = ['id', 'max']
    df = df.merge(rul, on=['id'], how='left')
    df['RUL'] = df['max'] - df['cycle']
    df.drop('max', axis=1, inplace=True)
    train_df[i] = df
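The transform-based replacement described above can be sketched as follows (a minimal, self-contained sketch; the sample `train_df` data here is illustrative, assuming each DataFrame has `id` and `cycle` columns as in the original):

```python
import pandas as pd

# Illustrative data: two units ('id'), each with a running cycle count.
train_df = [
    pd.DataFrame({
        "id":    [1, 1, 1, 2, 2],
        "cycle": [1, 2, 3, 1, 2],
    })
]

for df in train_df:
    # transform('max') broadcasts each group's max back onto every row,
    # so no intermediate DataFrame or merge is needed.
    df["RUL"] = df.groupby("id")["cycle"].transform("max") - df["cycle"]
```

Because `transform` returns a Series aligned to the original index, the result can be assigned in place without the merge/drop round trip.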
This code could be simplified further by using the "names" argument to assign the labels to the columns when the data is read in. I didn't make that change because the way the columns list is used for the test datasets causes issues. However, the process for reading in the test data is also needlessly complex.
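For reference, the "names" argument mentioned here is presumably that of `pd.read_csv`, which assigns column labels at read time instead of renaming afterwards. A hypothetical sketch (the data and column labels below are illustrative, not from this repository):

```python
import io
import pandas as pd

# Illustrative whitespace-separated data standing in for the real files.
raw = io.StringIO("1 1 0.5\n1 2 0.6\n")
columns = ["id", "cycle", "s1"]  # hypothetical label list

# `names` labels the columns during the read, so no separate
# `df.columns = ...` assignment is needed afterwards.
df = pd.read_csv(raw, sep=r"\s+", header=None, names=columns)
```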
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.