MichiganDataScienceTeam / googleanalytics

MDST Project Fall 2018

Shallow copy before adding columns in exploration #55

Closed jonathancstroud closed 6 years ago

jonathancstroud commented 6 years ago

https://github.com/MichiganDataScienceTeam/googleanalytics/blob/fb6650b0bfdfa5e536af2c6db30c9c1a9f6e1cd1/explore_utils.py#L40

How embarrassing - I didn't even follow my own instructions. The revenue column should be added to a copy of data.train, not to data.train itself. As written, the function mutates data.train in place, so revenue is still present in the dataframe after find_customer_revenue_percentiles returns.

To reproduce:

print('Revenue before?', 'revenue' in data.train.columns)
percentile_values = explore_utils.find_customer_revenue_percentiles(
        data,
        percentiles)
print('Revenue after?', 'revenue' in data.train.columns)

Output:

Revenue before? False
Revenue after? True
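
A minimal sketch of the fix suggested by the title, not the actual code in explore_utils.py. The function name percentiles_sketch, the transactionRevenue column, and the toy data are all assumptions made for illustration; only the pattern of taking a shallow copy with pandas' DataFrame.copy(deep=False) before adding the column is the point.

```python
from types import SimpleNamespace

import pandas as pd


def percentiles_sketch(data, percentiles):
    """Hypothetical stand-in for find_customer_revenue_percentiles."""
    # Shallow copy: a new DataFrame object that reuses the existing data.
    # Columns added to it do not appear on data.train.
    df = data.train.copy(deep=False)
    # 'transactionRevenue' is an assumed column name, not taken from the repo.
    df['revenue'] = df['transactionRevenue'].fillna(0)
    return df['revenue'].quantile(percentiles)


# Toy object mimicking the data.train attribute used in the repro above.
data = SimpleNamespace(
    train=pd.DataFrame({'transactionRevenue': [10.0, None, 30.0, 50.0]}))

print('Revenue before?', 'revenue' in data.train.columns)
percentile_values = percentiles_sketch(data, [0.5, 0.9])
print('Revenue after?', 'revenue' in data.train.columns)
```

With the copy in place, both checks should report that revenue is absent from data.train. A shallow copy is enough here because the function only adds a column; it would not protect data.train from in-place edits to existing values.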