buschbirk / intro-data-capstone-musclehub

Avoid using apply on the entire dataframe when it isn't necessary #6

Closed hillarygreenlerman closed 6 years ago

hillarygreenlerman commented 6 years ago

https://github.com/buschbirk/intro-data-capstone-musclehub/blob/66253a292dbe1c2e164ec5b57ff40c1a6da26ecf/Final%20Analysis/musclehub.py#L164

What you did here totally works and produces the correct output.

However, because you only reference one column in this lambda function (fitness_test_date), you can use apply on a single column using this syntax:

df['ab_test_group'] = df.fitness_test_date.apply(lambda x:
                                                 'A' if pd.notnull(x) else 'B')

Note that apply comes after fitness_test_date and that we don't need the keyword axis=1.

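This is more computationally efficient: with axis=1, pandas builds a Series object for every row before calling the lambda, whereas apply on a single column passes each value to the lambda directly. You'll notice a big time difference when working with larger datasets, especially if the dataframe has a lot of columns.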
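As a rough illustration, here is a minimal sketch comparing the two approaches on a tiny made-up dataframe. Only the column names fitness_test_date and ab_test_group come from the code linked above; the sample rows, the extra name column, and the exact form of the row-wise version are assumptions for demonstration. The last variant, using np.where, is an additional vectorized alternative rather than something taken from the original script.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the fitness_test_date column name
# comes from the original script.
df = pd.DataFrame({
    "name": ["Ann", "Bo", "Cy"],
    "fitness_test_date": ["2017-07-03", None, "2017-07-05"],
})

# Row-wise apply (assumed shape of the original code): pandas builds
# a Series for every row, even though only one column is used.
row_wise = df.apply(
    lambda row: "A" if pd.notnull(row["fitness_test_date"]) else "B",
    axis=1,
)

# Column-wise apply: the lambda receives each value directly,
# so no axis keyword is needed.
col_wise = df["fitness_test_date"].apply(
    lambda x: "A" if pd.notnull(x) else "B"
)

# Vectorized alternative (not from the original): avoids calling
# a Python lambda per element at all.
vectorized = pd.Series(
    np.where(df["fitness_test_date"].notnull(), "A", "B"),
    index=df.index,
)

# All three produce the same labels; assign whichever you prefer.
df["ab_test_group"] = col_wise
```

All three give the same result, so the choice is mostly about speed and readability. On a dataframe this small the difference is not measurable; it only becomes visible with many rows.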

buschbirk commented 6 years ago

Good point. Thanks!