datacarpentry / python-ecology-lesson

Data Analysis and Visualization in Python for Ecologists
https://datacarpentry.org/python-ecology-lesson

pd.unique(df['column_name']) vs df['column_name'].unique() #366

Closed rgaiacs closed 5 years ago

rgaiacs commented 5 years ago

https://datacarpentry.org/python-ecology-lesson/02-starting-with-data/index.html says

Let’s get a list of all the species. The pd.unique function tells us all of the unique values in the species_id column.

pd.unique(surveys_df['species_id'])

I think that using df['column_name'].unique() helps students make more mental connections. For example, surveys_df is my data, surveys_df['species_id'] is one column of my data (in Python, it is of type Series), and surveys_df['species_id'].unique() gives the unique values in that column.

Any reason to use pd.unique(surveys_df['species_id']) instead of df['column_name'].unique()?
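
For reference, a minimal sketch comparing the two spellings. The small DataFrame below is made up for illustration and only stands in for the lesson's survey data; the variable names via_function and via_method are likewise hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's survey data (not the real surveys.csv).
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "species_id": ["NL", "DM", "NL", "PF", "DM"],
})

# Selecting one column of the DataFrame gives a pandas Series.
species_column = surveys_df["species_id"]

# Spelling 1: module-level function, as currently used in the lesson.
via_function = pd.unique(species_column)

# Spelling 2: method on the Series object, as proposed in this issue.
via_method = species_column.unique()

print(type(species_column).__name__)
print(list(via_function))
print(list(via_method))
```

Both calls return the distinct values in order of first appearance, so the choice between them is about teaching style rather than results.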

wrightaprilm commented 5 years ago

If I'm being honest, I don't think Series objects had the unique method until a few months after the first draft of this lesson, and probably this just never got changed. I would be fine with a pull request to change this if you would, @maxim-belkin.

maxim-belkin commented 5 years ago

This is in sync with what we do in swcarpentry/python-novice-inflammation, where we use module functions rather than object methods (see https://github.com/swcarpentry/python-novice-inflammation/pull/244#issuecomment-201909951). I think it makes sense to use the same approach, pd.unique(...), in this lesson as well.
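
To illustrate the distinction, here is a rough sketch of the two styles using NumPy. It assumes the inflammation lesson's NumPy-based setting, and the array values are invented for the example, not taken from either lesson.

```python
import numpy as np

# Hypothetical data; not taken from either lesson.
data = np.array([[0.0, 1.0, 2.0],
                 [3.0, 4.0, 5.0]])

# Module-function style: the function lives in the module, the data is an argument.
mean_via_function = np.mean(data)

# Object-method style: the same operation, called on the array itself.
mean_via_method = data.mean()

print(mean_via_function, mean_via_method)
```

The pd.unique(...) versus .unique() question in this issue is the pandas counterpart of the same choice.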

maxim-belkin commented 5 years ago

There is an interesting note, however:

Pandas makes heavy use of methods, so a lesson with Pandas should introduce methods. [@bsmith89]

:)

wrightaprilm commented 5 years ago

Edit: I misunderstood @maxim-belkin's point. It sounds like keeping the two lessons consistent would mean keeping this as-is. I'm fine with this solution as well.