Avoid confusing, potentially ambiguous commands for slicing/indexing data frames

caesoma commented 4 years ago

In episode 3 (https://datacarpentry.org/python-ecology-lesson/03-index-slice-subset/index.html, actually listed as 4 in https://datacarpentry.org/python-ecology-lesson/ ), the lesson distinguishes .iloc, which accesses entries by position, from .loc, which accesses them by identifier. It then shows a third possibility, surveys_df[0:3], which also selects rows by position.

That command is redundant with surveys_df.iloc[0:3] and looks similar to column access, i.e. df["column_name"], so it can be mistaken for selecting a column when the column names are numbers. On top of that, combining row and column positions in one expression, as in df[0:2,1], raises an error.
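
A minimal sketch of the behaviour described above, using a small made-up data frame in place of surveys_df (the column names and values are assumptions for illustration only). The exact exception raised by the mixed form differs between pandas versions, so the sketch catches a generic Exception:

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; the values are made up.
df = pd.DataFrame(
    {"record_id": [1, 2, 3, 4], "weight": [40.0, 48.0, 29.0, 46.0]}
)

# A bare slice inside [] selects rows by position, just like .iloc.
rows_plain = df[0:3]
rows_iloc = df.iloc[0:3]
same_rows = rows_plain.equals(rows_iloc)

# A column label inside [] selects a column instead.
weights = df["weight"]

# Mixing a row slice and a column position inside [] is not supported.
try:
    df[0:2, 1]
    mixed_failed = False
except Exception as err:  # exception type depends on the pandas version
    mixed_failed = True
    print(f"df[0:2, 1] failed with {type(err).__name__}")

# The same selection written with .iloc works.
subset = df.iloc[0:2, 1]
print(same_rows, mixed_failed)
print(subset)
```

Here df[0:3] and df.iloc[0:3] return the same rows, while only the .iloc form accepts a row slice together with a column position.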

While the command can be useful and best practices could prevent confusing row and column identifiers, the lesson could instead say that df["col_name"] or df[["list", "of", "col_names"]] accesses columns, while df.loc["index"] accesses rows. That would keep position-based and identifier-based selection as separate commands for beginners.

# example
import pandas
from numpy.random import randint

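arr = randint(0, 10, [3, 3])  # 3x3 array of random integers from 0 to 9
df = pandas.DataFrame(arr)  # row and column labels default to 0, 1, 2

df[0]  # selects the column labeled 0 (here, the first column)
df[0:1]  # selects the first row (by position)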
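
For comparison, the explicit forms suggested above could look like the sketch below. The frame, its labels, and the variable names are hypothetical and chosen so that row labels and row positions are easy to tell apart:

```python
import pandas as pd

# Hypothetical frame with string labels for both rows and columns.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

one_col = df["x"]           # one column, returned as a Series
two_cols = df[["x", "z"]]   # list of columns (note the double brackets)
row_by_label = df.loc["b"]  # row selected by its label
row_by_pos = df.iloc[1]     # row selected by its position

print(one_col.tolist())
print(list(two_cols.columns))
print(row_by_label.tolist(), row_by_pos.tolist())
```

With this convention, plain square brackets are reserved for columns, and rows always go through .loc or .iloc.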

maxim-belkin commented 4 years ago

Hi, @caesoma! Apologies for taking so long to respond.

Very good and valid point! I think the best solution would be to make learners aware of this in the form of an exercise or additional material. Would you be willing to contribute this to the episode?

caesoma commented 4 years ago

Hi, sure, I can do that. Let me know what format this exercise should be in.

maxim-belkin commented 4 years ago

Could you please draft a PR that modifies the existing text and adds new text and/or an exercise? We could then discuss details such as the format. And please let me know if you need any help along the way.

caesoma commented 3 years ago

Sorry for the long delay as well. Finally got around to making the proposed changes.

LilithElina commented 1 year ago

I'm closing this issue as we worked through and accepted the relevant PR back in April.

datacarpentry / python-ecology-lesson

Avoid confusing, potentially ambiguous commands for slicing/indexing data frames #447