datacarpentry / python-ecology-lesson

Data Analysis and Visualization in Python for Ecologists
https://datacarpentry.org/python-ecology-lesson

Using display() command instead of print() in 02-starting-with-data and added assignment of data types #507

Closed wgiese closed 1 year ago

wgiese commented 2 years ago

(a) I suggest introducing the display() function from the IPython package, which is preferred over print() for pandas objects. display() gives essentially the same output as typing a pandas data frame name at the Python command prompt or at the end of a code block, but it also works in the middle of longer code blocks.

(b) I suggest explaining that the types reported by surveys_df.dtypes can easily be adjusted with one line of code. For instance, a categorical data type might be preferred over 'object' for the 'sex' column, or 'int16' over 'int64' for the 'month' and 'year' columns. I hope this is useful; let me know what you think.
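To illustrate suggestion (a), here is a minimal sketch of how display() could be used in the middle of a code block. The small surveys_df built here is a hypothetical stand-in with made-up values, not the lesson's actual surveys.csv data, and the fallback to print() is only an assumption to keep the snippet runnable outside IPython or Jupyter.

```python
import pandas as pd

# display() ships with IPython; fall back to print() where IPython is
# not installed (an assumption made only to keep this sketch runnable).
try:
    from IPython.display import display
except ImportError:
    display = print

# Hypothetical stand-in for the lesson's surveys_df (values are made up).
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3],
    "month": [7, 7, 8],
    "year": [1977, 1977, 1978],
    "sex": ["M", "F", "M"],
})

# A bare `surveys_df` expression is only rendered when it is the last
# line of a cell; display() can be called anywhere in a longer block.
display(surveys_df.head())
n_rows = len(surveys_df)
display(surveys_df.dtypes)
```

In a Jupyter notebook, display() renders the data frame as a formatted table, whereas print() always produces plain text.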

tobyhodges commented 1 year ago

I'm helping the current lesson Maintainers process outstanding pull requests on this repository, in preparation for transition to the new lesson infrastructure.

Thank you for suggesting these changes, @wgiese. For future reference, when making multiple, unrelated suggestions such as these, it is better to open a separate pull request for each suggestion. That being said, I will respond to each one here:

a) While I agree that IPython.display.display is useful, I would prefer not to introduce another import into the lesson.

b) I like this suggestion very much, especially in relation to the conversion of sex to the more specific and appropriate categorical type. However, there already exists a later episode on data types and formats, which would be the more appropriate place for this content. I note that that episode does not mention the categorical datatype, and I would welcome a new pull request adding this, along with your example of how to simultaneously adjust the type of all the columns in the dataframe.
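For reference, a minimal sketch of what adjusting several column types at once might look like, assuming a data frame with 'month', 'year' and 'sex' columns as mentioned in the suggestion. The data frame here is a hypothetical stand-in with made-up values, and the choice of DataFrame.astype with a dictionary is one possible approach, not necessarily the one proposed in the pull request.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's surveys_df (values are made up).
surveys_df = pd.DataFrame({
    "month": [7, 7, 8],
    "year": [1977, 1977, 1978],
    "sex": ["M", "F", "M"],
})

# astype() accepts a {column: dtype} mapping, so several columns can be
# converted in a single call; it returns a new data frame.
converted = surveys_df.astype({
    "month": "int16",
    "year": "int16",
    "sex": "category",
})

print(surveys_df.dtypes)  # original types
print(converted.dtypes)   # adjusted types
```

Note that int16 can only hold values between -32768 and 32767 and cannot represent missing values, so this sketch assumes the 'month' and 'year' columns are complete.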

I am going to close this now, but I do encourage you to follow up with a new PR.

An important final comment: if you have the time and inclination to return to this lesson almost one year later and open a new pull request, please be aware that we will transition this lesson (and all others!) to the new infrastructure at the beginning of May 2023. Your new PR will need to be merged by then, or it will be invalidated and closed during the transition. So you may prefer to wait until the transition is done, then contribute to the updated version of the lesson. Either way, I would be delighted to help you make the change.