datacarpentry / R-ecology-lesson

Data Analysis and Visualization in R for Ecologists
https://datacarpentry.org/R-ecology-lesson/

Suggest csv editor in section on dates? #730

Closed ericward closed 3 years ago

ericward commented 3 years ago

Because of the date conversions Excel makes, would it be useful to suggest an alternative CSV editor in the section "Formatting dates" in Episode 2: Starting with data? This idea came up in instructor training, and I agree it would be helpful to suggest a few alternatives. I am personally looking for a good Windows one, so I would have really appreciated the suggestion myself.

Teebusch commented 3 years ago

Hi @ericward, at the moment the lesson seems to be "CSV editor agnostic". As far as I know, Excel is not mentioned or recommended. Are you suggesting that we warn learners not to use Excel?

What would be an alternative to Excel that you would like to recommend? OpenRefine, maybe? The Carpentries has a workshop for it. For a pure R solution, there are packages like DataEditR. For simple stuff I often use a text editor.

However, I'm not sure we should recommend editing CSV files by hand at all. The problem with editing CSV files by hand is that you sacrifice data provenance and reproducibility. Thus, it might be preferable to edit/clean the CSV with code. Excel certainly has some issues, but it can be quite a powerful tool, and I don't see a reason to discourage people from using it altogether.
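As a rough illustration of cleaning with code rather than by hand, here is a minimal sketch in base R. The table and its column names (`record_id`, `month`, `day`, `year`) are made up for the example and are not taken from the lesson's data file. It flags rows whose day, month, and year do not form a valid date and drops them in a scripted, repeatable step.

```r
# Hypothetical miniature survey table; values and column names are illustrative only.
raw <- read.csv(text = "record_id,month,day,year
1,7,16,1977
2,9,31,2000
3,4,30,2000", stringsAsFactors = FALSE)

# Combine the components; with an explicit format, as.Date() returns NA
# for impossible dates instead of raising an error.
raw$date <- as.Date(paste(raw$year, raw$month, raw$day, sep = "-"),
                    format = "%Y-%m-%d")

# Rows whose date could not be parsed (here: 31 September).
bad_rows <- raw[is.na(raw$date), ]

# The cleaning step lives in the script, so the original file stays untouched
# and the change can be reviewed and repeated.
clean <- raw[!is.na(raw$date), ]
```

Because the correction is written down as code, anyone rerunning the script gets the same cleaned table from the same raw file.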

ericward commented 3 years ago

Seems reasonable to me. I can see why being 'agnostic' and not opening a new can of worms is desirable.

It just occurs to me that: (1) people new to coding are often using Excel to edit and even analyze data, (2) if someone is using the Excel date functions, their data may not have separate integer columns for day, month, and year, (3) files opened and resaved in Excel will often have the date column changed by the mere fact that they were saved in Excel. People who have not used coding languages before are sometimes unaware of these issues.
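One way to notice this kind of change from within R is to treat the date column as plain text and check it against the expected pattern. The sketch below is only an assumption about what a column might look like after a spreadsheet round trip; the values are invented and the vector `dates_chr` is hypothetical.

```r
# Hypothetical date strings as they might appear after a file has been
# opened and resaved in a spreadsheet program (values are made up).
dates_chr <- c("1977-07-16", "7/16/1977", "1977-07-18")

# TRUE where the string matches the ISO 8601 layout YYYY-MM-DD.
is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dates_chr)

# Entries that were reformatted and need attention before parsing.
suspect <- dates_chr[!is_iso]
```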

The lesson says "As a reminder from earlier in this lesson, the best practice for dealing with date is to ensure that each component of your date is stored as a separate variable. Using str(), We can confirm that our data frame has a separate column for day, month, and year, and that each contains integer values."

I'm not sure that suggestion is actually made earlier in the lesson? I might be missing it. Rather than the 'what if?' of 31 days of data in September or April, some time spent on commonly encountered date formats (and possible issues introduced by programs like Excel) might be more useful. I'm pretty sure there are other Carpentries lessons that cover this, so maybe a brief mention and a link would suffice.

Also, I just noticed a couple of typos in this part of the lesson that I will post separately.

Teebusch commented 3 years ago

Regarding Excel: I really think it is a whole new can of worms, and I don't feel the need to discuss Excel and its flaws in this lesson. Likewise with your suggestion to discuss common date formats and issues. At this point in the lesson, the learners are still getting familiar with the fundamentals of programming and working with data through code. I think too much detail will be overwhelming. So I will politely reject your suggestion. However, feel free to draft something up, ideally in the form of a pull request. I think that would make it easier to see whether this could fit or not.

The other thing:

The lesson says "As a reminder from earlier in this lesson, the best practice for dealing with date is to ensure that each component of your date is stored as a separate variable. Using str(), We can confirm that our data frame has a separate column for day, month, and year, and that each contains integer values."

I'm not sure that suggestion is actually made earlier in the lesson?

Well spotted! I also can't find this best practice mentioned earlier in this lesson. Personally, I also don't agree with this suggestion: if dates are formatted consistently, then {lubridate} and similar libraries can handle them just fine.

So, I would suggest removing this recommendation altogether.
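To illustrate the point about consistently formatted dates, here is a minimal sketch. It uses base R's `as.Date()` so that it runs without extra packages; `ymd()` from {lubridate} would be the package counterpart. The date strings are invented for the example.

```r
# Consistently formatted (ISO 8601) date strings; values are illustrative.
dates <- as.Date(c("1977-07-16", "1977-08-19", "2002-12-31"))

# The separate integer components can be derived from the parsed dates
# whenever they are needed, so they do not have to be stored separately.
parts <- data.frame(
  year  = as.integer(format(dates, "%Y")),
  month = as.integer(format(dates, "%m")),
  day   = as.integer(format(dates, "%d"))
)
```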

Teebusch commented 3 years ago

(this issue is related to #678)