geco-bern / agds

Applied Geodata Science book. Developed for the lecture(s) with the same name at the Institute of Geography, University of Bern.
https://geco-bern.github.io/agds/

Problems reading `.csv` files from Excel #122

Open pepaaran opened 1 year ago

pepaaran commented 1 year ago

Some students saved the file from the Exercises of Ch 3 as an Excel file and then exported it to `.csv`. When they did that, the values were written separated by `;`, so they needed to use the `read_csv2()` function (`read_csv()` only recognises `,` as the separator). We should include this information somewhere in the tutorial.
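For reference, a minimal sketch of the difference, assuming {readr} 2.0 or newer is installed. The file contents below are made up for illustration and are not the exercise data; they mimic what Excel tends to write under many European locale settings (`;` as separator, `,` as decimal mark).

```r
# Hypothetical example data: semicolon-separated values with decimal commas
tmp <- tempfile(fileext = ".csv")
writeLines(c("site;temp", "A;12,5", "B;9,8"), tmp)

# read_csv() splits on "," so the header "site;temp" becomes a single column
# (warnings about the malformed rows are suppressed here on purpose)
wrong <- suppressWarnings(
  readr::read_csv(tmp, show_col_types = FALSE)
)

# read_csv2() splits on ";" and treats "," as the decimal mark
right <- readr::read_csv2(tmp, show_col_types = FALSE)

# the same result, with delimiter and decimal mark stated explicitly
explicit <- readr::read_delim(
  tmp,
  delim = ";",
  locale = readr::locale(decimal_mark = ","),
  show_col_types = FALSE
)
```

Comparing `ncol(wrong)` with `ncol(right)` is a quick way for students to check whether the separator was guessed correctly.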

khufkens commented 1 year ago

This is up next week in my lessons. Check some of the examples I put in. Good that you flagged this, as it shows the topic is timely. There is an argument here for moving all of this, as well as the notions of file structures, to the front.

However, we decided against this because the material is arguably rather dull, even though it turns out to be key.

pepaaran commented 1 year ago

Maybe we can add some troubleshooting information in the exercise for data wrangling, but explain it more thoroughly in your data variety chapter. That would push the "boring" part to a later class.

khufkens commented 1 year ago

Ideally we should teach them not to do this manually at all! Technically you can clean this file without touching the original, which avoids the real danger of introducing untraceable errors on input and output.

# read in the data sheet S1
# skip 3 first rows
data <- readxl::read_xlsx(
  "1249534s1-s6.xlsx",
  sheet = "Database S1",
  skip = 3
  )

# drop any rows which don't have a complete
# citation (spacer rows)
data <- data |>
  tidyr::drop_na(Citation)

# carry forward the labels in "Experiment"
data <- data |>
  tidyr::fill(
    Experiment,
    .direction = "down" # state the fill direction explicitly
    )

# all cleanup follows from here

khufkens commented 1 year ago

Both `drop_na()` and `fill()` live in {tidyr}.

https://tidyr.tidyverse.org/reference/fill.html
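To see what the two functions do, here is a minimal sketch on a made-up data frame. The column names match the snippet above, but the values are hypothetical placeholders and not taken from the spreadsheet. It assumes {tidyr} is installed and R 4.1 or newer for the native pipe.

```r
# Hypothetical toy data mimicking the sheet layout: the experiment label is
# only written on the first row of each group, and one spacer row is empty
toy <- data.frame(
  Experiment = c("exp 1", NA, NA, "exp 2", NA),
  Citation = c("ref A", "ref B", NA, "ref C", "ref D")
)

cleaned <- toy |>
  # drop the spacer row, which has no citation
  tidyr::drop_na(Citation) |>
  # carry each experiment label down to the rows below it
  tidyr::fill(Experiment, .direction = "down")
```

After these two steps `cleaned` has four rows and every row carries its experiment label, while `toy` (standing in for the original file) is left unchanged.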