IDEMSInternational / carbonr

Calculating carbon emissions in R
GNU Lesser General Public License v3.0

Create function that reformats data into correct form for calculate_emissions function #14

Status: Open. lilyclements opened this issue 2 years ago.

lilyclements commented 2 years ago

We want an automated system that creates a data set from which all emissions can be calculated.

This data could come from expense reports. A big issue is the nuances in how different people write up their expenses, or how different software records them. So far, this is based only on Zoho.

  1. Take the description column and recategorise each row as a flight, train, etc. (see example below).
  2. If the description column gives no match, use the category_name column to recategorise.
  3. If a row still cannot be categorised automatically, run interactive checks (see example below).
  4. Take the description column and extract other details, e.g. where the flight is from, to and via, the number of people on the flight, etc. How can we do this: automatically read in the "description" column, or run checks? If the description says "return flight", can we assume it is a return? What if it says "non-return flight"? If it says "for me and X", how do we read that? If it says "for two nights", do we check for words around "nights"? Translating qualitative data into quantitative data requires thought.
  5. Relabel values into the correct format, for example "London Heathrow" to "LHR" for airport data (and similarly for ferry and train emissions).
  6. For vehicle distances, Zoho gives the mileage in the mileage_distance variable. To what extent should we depend on this? If it is empty, do we have another check?
  7. Office emissions: how should we offer this? Possibly by reading in a data set with employees' number of days/hours.

E.g., for step 1, using the expense_report data:

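library(dplyr)

# Keywords used to recognise each emission type in the free-text description.
# "accomodation" is kept alongside "accommodation" to catch the common misspelling.
plane_match <- c("flight", "plane", "airport", "airplane", "aeroplane")
hotel_match <- c("hotel", "accommodation", "accomodation", "nights", "night", "stay", "guesthouse", "airbnb")
taxi_match <- c("taxi", "cab")

# case_when() checks the conditions in order, like nested ifelse() calls;
# rows matching none of them are flagged "99" for the manual checks in step 3.
expense_report <- expense_report %>%
  dplyr::mutate(emission = dplyr::case_when(
    grepl("train", description, ignore.case = TRUE) ~ "Train",
    grepl(paste(plane_match, collapse = "|"), description, ignore.case = TRUE) ~ "Flight",
    grepl(paste(hotel_match, collapse = "|"), description, ignore.case = TRUE) ~ "Hotel",
    grepl(paste(taxi_match, collapse = "|"), description, ignore.case = TRUE) ~ "Taxi",
    TRUE ~ "99"
  ))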
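Step 2 (falling back on the category_name column) is not covered by the example above. A minimal sketch, assuming step 1 has already flagged unmatched rows with "99": category_lookup and fill_from_category() are hypothetical names, and the category names in the lookup are placeholders, since the real ones depend on how Zoho (or other expense software) is configured.

```r
library(dplyr)

# Hypothetical mapping from category_name values to emission types;
# replace these placeholder names with the categories your software actually uses.
category_lookup <- c("Air Travel" = "Flight", "Lodging" = "Hotel", "Taxi" = "Taxi")

# Only rows still flagged "99" after step 1 fall back on category_name;
# rows whose category is not in the lookup stay flagged "99".
fill_from_category <- function(data, lookup) {
  data %>%
    dplyr::mutate(emission = dplyr::if_else(
      emission == "99" & category_name %in% names(lookup),
      unname(lookup[category_name]),
      emission
    ))
}
```

Rows that are still flagged "99" afterwards go on to the interactive checks in step 3.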

E.g., for step 3:

# Rows that step 1 could not categorise
na_emission <- which(expense_report$emission == "99")

# Ask the user to categorise one description interactively and return the label.
# Returning the label matters: assigning to expense_report inside the function
# would only change a local copy, leaving the original data unchanged.
which_emission <- function(description_var) {
  input <- menu(c("Accommodation", "Materials", "Office", "Transport", "None of the above"),
                title = paste("What do you want to assign to", description_var))
  if (input == 1) return("Hotel")
  if (input == 2) return("Materials")
  if (input == 3) return("Office")
  if (input == 4) {
    transport <- c("Ferry", "Flight", "Train")
    input2 <- menu(c(transport, "Vehicle", "None of the above"),
                   title = "Which transport type?")
    if (input2 %in% 1:3) return(transport[input2])
    if (input2 == 4) {
      vehicles <- c("Bus", "Car", "Coach", "Taxi", "Tube")
      input3 <- menu(c(vehicles, "None of the above"), title = "Which vehicle type?")
      if (input3 %in% 1:5) return(vehicles[input3])
    }
  }
  # "None of the above" at any level, or cancelling a menu (menu() returns 0), gives NA
  NA_character_
}

# menu() only works in an interactive R session
for (i in na_emission) {
  expense_report$emission[i] <- which_emission(expense_report$description[i])
}
expense_report
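Steps 4 to 6 are still open questions. Below is a minimal sketch of one possible approach, assuming English descriptions and that an empty mileage_distance is read in as NA. number_before(), extract_details(), relabel(), fill_mileage() and airport_lookup are hypothetical helpers for illustration, not carbonr functions, and the rules are deliberately simple: for example, "two nights" written in words is not detected.

```r
# Hypothetical helpers sketching steps 4 to 6; these are not carbonr functions.

# Return the number written immediately before `unit` (e.g. "2 nights", "45 miles"),
# or NA if there is none. Numbers written as words ("two nights") are not detected.
number_before <- function(text, unit) {
  text[is.na(text)] <- ""
  m <- regexpr(paste0("[0-9]+(\\.[0-9]+)?[[:space:]]*", unit), text, ignore.case = TRUE)
  out <- rep(NA_real_, length(text))
  found <- regmatches(text, m)  # matched substrings, only for elements with a match
  out[which(m != -1)] <- as.numeric(sub("[^0-9.].*$", "", found))
  out
}

# Step 4: turn free text into quantities.
extract_details <- function(description) {
  desc <- tolower(description)
  # "return" counts as a return trip unless the text says non-return or one-way
  is_return <- grepl("return", desc) & !grepl("non-return|one[- ]way", desc)
  data.frame(description = description,
             is_return = is_return,
             nights = number_before(desc, "nights?"))
}

# Step 5: relabel place names to codes using a hand-written lookup table
# (hypothetical entries; unmatched names become NA so they can be checked manually).
airport_lookup <- c("london heathrow" = "LHR", "london gatwick" = "LGW")
relabel <- function(x, lookup) {
  unname(lookup[tolower(trimws(x))])
}

# Step 6: prefer mileage_distance; if it is NA, look for "N miles" in the description.
fill_mileage <- function(mileage_distance, description) {
  ifelse(is.na(mileage_distance), number_before(description, "miles?"), mileage_distance)
}
```

For example, extract_details("Return flight, 2 nights") gives is_return = TRUE and nights = 2, while "Non-return flight" gives is_return = FALSE. Anything these rules cannot read is left as NA, so it can be routed to the same kind of interactive check as in step 3.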