ave-63 / attendant.r

Compiles Zoom meeting attendance reports

Could not find object 'Duration.Minutes.' error? #2

Open gordsellar opened 3 years ago

gordsellar commented 3 years ago

I'm having an issue while trying to collate attendance records for my classes.

As predicted, the first time I ran do_attendance_mac.command R tried to install some packages and failed. I'd just installed R from the local (Korean) mirror, so the feedback I get is in Korean (oops). Therefore I'm plugging it into an autotranslator to see what the errors reported actually say.

I'm not sure if I managed to install everything missing, as I found a command for bypassing the problem—but may have only done so partially?

I'm giving you the (Google-Translated) report I get when I run do_attendance_mac.command. I wonder if it's a similar issue to the previous issue, some kind of setting or something?

Attach the following package: ‘dplyr’

The following objects are masked from ‘package:stats’:

     filter, lag

The following objects are masked from ‘package:base’:

     intersect, setdiff, setequal, union

[1] creating output file attendance_aeiv10.csv
eval(predvars, data, env) gives the following error:
   Could not find object 'Duration.Minutes.'
Called: aggregate ... eval -> <Anonymous> -> model.frame.default -> eval -> eval
execution stopped
All done! Press [Enter] to quit. 

No output file is created, so it looks like something is just crapping out right at the start of the process. By the way, I have tried to install 'dplyr' and I think I have installed it. The package files at least get downloaded.

If you think it's a case of our Zoom output being unusually formatted, I could give you a Zoom attendance session file with the user content edited.

One thing: I uploaded the files to Google Drive before working with them. You don't think that has anything to do with it, do you? (It seems unlikely, but...)

Thanks for any advice you can offer.

gordsellar commented 3 years ago

I'm adding this, just in case it's pertinent: I looked inside the CSV files and the formatting looks odd to me. Here's a representative example of how the first two lines of my CSVs look (with the name and email address of the student swapped for fake ones, but the formatting preserved):

Name (Original Name),User Email,Total Duration (Minutes),Guest
'홍길동,honggildong@naver.com,135,Yes

When I say it looks weird, I mean that there's an opening quotation mark at the start of each name record, but it's never closed.

The other thing is that my students are pretty inconsistent in how they entered their names. They'll switch alphabets, include or omit a space, change spellings when using Roman letters, and so on. What is consistent from CSV to CSV is their email address, because I required them to use their campus email for classes.

If you have any advice about how I could adjust things so that the duration times are grouped by email address instead of by name, that would be very helpful to me. Thanks!

ave-63 commented 3 years ago

Hey @gordsellar, this is similar to an issue I had recently. For me, Zoom changed its CSV format to have "User Name" instead of "Name", so I added this line, which I just pushed to GitHub:

names(idf)[names(idf) == "User.Name"] <- "Name"

This changes the name of the column in idf (input data frame) from User.Name to Name which is what the rest of the program expects. For you, I recommend replacing this line with:

names(idf)[names(idf) == "User.Email"] <- "Name"
names(idf)[names(idf) == "Total.Duration..Minutes."] <- "Duration.Minutes." 

This should be right after idf <- read.csv(paste0(INPUT_DIRECTORY, inputs$file_name[i])) in the last block of code in attendant.r. The idea is that R replaces any characters like a space or '(' with a '.' to make them valid R variable names. To check that the script is inputting the file correctly and find the names of the columns you could add a line str(idf) here too.
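To make the renaming concrete, here is a minimal sketch using made-up rows (the example.com addresses and the names are placeholders, not real Zoom output). It assumes the header layout quoted earlier in this thread and shows how read.csv rewrites the headers, and how totals end up keyed by email once User.Email is renamed to Name:

```r
# Fake export that mimics the header quoted earlier in this thread.
csv_text <- paste(
  "Name (Original Name),User Email,Total Duration (Minutes),Guest",
  "Hong Gildong,hong@example.com,100,Yes",
  "Gildong Hong,hong@example.com,35,Yes",
  "Kim Minsu,kim@example.com,90,Yes",
  sep = "\n"
)

idf <- read.csv(text = csv_text)

# read.csv passes headers through make.names(), so spaces and
# parentheses become dots:
print(names(idf))
# "Name..Original.Name." "User.Email" "Total.Duration..Minutes." "Guest"

# Rename to the column names the rest of the script expects.
names(idf)[names(idf) == "User.Email"] <- "Name"
names(idf)[names(idf) == "Total.Duration..Minutes."] <- "Duration.Minutes."

# Sum minutes per email address; the two spellings of the same
# student collapse into one row because the email is identical.
totals <- aggregate(Duration.Minutes. ~ Name, data = idf, FUN = sum)
print(totals)
```

If a report has a different header (for example "Duration(Minutes)"), the second rename simply matches nothing and leaves the names unchanged.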

Hopefully this works...

gordsellar commented 3 years ago

Hi there,

I'm afraid it's still choking on Duration.Minutes. for some reason, even after I make the change you suggest. This is the output I'm getting:

Attach the following package: ‘dplyr’

The following objects are masked from ‘package:stats’:

     filter, lag

The following objects are masked from ‘package:base’:

     intersect, setdiff, setequal, union

[1] creating output file attendance_aeiv10.csv
eval(predvars, data, env) gives the following error:
   Could not find object 'Duration.Minutes.'
Called: aggregate ... eval -> <Anonymous> -> model.frame.default -> eval -> eval
execution stopped
All done! Press [Enter] to quit.

One thing, I'm working from a copy of the file downloaded from your Dropbox.

(Oh, and by the way, the link in the Mac Installation instructions broke sometime last night: now it's a link to something else, a grader script or something? I was able to download the attendant files from the Windows installation instructions, however—that dropbox link works as expected.)

I assume I made the change in the right place:

args <- commandArgs(trailingOnly = TRUE)
INPUT_DIRECTORY <- args[1]
OUTPUT_DIRECTORY <- args[2]
YEAR <- args[3]

if(!require("stringi", quietly = TRUE, character.only = TRUE)){
    install.packages("stringi", character.only = TRUE)
}
library("stringi", quietly = TRUE, character.only = TRUE)
if(!require("dplyr", quietly = TRUE, character.only = TRUE)){
    install.packages("dplyr", character.only = TRUE)
}
library("dplyr", quietly = TRUE, character.only = TRUE)

options(stringsAsFactors=FALSE)

## Takes cols, a vector of column names, eg "Name" "We_2.10" "Mo_2.8" "Mo_2.15"
## and returns them in date order, eg "Name" "Mo_2.8" "We_2.10" "Mo_2.15"
sort_cols <- function(cols){
   mat <- stri_match_first_regex(cols, '\\w+_(\\d+)\\.(\\d+)') 
   cdf <- data.frame(colname = mat[,1], month = strtoi(mat[,2]), date = strtoi(mat[,3]))
   cdf <- filter(cdf, !is.na(colname)) ## "Name" created NA when matching regex
   cdf <- cdf[order(cdf$month, cdf$date),] ## Sort by month, then date
   append(c("Name"), cdf$colname) ## Put "Name" back
}

## opposite of make.names: make.headings("Mo_2.8") gives "Mo_2-8"
make.headings <- function(names){
    parts <- stri_match_first_regex(names, '(.+)\\.(.+)')
    paste0(parts[,2], "-", parts[,3])
}

## Parse file names
input_filenames = list.files(path = INPUT_DIRECTORY, pattern = ".+\\.csv")
output_filenames = list.files(path = OUTPUT_DIRECTORY, pattern = ".+\\.csv")
input_matches <- stri_match_first_regex(input_filenames,
   '(\\w+)_att_(\\d+-\\d+)\\.csv$')
output_matches <- stri_match_first_regex(output_filenames, 'attendance_(\\w+)\\.csv$')
if(length(input_filenames) == 0){
    stop(paste("Error: there are no csv files in INPUT_DIRECTORY:", INPUT_DIRECTORY))
}
for(i in 1:length(input_filenames)){
    if(is.na(input_matches[i,1])){
        print(paste("The following file will not be treated as input, as it does not match attendance input file syntax:", input_filenames[[i]]), quote = FALSE)
    }
}

## Populate inputs dataframe with input_matches
file_name <- c()
course_id <- c()
md_date <- c()
day_of_week <- c()
std_date <- c()
col_name <- c()
for(i in 1:nrow(input_matches)){
    if(!is.na(input_matches[i,1])){
        file_name <- append(file_name, input_matches[i,1])
        course_id <- append(course_id, input_matches[i,2])
        md_date <- append(md_date, input_matches[i,3])
        std_date <- append(std_date, as.Date(paste0(YEAR, "-", input_matches[i,3])))
        day_of_week <- append(day_of_week, substring(weekdays(std_date[length(std_date)]), 1, 2))
        col_name <- append(col_name, make.names(paste0(day_of_week[length(day_of_week)],
                                                       "_",
                                                       md_date[length(md_date)])))
    }
}
inputs <- data.frame(file_name=file_name, course_id=course_id, md_date=md_date,
                     std_date=std_date, day_of_week=day_of_week, col_name=col_name)
if(length(inputs) == 0){
    stop(paste("Error: there are no attendance input csv files in INPUT_DIRECTORY:", INPUT_DIRECTORY))
}

## If output files don't exist, create them
duped = duplicated(inputs$course_id)
keepers <- rep(TRUE, nrow(inputs))
for(i in 1:nrow(inputs)){
    if(!duped[i]){
        output_exists <- FALSE
        if(nrow(output_matches) > 0){
            for(j in 1:nrow(output_matches)){
                if(!is.na(output_matches[j,2]) && inputs$course_id[i] == output_matches[j,2]){
                    output_exists <- TRUE
                    ## Remove rows from inputs that are already in output file w/ same page_range, dl_date
                    existing_days <- colnames(read.csv(paste0(OUTPUT_DIRECTORY, output_matches[j,1])))
                    for(k in 1:nrow(inputs)){
                        if(inputs$course_id[k] == output_matches[j,2] &&
                           inputs$col_name[k] %in% existing_days){
                            keepers[k] <- FALSE
                        }
                    }
                }
            }
        }
        if(!output_exists){
            new_file_name <- paste0("attendance_", inputs$course_id[i], ".csv")
            print(paste("creating output file", new_file_name), quote = FALSE)
            #file.create(new_file_name)
            df <- read.csv(paste0(INPUT_DIRECTORY, inputs$file_name[i]))
            ## df$Duration.Minutes. <- strtoi(df$Duration.Minutes.) ##unnecesary?
            df <- aggregate(Duration.Minutes. ~ Name, data = df, FUN = sum)
            new_df <- data.frame(Name = df$Name)
            write.table(new_df, file = paste0(OUTPUT_DIRECTORY, new_file_name), row.names = FALSE)
        }
    }
}
inputs <- inputs[keepers,] # keep a row iff keepers is TRUE
## TODO: check that this is functional

if(nrow(inputs) > 0){
    for(i in 1:nrow(inputs)){
        idf <- read.csv(paste0(INPUT_DIRECTORY, inputs$file_name[i]))
        names(idf)[names(idf) == "User.Email"] <- "Name"
        names(idf)[names(idf) == "Total.Duration..Minutes."] <- "Duration.Minutes." 
        idf <- aggregate(Duration.Minutes. ~ Name, data = idf, FUN = sum)
        idf[inputs$col_name[i]] <- idf$Duration.Minutes.
        idf <- select(idf, Name, inputs$col_name[i])
        output_file_name <- paste0(OUTPUT_DIRECTORY, "attendance_", inputs$course_id[i], ".csv")
        odf <- read.csv(output_file_name)
        odf <- merge(odf, idf, by = "Name", all = TRUE, sort = TRUE)
        odf <- odf[ , sort_cols(colnames(odf))] ## sort columns in date order
        headings <- c("Name", make.headings(colnames(odf))[2:length(colnames(odf))])
        write.table(odf, file = output_file_name, sep = ",",
                    row.names = FALSE, col.names = headings)
    }
}

gordsellar commented 3 years ago

Hm, I think I figured out PART of the problem—I was using reports downloaded from a different spot in Zoom. (There are a few places to download them.) However, the script still isn't working for some reason. I'm making do with pivot tables for now, but I am hopeful that I can figure this out sometime soon, so I can use it in future semesters.

Thanks for any help you can offer.

Oops, and here is the updated output when I run it. Are there maybe uninstalled packages I need or something? Sorry, I've never used R before, complete newbie.

Attach the following package: ‘dplyr’

The following objects are masked from ‘package:stats’:

     filter, lag

The following objects are masked from ‘package:base’:

     intersect, setdiff, setequal, union

[1] creating output file attendance_AEIII20.csv
eval(predvars, data, env) gave the following error: could not find object 'Name'
Called: aggregate ... eval -> <Anonymous> -> model.frame.default -> eval -> eval
execution stopped
All done! Press [Enter] to quit.
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.

[Process completed]

But there is no output file. (And I've checked, the path in the config file is correct.)

ave-63 commented 3 years ago

So, it looks like you made the right changes. I'm guessing the problem is something else, like maybe the input files are not being found, or read correctly. Could you add the line str(idf) right under the changes you already made, and paste the output here?
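In case it helps with the diagnosis, here is a rough sketch of a check that could sit next to str(idf). The helper name check_columns is hypothetical (it is not part of attendant.r), and the one-row CSV is made up. The idea is to stop with a message that lists the columns R actually found, instead of the less obvious "could not find object" error from aggregate:

```r
# Hypothetical helper, not part of attendant.r: fail early with a readable
# message when the columns the script relies on are absent.
check_columns <- function(df, required = c("Name", "Duration.Minutes.")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "),
         ". Columns found: ", paste(names(df), collapse = ", "))
  }
  invisible(df)
}

# Made-up one-row report; no rename has been applied yet, so "Name" is absent.
idf <- read.csv(text = "User Name,User Email,Duration(Minutes)\nHong,hong@example.com,135")
str(idf)  # prints the column names and types as R sees them

# Capture the error message rather than halting this example.
result <- tryCatch(check_columns(idf), error = function(e) conditionMessage(e))
print(result)
```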

BTW, I had forgotten to keep updating the version in my dropbox until today; so the version you are using probably has a few bugs and it's a good idea to download it again (and paste the changes you made into it again).

gordsellar commented 3 years ago

Okay, tried that and not much change. First, here's what the first line (headers) in the CSV I'm working with looks like:

User Name,User Email,Join time,Leave time,Duration(Minutes),Attentiveness Score

This is the last block of code in attendant.r:

if(nrow(inputs) > 0){
    for(i in 1:nrow(inputs)){
        idf <- read.csv(paste0(INPUT_DIRECTORY, inputs$file_name[i]))
        names(idf)[names(idf) == "User.Email"] <- "Name"
        str(idf)
        idf <- aggregate(Duration.Minutes. ~ Name, data = idf, FUN = sum)
        idf[inputs$col_name[i]] <- idf$Duration.Minutes.
        idf <- select(idf, Name, inputs$col_name[i])
        output_file_name <- paste0(OUTPUT_DIRECTORY, "attendance_", inputs$course_id[i], ".csv")
        odf <- read.csv(output_file_name)
        odf <- merge(odf, idf, by = "Name", all = TRUE, sort = TRUE)
        odf <- odf[ , sort_cols(colnames(odf))] ## sort columns in date order
        headings <- c("Name", make.headings(colnames(odf))[2:length(colnames(odf))])
        write.table(odf, file = output_file_name, sep = ",",
                    row.names = FALSE, col.names = headings)

And this is the output I get:

Attach the following package: ‘dplyr’

The following objects are masked from ‘package:stats’:

     filter, lag

The following objects are masked from ‘package:base’:

     intersect, setdiff, setequal, union

[1] creating output file attendance_AEIII20.csv
eval(predvars, data, env) gave the following error: could not find object 'Name'
Called: aggregate ... eval -> <Anonymous> -> model.frame.default -> eval -> eval
execution stopped
All done! Press [Enter] to quit.

No output file is created.

ave-63 commented 3 years ago

OK, I think I figured it out. I think the problem is a little bit above, starting around line 94. I didn't realize that the first input csv file is read and processed there, which also needs to be fixed. You should see a block of code that looks like this:

    if(!output_exists){
        new_file_name <- paste0("attendance_", inputs$course_id[i], ".csv")
        print(paste("creating output file", new_file_name), quote = FALSE)
        #file.create(new_file_name)
        df <- read.csv(paste0(INPUT_DIRECTORY, inputs$file_name[i]))
        ## df$Duration.Minutes. <- strtoi(df$Duration.Minutes.) ##unnecesary?
        df <- aggregate(Duration.Minutes. ~ Name, data = df, FUN = sum)
        new_df <- data.frame(Name = df$Name)
        write.table(new_df, file = paste0(OUTPUT_DIRECTORY, new_file_name), row.names = FALSE)
    }

Change it to this:

    if(!output_exists){
        new_file_name <- paste0("attendance_", inputs$course_id[i], ".csv")
        print(paste("creating output file", new_file_name), quote = FALSE)
        #file.create(new_file_name)
        df <- read.csv(paste0(INPUT_DIRECTORY, inputs$file_name[i]))
        names(df)[names(df) == "User.Email"] <- "Name"
        ## df$Duration.Minutes. <- strtoi(df$Duration.Minutes.) ##unnecesary?
        df <- aggregate(Duration.Minutes. ~ Name, data = df, FUN = sum)
        new_df <- data.frame(Name = df$Name)
        write.table(new_df, file = paste0(OUTPUT_DIRECTORY, new_file_name), row.names = FALSE)
    }

Hope this works...
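
Since the script reads input files in two places, one way to avoid fixing one and missing the other would be to put the renaming into a single function and call it at both read.csv sites. This is only a sketch: read_attendance is a hypothetical name, not something in attendant.r, and the two header variants it handles are the ones that came up in this thread. The sample rows are made up.

```r
# Hypothetical helper, not part of attendant.r: read one Zoom report and
# rename the header variants seen in this thread to what the script expects.
read_attendance <- function(path, ...) {
  df <- read.csv(path, ...)
  # Key rows by email address rather than by display name.
  names(df)[names(df) == "User.Email"] <- "Name"
  # Some reports call the duration column "Total Duration (Minutes)".
  names(df)[names(df) == "Total.Duration..Minutes."] <- "Duration.Minutes."
  df
}

# Usage with a made-up report written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "User Name,User Email,Join time,Leave time,Duration(Minutes),Attentiveness Score",
  "Hong,hong@example.com,09:00,10:00,60,100",
  "Hong G,hong@example.com,10:05,11:20,75,100"
), tmp)

df <- read_attendance(tmp)
totals <- aggregate(Duration.Minutes. ~ Name, data = df, FUN = sum)
print(totals)
```

Both df <- read.csv(...) and idf <- read.csv(...) could then become calls to read_attendance(paste0(INPUT_DIRECTORY, inputs$file_name[i])), so the two blocks cannot drift apart.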