HomeBankCode / rlena

R package for parsing LENA's .ITS files
GNU General Public License v2.0

AWC & CVC #6

ebergelson closed this issue 5 years ago

ebergelson commented 6 years ago

Any chance of adding functions that give you the numbers the LENA software reports at the file level, like adult word count (AWC) and child vocalization count (CVC)? I assume those get collected from the guts of the ITS file somehow, but I haven't dug into how... (Sweet R library, by the way!)

tjmahr commented 6 years ago

Hmmm... It looks like @Teebusch added a lot of code that will make this easier. I'm going to merge in pull request #5. Is that okay, @Teebusch?

That gives us gather_blocks(), and we could leverage that for a summary function:

library(rlena)
library(dplyr, warn.conflicts = FALSE)

# Download the example ITS file
url <- "https://cdn.rawgit.com/HomeBankCode/lena-its-tools/master/Example/e20160420_165405_010572.its"
its <- read_its_file(url)

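# Sum the block-level counts to get one row of totals per recording
its %>%
  gather_blocks() %>%
  group_by(itsId) %>%
  summarise(
    AWC = sum(adultWordCnt, na.rm = TRUE),  # adult word count
    CTC = sum(turnTaking, na.rm = TRUE),    # conversational turn count
    Child_Utterance_Count = sum(childUttCnt, na.rm = TRUE))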
#> # A tibble: 1 x 4
#>   itsId                    AWC   CTC Child_Utterance_Count
#>   <chr>                  <dbl> <int>                 <int>
#> 1 20160420_165405_010572 9829.   370                  1228

I don't know how these word counts and turns compare to what LENA would compute for the same recording. (I can't get ADEX to work on the example ITS file.)

I am also not sure which field encodes the child vocalization count. I used childUttCnt, but I don't know how that matches LENA's exports.
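
The pipeline above could be wrapped into a small summary function. What follows is only a minimal sketch, not part of rlena: the name summarise_its_counts() and the toy data are hypothetical. It assumes the block-level data frame has the itsId, adultWordCnt, turnTaking and childUttCnt columns used above.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not an rlena function): per-recording totals from a
# block-level data frame shaped like the output of gather_blocks().
summarise_its_counts <- function(blocks) {
  blocks %>%
    group_by(itsId) %>%
    summarise(
      AWC = sum(adultWordCnt, na.rm = TRUE),
      CTC = sum(turnTaking, na.rm = TRUE),
      Child_Utterance_Count = sum(childUttCnt, na.rm = TRUE)
    )
}

# Toy stand-in for gather_blocks() output; the values are made up so the
# sketch runs without downloading an ITS file.
toy_blocks <- tibble(
  itsId        = c("rec_a", "rec_a", "rec_b"),
  adultWordCnt = c(10.5, NA, 3),
  turnTaking   = c(2L, 1L, 0L),
  childUttCnt  = c(4L, 0L, NA)
)

toy_totals <- summarise_its_counts(toy_blocks)
```

With a real recording, the same helper would presumably be called as its %>% gather_blocks() %>% summarise_its_counts(). Missing counts are dropped via na.rm = TRUE, so a recording whose blocks all lack a count gets a total of 0 rather than NA.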

andreiamatuni commented 6 years ago

I just tried out this function with one of our ITS files, comparing it with the LENA software's daylong reports. The numbers seem to match exactly. I will try out a couple more files just to be sure, but otherwise I think this is a perfect solution to our issue. Thank you!