I've made some additions to the package. There are a few open issues (see below), but I won't have much time to work on it in the next two weeks, and I believe they are small enough to ignore for now and fix later.
Improvements:
There are now additional functions to extract recordings, blocks, and segments
gathered data frames have itsId, recId, blkId, blkTypeId, segId, blkType columns for easier joining etc.
all columns with duration strings are now cleaned (converted to numeric)
gathered data frames have start/endClockTime and start/endClockTimeLocal columns with timestamps in UTC and local time, respectively
the conversationInfo column in the segments data frame is split into separate columns
data frame columns are sorted for better readability
vastly extended documentation
read_lena_its function can load xml directly
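
To illustrate what "cleaning" a duration column means, here is a minimal sketch. It assumes the duration strings look like `PT12.34S` or `P12.34S` (seconds wrapped in an ISO-8601-style prefix and suffix); `parse_duration` is a hypothetical helper written for this example, not the function used in the package.

```r
# Hypothetical helper: convert duration strings such as "PT12.34S" to seconds.
# Assumption: the value is a plain decimal number between "P" or "PT" and "S".
parse_duration <- function(x) {
  # Strip the prefix and suffix, then convert the remaining digits to numeric.
  as.numeric(sub("^PT?([0-9.]+)S$", "\\1", x))
}

durations <- parse_duration(c("PT0.00S", "P12.34S"))
print(durations)
```

Strings that do not match the assumed pattern come back as NA (with a coercion warning), which makes unexpected formats easy to spot.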
Open Issues:
The documentation could use more work (for example, not all speaker type labels, such as FAF and MAF, are documented in ?rlena)
It might make sense to split the date and time of the timestamps into separate columns for easier filtering etc. (otherwise the user has to do it)
Splitting the conversationInformation column creates columns with many NAs. Some of these columns actually contain running counts (of conversational turns etc.), so the NAs can be filled with the last known value.
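
Filling the NAs with the last known value could look roughly like the sketch below. It uses base R only; `fill_last_known` and the example vector are hypothetical and only illustrate the idea, they are not part of the package.

```r
# Hypothetical helper: carry the last non-NA value forward.
# Leading NAs (before any value has been seen) stay NA.
fill_last_known <- function(x) {
  idx <- cumsum(!is.na(x))       # position of the most recent non-NA value
  c(NA, x[!is.na(x)])[idx + 1]   # idx == 0 maps to the NA placeholder
}

# Made-up running count with gaps, as it might appear after splitting.
turn_count <- c(NA, 1, NA, NA, 2, NA, 3)
filled <- fill_last_known(turn_count)
print(filled)
```

For whole data frames, `tidyr::fill()` with `.direction = "down"` does the same thing per column and might be the more convenient choice.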
General Issues (might not fix anytime soon):
some additional tests might be useful
The column naming is a bit inconsistent and messy (e.g. speaker in the block nodes but spkr in the segment nodes; startClockTimeLocal is an awfully long name for a column that may be used a lot). This is mainly because LENA does it this way, but it may be worth thinking about.