Closed lucasgautheron closed 3 years ago
Q1.
In our package, lena_block_type is equal to the conversation_type for all segments that belong to a Conversation block, even non-human/non-speech segments, whereas the R package sets convType to NaN for these segments. Which way is better, @alecristia?
      convType  blkType       spkr
0     NaN       Pause         TVF
1     NaN       Pause         SIL
2     NaN       Pause         TVF
3     NaN       Pause         SIL
4     NaN       Pause         NOF
5     NaN       Pause         SIL
6     NaN       Pause         TVF
7     NaN       Pause         NOF
8     NaN       Pause         SIL
9     NaN       Pause         NOF
...
21    AICF      Conversation  FAN
22    NaN       Conversation  NON
23    NaN       Conversation  OLN
24    NaN       Conversation  SIL
25    AICF      Conversation  FAN
26    NaN       Conversation  TVF
27    AICF      Conversation  FAN
28    NaN       Conversation  TVF
29    AICF      Conversation  FAN
30    NaN       Conversation  OLF
31    NaN       Conversation  NOF
Q2.
Currently the lists of cries, utterances and Vfxs (whatever that is) are each stored in their own column as JSON, in the following format (e.g. for cries):
[{'startCry1': 8015.63, 'endCry1': 8016.25}, {'startCry2': 8016.77, 'endCry2': 8017.07}]
Is there any good reason why we should not do this instead?
[{'start': 8015.63, 'end': 8016.25}, {'start': 8016.77, 'end': 8017.07}]
Q1.
It seems more reasonable to me to keep the conversation type for all segments in a conversation block (even non-speech ones). I understand why the other package may set it to NaN (e.g. to make it easier to sum word counts within blocks), but conceptually it makes more sense to keep block identity stable within the block.
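To make the two behaviours concrete, here is a minimal sketch of how the R-style output (convType set only on some segments) could be turned into the "stable within the block" form. It assumes each segment carries a block identifier; the `block_id` field and the `propagate_conv_type` helper are hypothetical names used only for illustration, `None` stands in for NaN, and the grouping of rows 21 to 24 into a single block is an assumption.

```python
def propagate_conv_type(segments):
    """Return copies of the segments in which every segment of a
    Conversation block carries the conversation type of that block."""
    # First pass: remember the conversation type seen in each block.
    block_types = {}
    for seg in segments:
        if seg["blkType"] == "Conversation" and seg["convType"] is not None:
            block_types.setdefault(seg["block_id"], seg["convType"])

    # Second pass: fill the type in for all segments of Conversation blocks.
    filled = []
    for seg in segments:
        new_seg = dict(seg)
        if seg["blkType"] == "Conversation":
            new_seg["convType"] = block_types.get(seg["block_id"])
        filled.append(new_seg)
    return filled


# Rows 9 and 21-24 from the table above; block_id values are made up.
segments = [
    {"block_id": 1, "convType": None, "blkType": "Pause", "spkr": "NOF"},
    {"block_id": 2, "convType": "AICF", "blkType": "Conversation", "spkr": "FAN"},
    {"block_id": 2, "convType": None, "blkType": "Conversation", "spkr": "NON"},
    {"block_id": 2, "convType": None, "blkType": "Conversation", "spkr": "OLN"},
    {"block_id": 2, "convType": None, "blkType": "Conversation", "spkr": "SIL"},
]
filled = propagate_conv_type(segments)
```

Going the other way (blanking the type on non-speech segments) stays a simple filter on `spkr`, so keeping the block type everywhere does not lose any information.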
Q2.
No reason I can think of. In my view, your proposed notation has only advantages.
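For reference, a minimal sketch of the conversion from the current key names to the proposed ones. It assumes every key has the form `start<Label><N>` or `end<Label><N>`, as in the cries example; `normalize_intervals` is a hypothetical helper name, not an existing function of the package.

```python
import re

# Matches keys such as 'startCry1' or 'endCry2' and captures 'start' / 'end'.
_INDEXED_KEY = re.compile(r"^(start|end)[A-Za-z]+\d+$")


def normalize_intervals(items):
    """Rename 'startCry1'/'endCry1'-style keys to plain 'start'/'end'."""
    normalized = []
    for item in items:
        interval = {}
        for key, value in item.items():
            match = _INDEXED_KEY.match(key)
            # Keys that do not follow the assumed pattern are kept unchanged.
            interval[match.group(1) if match else key] = value
        normalized.append(interval)
    return normalized


cries = [
    {"startCry1": 8015.63, "endCry1": 8016.25},
    {"startCry2": 8016.77, "endCry2": 8017.07},
]
normalized = normalize_intervals(cries)
# normalized == [{'start': 8015.63, 'end': 8016.25},
#                {'start': 8016.77, 'end': 8017.07}]
```

The index in the original keys is redundant with the position in the list, so nothing is lost, and the same code can then read cries, utterances and Vfxs alike.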
Before going too far into EL1000, and before we "release" our package, we need to cross-check its import routine...
We'll use https://htanderson.github.io/ITSbin/index.html as a cross-check.
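One possible shape for such a cross-check, as a rough sketch: load the segments produced by both tools into lists of records and report every field that disagrees. Nothing here uses the ITSbin API; `diff_segments`, the column names on our side and the sample rows are all hypothetical, and it assumes both tools emit segments in the same order.

```python
import math


def _same(a, b):
    """Equality that treats two missing values (None or NaN) as equal."""
    a_missing = a is None or (isinstance(a, float) and math.isnan(a))
    b_missing = b is None or (isinstance(b, float) and math.isnan(b))
    if a_missing or b_missing:
        return a_missing and b_missing
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, abs_tol=1e-6)
    return a == b


def diff_segments(ours, theirs, columns):
    """Compare two segment lists row by row.

    `columns` maps our column names to the other tool's column names.
    Returns a list of (row, our_column, our_value, their_value) tuples.
    """
    mismatches = []
    if len(ours) != len(theirs):
        mismatches.append((None, "row_count", len(ours), len(theirs)))
    for row, (a, b) in enumerate(zip(ours, theirs)):
        for our_col, their_col in columns.items():
            if not _same(a.get(our_col), b.get(their_col)):
                mismatches.append((row, our_col, a.get(our_col), b.get(their_col)))
    return mismatches


# Made-up rows illustrating the Q1 discrepancy.
ours = [
    {"speaker_type": "FAN", "lena_block_type": "AICF"},
    {"speaker_type": "NON", "lena_block_type": "AICF"},
]
theirs = [
    {"spkr": "FAN", "convType": "AICF"},
    {"spkr": "NON", "convType": float("nan")},
]
mismatches = diff_segments(
    ours, theirs, {"speaker_type": "spkr", "lena_block_type": "convType"}
)
```

With the current behaviour, a report like this would flag every non-speech segment inside a Conversation block, so known, intentional differences such as Q1 would need to be filtered out before looking for real import bugs.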