Open oruebel opened 5 years ago
A single table wouldn't work because each tier has different optional columns. Making them all just members of the intervals group could work but we would lose enforcing names and the optional settings field.
On Tue, May 28, 2019, 12:22 AM Oliver Ruebel notifications@github.com wrote:
It feels like this is something that would belong in the /intervals group in an NWB:N file rather than a separate Transcription group with TimeInterval tables.
As an alternative, would it make sense, instead to have Transcription be a single TimeIntervals table type with a column transcription_type to indicate whether a particular row transcribes a sentence, word, or syllable etc.?
I think what you are trying to do makes sense. I realize that some of my questions/suggestions are a bit nit-picky, but I'm just trying to see if we can make this fit more nicely and whether we can extract some broader reusable structure.
A single table wouldn't work because each tier has different optional columns...
Ok, I can see that if they need to have different columns that this won't work. Are those optional columns part of the schema of the extension or is this something that is unspecified and you expect that users will need to dynamically add different columns?
Making them all just members of the intervals group could work but we would lose enforcing names and the optional settings field.
I don't think you would lose the enforcement of names for the tables in this case, since the name is part of the specification of your type. With regard to the settings field, I think it may make sense to have that as a field on each table, as the method by which you create the annotation for sentences may be different from what you use to annotate words and syllables. The main reason I think it would be useful to have these in /intervals is to make it easier for analysis tools to find all TimeIntervals tables in a consistent location.
General ideas:
`IntervalModule` (i.e., something like `ProcessingModule` for intervals) also as part of `/intervals` to organize related tables. With that, the extension in its current form would fit nicely, in that `Transcription` would become an `IntervalModule`.
I've come around to this idea of putting them in `/intervals`. I can always extend the `TimeIntervals` type to enforce a name and/or accommodate more meta-data like `settings` if I need to. I agree that it would be better to have `settings` by-tier as opposed to one field for all tiers.
All times are with respect to the session start time, so I don't know if the hierarchical relationships necessarily need to be explicit. You can just run a simple command that asks which word events are within a given sentence interval. I was thinking about adding this as a convenience method to the `Transcription` class, but honestly it's just a two-liner to segment the words table by a given sentence:
word_df = nwbfile.intervals['words'].to_dataframe()
word_df[(word_df['start_time'] >= start) & (word_df['stop_time'] <= stop)]
Since `word_df['start_time']` is ordered you could use `bisect` here but I don't think that's necessary.
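For reference, a minimal sketch of what the `bisect` variant could look like, using only the standard library. It assumes the word intervals are sorted and non-overlapping, so that the stop times are ascending as well as the start times. The helper name `rows_within` and the sample data are made up for illustration and are not part of PyNWB or the extension.

```python
from bisect import bisect_left, bisect_right


def rows_within(starts, stops, start, stop):
    """Return the half-open index range (lo, hi) of intervals fully inside [start, stop].

    Assumes the intervals are sorted and non-overlapping, so both
    `starts` and `stops` are in ascending order.
    """
    lo = bisect_left(starts, start)  # first interval with start_time >= start
    hi = bisect_right(stops, stop)   # one past the last interval with stop_time <= stop
    return lo, max(lo, hi)           # clamp so an empty match gives an empty range


# Made-up word intervals, in seconds relative to the session start time
word_starts = [0.0, 0.5, 1.0, 1.5, 2.1]
word_stops = [0.4, 0.9, 1.3, 2.0, 2.6]
words = ["the", "quick", "brown", "fox", "jumps"]

# Words that fall inside a hypothetical sentence spanning 0.5 s to 2.0 s
lo, hi = rows_within(word_starts, word_stops, start=0.5, stop=2.0)
sentence_words = words[lo:hi]
```

With a DataFrame, the same `(lo, hi)` pair could be passed to `word_df.iloc[lo:hi]`; whether that is worth it over the boolean mask depends on table size.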
I think `IntervalModule` could make sense as a way of organizing `TimeIntervals` objects, but seeing as these are my only non-default `TimeIntervals` objects in this case I don't think I require that.
I was thinking about adding this as a convenience method to the `Transcription` class, ....
I think that would be useful in this case. Even though it's "just" two lines, it seems like those particular two lines are needed often enough that folks would frequently reinvent them (with associated errors).
I don't know if the hierarchical relationships necessarily need to be explicit.
I agree that having a separate column of DynamicTableRegions would be overkill, in that it is more work than it is useful, in particular since it's a strict nesting in time that you can recover easily. However, having an object reference on the tables pointing from one layer in the hierarchy to the next seems like it would be useful to have (i.e., you possibly want two of these, one for "parent" and another for "child"). In this way, one could easily model arbitrary nesting of refinements and navigate the stack simply by asking for the "next" or "previous" refinement level without having to explicitly know the names of the different levels. As such, this would 1) make it reusable for other nested TimeIntervals as well, 2) avoid users (and tools) having to know the names and nesting structure of the tables, and 3) simplify the navigation of layers.
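To make the parent/child idea concrete, here is a rough plain-Python sketch. `Tier`, `link`, and `refine` are hypothetical stand-ins invented for this illustration; they are not PyNWB classes or part of the extension schema, and the intervals are made up. The point is only that a tool can walk from one refinement level to the next without knowing the table names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Tier:
    """Hypothetical stand-in for a TimeIntervals table with parent/child links."""
    name: str
    intervals: List[Tuple[float, float]] = field(default_factory=list)
    # Links are excluded from repr/compare because they form a reference cycle.
    parent: Optional["Tier"] = field(default=None, repr=False, compare=False)
    child: Optional["Tier"] = field(default=None, repr=False, compare=False)


def link(*tiers):
    """Chain tiers from coarsest to finest by setting parent/child references."""
    for coarse, fine in zip(tiers, tiers[1:]):
        coarse.child = fine
        fine.parent = coarse


def refine(tier, row):
    """Return the child tier's intervals nested inside one row of `tier`."""
    if tier.child is None:
        return []  # already at the finest level
    start, stop = tier.intervals[row]
    return [(s, e) for s, e in tier.child.intervals if s >= start and e <= stop]


# Made-up nested intervals, in seconds relative to the session start time
sentences = Tier("sentences", [(0.0, 2.0), (2.5, 4.0)])
words = Tier("words", [(0.0, 0.8), (1.0, 2.0), (2.5, 3.0), (3.2, 4.0)])
syllables = Tier("syllables", [(0.0, 0.4), (0.4, 0.8), (1.0, 2.0),
                               (2.5, 3.0), (3.2, 3.6), (3.6, 4.0)])
link(sentences, words, syllables)

# Navigate one level down from the second sentence without naming the child tier
words_in_second_sentence = refine(sentences, 1)
```

In an actual NWB file the two links would presumably be stored as object references on each table rather than as Python attributes; that mapping is an assumption here, not something the thread specifies.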