bendichter / ndx-speech


Question about possible alternate approach #1

Open oruebel opened 5 years ago

oruebel commented 5 years ago

It feels like this is something that would belong in the /intervals group in an NWB:N file rather than in a separate Transcription group with TimeIntervals tables.

As an alternative, would it make sense to instead have Transcription be a single TimeIntervals table type with a transcription_type column indicating whether a particular row transcribes a sentence, word, syllable, etc.?
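To make the single-table alternative concrete, here is a minimal stdlib sketch that models the proposed table as a list of rows, each carrying start_time, stop_time, a label, and a transcription_type column. The `IntervalRow` class and the `rows_of_type` helper are hypothetical illustrations, not part of pynwb or ndx-speech; in NWB the equivalent would be one TimeIntervals table with an extra column.

```python
from dataclasses import dataclass


@dataclass
class IntervalRow:
    """One row of a hypothetical single transcription table."""
    start_time: float  # seconds relative to session start
    stop_time: float
    label: str
    transcription_type: str  # e.g. "sentence", "word", "syllable"


def rows_of_type(table, transcription_type):
    """Select the rows belonging to one tier by filtering on the type column."""
    return [row for row in table if row.transcription_type == transcription_type]


transcription = [
    IntervalRow(0.0, 1.2, "hello world", "sentence"),
    IntervalRow(0.0, 0.5, "hello", "word"),
    IntervalRow(0.6, 1.2, "world", "word"),
]

words = rows_of_type(transcription, "word")
print([w.label for w in words])  # ['hello', 'world']
```

With this layout, any column that only one tier needs would have to exist (and be left empty) for every row of every tier.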

bendichter commented 5 years ago

A single table wouldn't work because each tier has different optional columns. Making them all just members of the intervals group could work but we would lose enforcing names and the optional settings field.

Sent by email in reply to Oliver Ruebel's comment of Tue, May 28, 2019, 12:22 AM.

oruebel commented 5 years ago

I think what you are trying to do makes sense. I realize that some of my questions/suggestions are a bit nit-picky, but I'm just trying to see whether we can make this fit more nicely and extract some broader reusable structure.

A single table wouldn't work because each tier has different optional columns...

OK, I can see that this won't work if they need to have different columns. Are those optional columns part of the extension's schema, or are they unspecified, so that you expect users to dynamically add different columns?

Making them all just members of the intervals group could work but we would lose enforcing names and the optional settings field.

I don't think you would lose the enforcement of names for the tables in this case, since the name is part of the specification of your type. With regard to the settings field, I think it may make sense to have it as a field on each table, since the method you use to create the sentence annotations may differ from the one you use to annotate words and syllables. The main reason I think it would be useful to have these in /intervals is to make it easier for analysis tools to find all TimeIntervals tables in a consistent location.
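As a rough illustration of why a single, consistent location helps, here is a stdlib sketch in which a plain dict stands in for the /intervals group (an assumption for illustration; in pynwb, `nwbfile.intervals` plays that role). The table contents and the `summarize_intervals` tool are hypothetical. A generic tool can process every interval table without knowing the table names in advance.

```python
# Hypothetical stand-in for the /intervals group: table name -> (start, stop) rows in seconds.
intervals = {
    "trials": [(0.0, 10.0), (10.0, 20.0)],
    "sentences": [(0.0, 1.0)],
    "words": [(0.0, 0.5), (0.5, 1.0)],
}


def summarize_intervals(group):
    """Generic tool: report row count and total duration for every table in the group."""
    summary = {}
    for name, rows in group.items():
        total = sum(stop - start for start, stop in rows)
        summary[name] = (len(rows), total)
    return summary


print(summarize_intervals(intervals))
# {'trials': (2, 20.0), 'sentences': (1, 1.0), 'words': (2, 1.0)}
```

If the transcription tiers lived in a separate group, a tool like this would have to be told about that extra location explicitly.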

General ideas:

  1. Make the relationship between tables explicit: The hierarchy of the tiers currently seems to be "implicit", based on the semantics of the tables. Would it make sense to have an object reference on each table pointing to the next refinement level, to make this link explicit? Alternatively, if the refinement of, e.g., a sentence into words is something one needs to follow, would it make sense to have a column with a DynamicTableRegion that points from each sentence to the corresponding words in the other table?
  2. IntervalModule: I think your example (and the current design) may hint at the potential need for an IntervalModule (i.e., something like ProcessingModule, but for intervals), also stored in /intervals, to organize related tables. With that, the extension in its current form would fit nicely, in that Transcription would become an IntervalModule.

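The per-row alternative in idea 1 amounts to storing, for each sentence, the row indices of its words in the other table, roughly what a ragged DynamicTableRegion column would hold. The stdlib sketch below computes such an index from timing alone; `build_child_index` is a hypothetical helper, and intervals are assumed to be (start_time, stop_time) tuples in seconds.

```python
def build_child_index(parents, children):
    """For each parent interval, list the indices of child intervals nested inside it.

    The result mimics a ragged column of row indices into the child table.
    """
    index = []
    for p_start, p_stop in parents:
        index.append([
            i
            for i, (c_start, c_stop) in enumerate(children)
            if c_start >= p_start and c_stop <= p_stop
        ])
    return index


sentences = [(0.0, 1.0), (1.5, 3.0)]
words = [(0.0, 0.4), (0.5, 1.0), (1.5, 2.0), (2.1, 3.0)]
print(build_child_index(sentences, words))  # [[0, 1], [2, 3]]
```

Because the index is fully derivable from the times, storing it is a trade-off between redundancy and convenience.
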
bendichter commented 5 years ago

I've come around to the idea of putting them in /intervals. I can always extend the TimeIntervals type to enforce a name and/or accommodate more metadata like settings if I need to. I agree that it would be better to have settings per tier as opposed to one field for all tiers.

All times are with respect to the session start time, so I don't know if the hierarchical relationships necessarily need to be explicit. You can just run a simple command that asks which word events are within a given sentence interval. I was thinking about adding this as a convenience method to the Transcription class, but honestly it's just a two-liner to segment the words table by a given sentence:

word_df = nwbfile.intervals['words'].to_dataframe()
word_df[(word_df['start_time'] >= start) & (word_df['stop_time'] <= stop)]

Since word_df['start_time'] is sorted, you could use bisect here, but I don't think that's necessary.
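For a large words table, the sorted start_time column allows a binary search instead of a full scan. The stdlib sketch below assumes the two columns are available as plain Python lists and that every word's stop_time is at or after its start_time; `words_in_interval` is a hypothetical helper (with pandas, `Series.searchsorted` could play the role of `bisect`).

```python
from bisect import bisect_left, bisect_right


def words_in_interval(start_times, stop_times, start, stop):
    """Return indices of words with start_time >= start and stop_time <= stop.

    Assumes start_times is sorted ascending. Bisect narrows the candidates by
    start_time; stop_time is still checked per row because it may not be sorted.
    """
    lo = bisect_left(start_times, start)  # first word starting at or after `start`
    hi = bisect_right(start_times, stop)  # words starting after `stop` cannot fit
    return [i for i in range(lo, hi) if stop_times[i] <= stop]


start_times = [0.0, 0.5, 1.5, 2.1]
stop_times = [0.4, 1.0, 2.0, 3.0]
print(words_in_interval(start_times, stop_times, 1.5, 3.0))  # [2, 3]
```

The gain only matters for long recordings; for typical table sizes the boolean mask above is simpler and fast enough.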

I think IntervalModule could make sense as a way of organizing TimeIntervals objects, but since these are my only non-default TimeIntervals objects in this case, I don't think I need it.

oruebel commented 5 years ago

I was thinking about adding this as a convenience method to the Transcription class, ....

I think that would be useful in this case. Even though it's "just" two lines, it seems like those particular two lines are needed often enough that folks would frequently reinvent them (with associated errors).

I don't know if the hierarchical relationships necessarily need to be explicit.

I agree that having a separate column of DynamicTableRegions would be overkill, in that it is more work than it is useful, particularly since it's a strict nesting in time that you can recover easily. However, having an object reference on the tables pointing from one layer in the hierarchy to the next seems like it would be useful (i.e., you possibly want two of these, one for "parent" and another for "child"). In this way, one could easily model arbitrary nesting of refinements and navigate the stack simply by asking for the "next" or "previous" refinement level, without having to explicitly know the names of the different levels. As such, this would 1) make it reusable for other nested TimeIntervals as well, 2) avoid users (and tools) having to know the names and nesting structure of the tables, and 3) simplify the navigation of layers.
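A minimal stdlib sketch of the parent/child idea, assuming each tier is represented by a hypothetical `Tier` object standing in for one TimeIntervals table. Once the links are recorded, code can walk the hierarchy by following `child` or `parent` without knowing any tier names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tier:
    """Hypothetical stand-in for one TimeIntervals table in the hierarchy."""
    name: str
    intervals: list = field(default_factory=list)  # (start_time, stop_time) rows
    # Links are excluded from repr/eq so the parent <-> child cycle never recurses.
    parent: Optional["Tier"] = field(default=None, repr=False, compare=False)
    child: Optional["Tier"] = field(default=None, repr=False, compare=False)


def link(coarse, fine):
    """Record the refinement relationship in both directions."""
    coarse.child = fine
    fine.parent = coarse


def levels_from(tier):
    """Walk from a tier down to the finest refinement, collecting tier names."""
    names = []
    current = tier
    while current is not None:
        names.append(current.name)
        current = current.child
    return names


sentences = Tier("sentences")
words = Tier("words")
syllables = Tier("syllables")
link(sentences, words)
link(words, syllables)

print(levels_from(sentences))        # ['sentences', 'words', 'syllables']
print(syllables.parent.parent.name)  # sentences
```

The same two links would work for any nested TimeIntervals hierarchy, not only speech tiers.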