buzsakilab / buzcode

Code for internal lab sharing - polishing has started but is by no means complete
http://www.buzsakilab.com/
GNU General Public License v3.0

buzcode format typechecking #85

Open DavidTingley opened 7 years ago

DavidTingley commented 7 years ago

- isCellinfo
- isBehavior
- isState
- isPopInfo
- isLFP
- isEvent
- etc, etc

DavidTingley commented 7 years ago

see 585685f for an example

DavidTingley commented 7 years ago

isEvent and isLFP done, 5fa1ccb

DavidTingley commented 7 years ago

isBehavior done, 72317529365facac15066a497d1690e0a09b0625

we still need:
- isState
- isPopInfo
- isSessionInfo

dlevenstein commented 7 years ago

I can make isState tomorrow.


brendonw1 commented 7 years ago

I'm just trying to keep up. Can you guys briefly explain what these are?

juneshuoyang commented 7 years ago

@brendonw1 please see the email communication below.

David wrote: Partially correct. These functions (so far) aren't meant to check the quality of an entire recording/experiment, rather the individual structures that store parts of the data of an experiment (are all of the required fields present and formatted correctly?). We could eventually wrap them into a 'isCompleteRecording.m' function that satisfies point 1.

Sam wrote: here is where the minimally complete datatype will be useful

June and I discussed yesterday that we need to formalize two things

1) the minimally complete data required for an experiment to be included in buzcode format. this will likely change, but we need to decide what our inclusion criteria will be for the database at least at this point in history. for example, if you don't know the region of your shanks, the data is garbage.

2) the minimal information we want to extract and make searchable in a database. this is obviously a subset of 1.

I imagine that these helper functions will return TRUE only when point 1 is met

David wrote: Sure, it is a list of functions we'll eventually create that check the .mat structs we are using and return true/false if they conform to our data standard. Essentially the same as something like the MATLAB builtin function isinteger, but for the more complicated structures we're making.

June wrote: Hi David,

I am sorry I do not understand the purpose of this typechecking or this issue (#85) overall.

Can you provide some instructions?

Thank you in advance, June
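
A rough illustration of the kind of check David describes, written in Python with a dict standing in for a MATLAB struct. The field names (`timestamps`, `detectorinfo`) and the function name `is_event` are assumptions made for this sketch, not the actual buzcode standard or the contents of bz_isEvent.

```python
# Hypothetical required fields and their expected types; the real standard may differ.
REQUIRED_EVENT_FIELDS = {"timestamps": list, "detectorinfo": dict}


def is_event(candidate):
    """Return True only if every required field is present with the expected type."""
    if not isinstance(candidate, dict):
        return False
    return all(
        name in candidate and isinstance(candidate[name], expected)
        for name, expected in REQUIRED_EVENT_FIELDS.items()
    )


# One structure that conforms and one that is missing a required field.
good = {"timestamps": [[1.0, 1.5], [4.2, 4.9]], "detectorinfo": {"detectorname": "demo"}}
bad = {"timestamps": [[1.0, 1.5]]}

print(is_event(good), is_event(bad))
```

Like a builtin type predicate, the check only answers true/false for one structure; it says nothing about whether a whole recording is complete.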

brendonw1 commented 7 years ago

Oh yeah perfectly aligned with my other question. Thank you very much.

I'll be curious to see what you guys decide.

juneshuoyang commented 7 years ago

| Mat Struct | IO Function | Typechecking Function |
| --- | --- | --- |
| SessionInfo | bz_getSessionInfo | (To be built) |
| PopInfo | ? | (To be built) |
| CellInfo | bz_GetSpikes | bz_isCellInfo |
| LFP | bz_GetLFP | bz_isLFP |
| Events | bz_LoadEvents | bz_isEvent |
| States | bz_LoadStates | (To be built) |
| Manipulation | ? | ? |
| Behavior | ? | ? |

@DavidTingley @dlevenstein

I need some help with these:

  1. Please help replace the question marks and confirm the information in the rest of the table above.

    • where can I find bz_isBehavior?
  2. When do you expect users to use these typechecking functions?

  3. Within the IO folder, I can find SaveFeatures.m and bz_GetWidebandData.m. Are these two functions related to any data conversion procedures? If yes, how?

  4. Are any other functions outside the IO folder useful for converting data to the new format?

dlevenstein commented 7 years ago

Ah, I'd made a bz_LoadBehavior function a while ago but saved it on my personal github as opposed to buzcode. Just added it to the io folder

https://github.com/buzsakilab/buzcode/blob/master/io/bz_LoadBehavior.m

Note: this may need to be updated to match formatting/functionality of bz_LoadEvents etc.

dlevenstein commented 7 years ago

1) The other question marks, to my knowledge, aren't built yet. Note also that the standards for some of them aren't yet fully established and should be discussed and documented. (i.e. Manipulation, as I discussed with @samamckenzie yesterday)

2) Type checking functions should probably be used within the loading functions to make sure that the structure fits the guidelines. Most of the time we'll be loading states/events/etc using these i/o functions and the output will cause issues down the line if things don't conform. Any other times you can think of?

David will know more about 3) and 4) than I, the answers are no to the best of my knowledge.
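
As a sketch of point 2), a loading function can run the type check itself and warn when the structure does not conform. This is Python with a dict standing in for the loaded struct; `is_states`, `load_states`, and the `ints` field are hypothetical names chosen for illustration and are not taken from bz_LoadStates.

```python
import warnings


def is_states(candidate):
    """Hypothetical standard: a dict with an 'ints' field holding named interval lists."""
    return isinstance(candidate, dict) and isinstance(candidate.get("ints"), dict)


def load_states(raw):
    """Stand-in loader: `raw` plays the role of the struct read from a .mat file."""
    if not is_states(raw):
        # Warn rather than fail, so the caller still receives the data.
        warnings.warn("states structure does not meet the format standard")
    return raw


states = load_states({"ints": {"NREM": [[0.0, 120.0]], "REM": [[120.0, 150.0]]}})
```

Checking at load time means a malformed structure is flagged where it enters the analysis rather than further down the line.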

brendonw1 commented 7 years ago

This is cool.

samamckenzie commented 7 years ago

For manipulations,

I store my voltage trace as a wideband signal, then extract event times and labels. For complex shapes, I save the waveform of the stimulation that I can then convolve with the timestamps to recover the full stimulation profile. This is memory efficient, though perhaps not as readable as storing the full downsampled time series, which is the NWB standard btw.

I suggest that we sacrifice efficient memory storage for readability and cross-format compatibility and store the downsampled traces.

I also think that we should decide which of these data types is mandatory to be considered a valid dataset. For things like behavior and manipulations, some experiments will not have any explicitly defined. Should we assume that absence of the data type is absence in the experiment, or should we initialize these data types to some standard null?
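
The compact storage described above (event times plus one stimulation waveform) can be expanded back into a full trace by convolving the waveform with an impulse train of onsets. A minimal sketch in plain Python; the function name, the sample-index onsets, and the toy waveform are assumptions made for illustration.

```python
def reconstruct_stimulation(n_samples, onset_samples, waveform):
    """Place one copy of `waveform` at each onset index.

    This is equivalent to convolving the waveform with an impulse train
    that is 1 at every onset and 0 elsewhere.
    """
    trace = [0.0] * n_samples
    for onset in onset_samples:
        for k, value in enumerate(waveform):
            if onset + k < n_samples:  # drop samples that run past the end
                trace[onset + k] += value
    return trace


waveform = [0.0, 1.0, 0.5]  # toy stimulation shape
trace = reconstruct_stimulation(10, [1, 6], waveform)
print(trace)
```

The stored form here is two onsets plus three waveform samples, while the readable form is the full ten-sample trace, which is the trade-off under discussion.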


dlevenstein commented 7 years ago

Yes - I think we assume absence of data type is absence in the experiment.

I agree re: traces. Proposal: Manipulation should have a similar form to the other buzcode formats:

file: baseName.manipulationType.manipulation.mat
- contains structure named manipulationType
- manipulationType.timestamps
- manipulationType.data (maybe a different name here, magnitude?)
- (others needed/suggested?)

For example, electrical stimulation could be in a file named baseName.EStim.manipulation.mat containing a structure called EStim with fields EStim.timestamps and EStim.data, where EStim.data gives the magnitude of electrical stimulation at each timestamp. This is also nice because it allows non-continuous manipulation (i.e. timestamps don't have to be each timepoint in the recording). The recommended samplingRate of the manipulation should match the LFP. Will/should this also be robust for different types of manipulations, for example, sensory stimulation?

UPDATE: put this in the wiki, feel free to improve/modify with discussion https://github.com/buzsakilab/buzcode/wiki/Data-Formatting-Standards#manipulation
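
The proposed layout can be sketched as follows, again in Python with a dict standing in for the struct. The helper names and the example values are assumptions; only the file-name pattern and the `timestamps`/`data` fields come from the proposal above.

```python
def manipulation_filename(base_name, manipulation_type):
    """Build the proposed file name: baseName.manipulationType.manipulation.mat"""
    return f"{base_name}.{manipulation_type}.manipulation.mat"


def make_manipulation(timestamps, data):
    """Pair each timestamp (seconds) with the manipulation magnitude at that time."""
    if len(timestamps) != len(data):
        raise ValueError("timestamps and data must have the same length")
    return {"timestamps": list(timestamps), "data": list(data)}


# Non-continuous example: two brief stimulation periods, nothing stored in between.
estim = make_manipulation([10.0, 10.001, 10.002, 55.0], [0.0, 50.0, 0.0, 50.0])
print(manipulation_filename("baseName", "EStim"))
```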

samamckenzie commented 7 years ago

OK, sounds like a plan.

Is there really such a thing as absence of behavior? I think even the homecage sessions should be tagged as such somewhere. Otherwise I agree.


dlevenstein commented 7 years ago

I think homecage would be tagged as such in the metadata, but no reason to have a whole behavior.mat file that just says "homecage".

On a related note, the behavior.mat guidelines need some refinement. They assume behavior corresponds to navigation in space (i.e. x/y position), but some behaviors will not have this, for example head-fixed behavior or whisking behavior.

brendonw1 commented 7 years ago

How does behavior.mat work? Is it on a per-second basis or something?


DavidTingley commented 7 years ago

@brendonw1, check out the example behavior struct in the repo. I think I'm the only one using this format at the moment and it's been working really well for me so far.

@dlevenstein, these fields can be empty and there is an .events sub-structure that I use to store trial information. Additionally, optional extra fields could be added (whiskerPosition, wheelPosition, etc).

dlevenstein commented 7 years ago

David and Rachel and I just updated the behavior.mat wiki. Could you give it a look through and see if it makes sense and would account for your behavior needs?

https://github.com/buzsakilab/buzcode/wiki/Data-Formatting-Standards#behavior

@DavidTingley I updated the example behavior.mat file in exampleDataStructs. Saved as fbasename.positionTracking.behavior.mat. Could you take a look over it and if you agree, delete the deprecated fbasename.behavior.mat?

DavidTingley commented 7 years ago

I've split this out as another issue #97 as there are several functions we will need to change or make for this.

brendonw1 commented 7 years ago

Is it more precise to use intervals (i.e. start/stop pairs of columns) rather than per-timebin scoring of behavior?


dlevenstein commented 7 years ago

much of this was designed with continuous-variable behavior in mind (i.e. x/y position), with the option for event-based behavior as well.

I had this question as well, i.e. much behavior is not continuous time but is in terms of start/stops of events. This sort of behavioral tagging will go in the behaviorName.events substructure (for example, I'll use EMGwhisking.events.whisks for whisking onset/offset). Or you can use positionTracking.events.movement to store on/offset of movement in homecage recordings

Most behavior tracking comes from a continuous signal (motion etc), which can still be stored for reference in the timestamps/samplingrate/data substructs

Do you think that addresses your concern?

brendonw1 commented 7 years ago

Not really. I mean it's easy to convert between the two formats, but I'd always store in the format that's both more precise and less data intensive at once.
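
To make the two formats concrete, here is a small sketch of converting between per-timebin scoring and start/stop intervals. The function names and the toy data are assumptions for illustration only.

```python
def bins_to_intervals(scored, dt):
    """Convert a per-timebin True/False vector into [start, stop] pairs in seconds."""
    intervals, start = [], None
    for i, active in enumerate(scored):
        if active and start is None:
            start = i  # a run of True bins begins here
        elif not active and start is not None:
            intervals.append([start * dt, i * dt])
            start = None
    if start is not None:  # the last run extends to the end of the vector
        intervals.append([start * dt, len(scored) * dt])
    return intervals


def intervals_to_bins(intervals, n_bins, dt):
    """Mark each bin whose start time falls inside any [start, stop) interval."""
    return [any(a <= i * dt < b for a, b in intervals) for i in range(n_bins)]


scored = [False, True, True, False, False, True]
intervals = bins_to_intervals(scored, dt=1.0)
print(intervals)
```

The interval form stores two numbers per bout regardless of the sampling rate, whereas the per-timebin form grows with the recording length and the chosen bin size.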


brendonw1 commented 7 years ago

This is the beginning and making decisions now that trap people in the future should be avoided at all costs... would annoy "generations" of lab members in the future. What if someone wants 20kHz resolution... you have a timepoint for each of those?


dlevenstein commented 7 years ago

I think I'm a little confused as to your concern then? You can have start/stop times in the events field of your behavior structure, which are in seconds so can have whatever resolution you want.

If someone wants 20kHz resolution for a continuous signal, you would need a time point for each. But it sounds like what you're worried about is a 20kHz behaviorA vs behaviorB vector, which is not what we're imagining. The timestamps/data substruct is for continuous signals like position/orientation.

i.e.
- "scoring" behavior will be in stop/start times
- timestamps/data will be only for continuous signals

(Scored behavior would be saved in the events subfield, but we can change this if you would like. This is just what @DavidTingley and I came up with to meet halfway between our needs. If it needs updating, we can discuss and update, which is the point of discussion here (this has been the topic of all-day discussion at this point...).) David and most everyone expressed need for a continuous behavior variable.
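
A rough sketch of how the two parts might sit side by side in one behavior structure, with a Python dict standing in for the struct. The `position` field, the sample values, and the helper `samples_in_event` are assumptions for illustration; `events.movement` follows the positionTracking example mentioned earlier in the thread.

```python
# Continuous signal (timestamps + position) alongside scored start/stop events.
position_tracking = {
    "samplingRate": 2.0,                      # Hz, hypothetical
    "timestamps": [0.0, 0.5, 1.0, 1.5, 2.0],  # seconds, one per continuous sample
    "position": {"x": [0.0, 0.1, 0.4, 0.9, 0.9], "y": [0.0, 0.0, 0.1, 0.1, 0.1]},
    "events": {"movement": [[0.5, 1.5]]},     # scored behavior as [start, stop] in seconds
}


def samples_in_event(behavior, event_name):
    """Return the continuous timestamps that fall inside any scored interval."""
    intervals = behavior["events"][event_name]
    return [
        t for t in behavior["timestamps"]
        if any(start <= t <= stop for start, stop in intervals)
    ]


moving = samples_in_event(position_tracking, "movement")
print(moving)
```

The event intervals carry their own time resolution, so they do not depend on how densely the continuous signal is sampled.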

brendonw1 commented 7 years ago

Oh maybe I misunderstood. I thought you were doing a behavior a vs behavior b thing. Like homecage vs maze or something. Will that exist?

dlevenstein commented 7 years ago

Yes! It's unclear to me at this point where such a thing should live. Possibly could be in this data type...

But perhaps in a sessionInfo file? Either as part of the main metadata or as a separate recordingEpochs.sessionInfo.mat file which has info about any major time windows of the recording (e.g. homecage vs maze, or times of merged .dat files).

brendonw1 commented 7 years ago

But yeah the rest of what you said you guys are doing sounds great.

I just think of that as a super basic thing in your recording: presleep, behave, postsleep. That at least should be somewhere super easy to get at. I think I'd make an optional field in behaviorName.mat for StartEnd as a quick reference for when that behaviorName very first starts and very last ends. What do you think? An alternative could be as a states.mat, but personally I'd probably separate them.

1) I'd have things like Rearing, jumping, running, NREM, REM, as a states.mat

2) I'd have things like "on track", "in home cage" as a behaviorName.mat with StartEnd (or StartStop) times simply states

or 3) You could have some other thing like TaskPhase.mat

What do you guys think?


dlevenstein commented 6 years ago

A reminder here (for myself) that this issue will be completed once bz_isStates is completed.

dlevenstein commented 5 years ago

We might want to change this behavior.... it comes up when loading so many things (events etc)

Warning: one of the required fields for an behavior type does not exist
In bz_isBehavior (line 35)
In bz_LoadBehavior (line 46)
Warning: Your behavior structure does not meet buzcode standards. Sad.
In bz_LoadBehavior (line 49)
Warning: one of the required fields for an behavior type does not exist
In bz_isBehavior (line 35)
In bz_LoadBehavior (line 46)
Warning: Your behavior structure does not meet buzcode standards. Sad.
In bz_LoadBehavior (line 49)
Warning: one of the required fields for an behavior type does not exist
In bz_isBehavior (line 35)
In bz_LoadBehavior (line 46)
Warning: Your behavior structure does not meet buzcode standards. Sad.
In bz_LoadBehavior (line 49)