dlevenstein opened this issue 6 years ago
This is difficult, and many of us have certainly dealt with it when handling our own datasets. But before we talk about outputs, can you explain how you're handling the input side? A cell array of basePaths? And then it loops and calls itself with a single basePath input?
As for output, two thoughts:
1) It may not be necessary to make a single array for people. Instead, you can use the multi-basePath input just to make sure the function executes in the usual way on each session. That is like batch processing, without necessarily gathering everything together. In that case you can write a secondary "Gather" function. A system like this is actually what I'd assume the multi-basePath input does: just batching, without combining.
2) If you want to combine, I like the principle of combining to the maximum extent that makes logical sense. The most combined form is a numeric array, but if the results have different dimensions or something, you could use a cell array or struct. Then I'd add metadata labeling, say, how many rows came from each session.
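A minimal sketch of the batch-then-gather idea, written in Python rather than buzcode's MATLAB. A dict of per-cell lists stands in for a cellinfo struct, and batch_apply, gather and load_one are hypothetical names, not buzcode functions. batch_apply only runs a single-session routine on each basePath, while gather is the optional second step that concatenates the per-session results and records how many rows came from each session.

```python
from typing import Any, Callable, Dict, List


def batch_apply(func: Callable[[str], Any], base_paths: List[str]) -> List[Any]:
    """Run a single-basePath routine on each basePath; results are NOT combined."""
    return [func(bp) for bp in base_paths]


def gather(results: List[Dict[str, list]], base_paths: List[str]) -> Dict[str, Any]:
    """Concatenate per-session results and record rows per session.

    Assumes every result has the same fields and one list entry per cell (row).
    """
    if len(results) != len(base_paths):
        raise ValueError("need exactly one result per basePath")
    if not results:
        return {"rows_per_session": {}}
    fields = set(results[0])
    combined: Dict[str, Any] = {f: [] for f in fields}
    rows_per_session: Dict[str, int] = {}
    for bp, res in zip(base_paths, results):
        if set(res) != fields:
            raise ValueError(f"{bp}: fields differ from the first session")
        lengths = {len(res[f]) for f in fields}
        if len(lengths) != 1:
            raise ValueError(f"{bp}: fields do not have one entry per cell")
        for f in fields:
            combined[f].extend(res[f])
        rows_per_session[bp] = lengths.pop()
    combined["rows_per_session"] = rows_per_session
    return combined


# Usage with a hypothetical per-session loader
def load_one(bp: str) -> Dict[str, list]:
    n_cells = 2 if bp == "rec1" else 3
    return {"UID": list(range(1, n_cells + 1)), "rate": [1.0] * n_cells}


per_session = batch_apply(load_one, ["rec1", "rec2"])  # batching only
table = gather(per_session, ["rec1", "rec2"])          # optional combine step
print(table["rows_per_session"])  # {'rec1': 2, 'rec2': 3}
```

Keeping the two steps separate means the batch step never fails on awkward outputs; only the optional gather step has to care whether the per-session results can be combined.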
Those are my thoughts. Thanks for doing this.
B
On Fri, May 18, 2018 at 1:51 PM, Dan Levenstein notifications@github.com wrote:
I'm adding functionality to bz_LoadCellinfo (which can then be copied to the other I/O functions) to load from an entire dataset of basePaths.
Why? I calculated ISI statistics (a function for this is coming soon) for cells from a bunch of recordings and saved them as baseName.ISIStats.cellinfo.mat in each basePath. I would now like to load those from ALL basePaths in the dataset.
My question for y'all is: how should this be output?
1) Should it just return a 1 x N structure array (https://www.mathworks.com/help/matlab/matlab_prog/create-a-structure-array.html), where ISIStats(1) is the cellinfo structure from the first recording, ISIStats(2) is the cellinfo structure from the second recording, ..., and ISIStats(N) is the cellinfo structure from the Nth recording? These are a little unwieldy and will need to be combined post hoc for easy comparison.
Or 2) should it be a single structure in which each of the fields is concatenated across recordings? This is a little harder to code up and will inevitably have some bugs to iron out relating to alignment etc., but it will be much easier to analyze.
Or 3) something else?
Either way, it will tack on the baseName from which the cellinfo was loaded, so that things don't get lost. That is, each cell will have a UID (unit ID) and a baseName, both of which are needed to uniquely identify that cell. We can make a function to align two multi-basePath structures so that UIDs and baseNames line up.
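A UID is only unique within one recording, so the proposed alignment function would match cells on the (baseName, UID) pair. Here is a minimal Python sketch. It assumes each concatenated structure is a dict whose fields are per-cell lists, including "UID" and "baseName". align_cells and the burstIndex/NREMrate field names are hypothetical, not existing buzcode functions or fields.

```python
from typing import Dict, List, Tuple

CellStruct = Dict[str, list]


def cell_keys(s: CellStruct) -> List[Tuple[str, int]]:
    """(baseName, UID) pairs; together they uniquely identify a cell."""
    return list(zip(s["baseName"], s["UID"]))


def align_cells(a: CellStruct, b: CellStruct) -> Tuple[CellStruct, CellStruct]:
    """Restrict a and b to the cells they share, both in a's order."""
    position_in_b = {key: i for i, key in enumerate(cell_keys(b))}
    idx_a: List[int] = []
    idx_b: List[int] = []
    for i, key in enumerate(cell_keys(a)):
        if key in position_in_b:
            idx_a.append(i)
            idx_b.append(position_in_b[key])

    def take(s: CellStruct, idx: List[int]) -> CellStruct:
        # Assumes every field holds one entry per cell.
        return {field: [values[i] for i in idx] for field, values in s.items()}

    return take(a, idx_a), take(b, idx_b)


# UID 1 exists in both s1 and s2, which is why baseName is part of the key.
isi = {"baseName": ["s1", "s1", "s2"], "UID": [1, 2, 1], "burstIndex": [0.1, 0.2, 0.3]}
rates = {"baseName": ["s2", "s1"], "UID": [1, 2], "NREMrate": [5.0, 6.0]}
isi_aligned, rates_aligned = align_cells(isi, rates)
print(isi_aligned["burstIndex"], rates_aligned["NREMrate"])  # [0.2, 0.3] [6.0, 5.0]
```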
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/buzsakilab/buzcode/issues/198, or mute the thread https://github.com/notifications/unsubscribe-auth/ADXrTeNqbm96SxQb60FZDPoS0WSvplpLks5tzwoKgaJpZM4UFHm9 .
Input side:
The user provides a dataset folder (topPath) containing many basePaths. [basePaths, baseNames] = bz_FindBasePaths(topPath) crawls through the folders in topPath and collects any basePaths. I just updated bz_FindBasePaths so the user can select which basePaths they want from a list of those in topPath (similar to what you can do with SleepScoreMaster). I will push this soon, along with the update to bz_LoadCellinfo that doesn't combine but returns a structure array, so y'all can try it out.
We can also make it possible to provide a list of baseNames (e.g. to avoid selecting them manually multiple times).
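For illustration, the crawl-then-select input step might look like the Python sketch below. The test for what counts as a basePath (a folder containing at least one file named "<folderName>.*") is an assumption based on the baseName.ISIStats.cellinfo.mat naming above. It is not necessarily the rule bz_FindBasePaths uses. find_base_paths and its select argument are hypothetical.

```python
import os
from typing import Iterable, List, Optional, Tuple


def find_base_paths(top_path: str,
                    select: Optional[Iterable[str]] = None) -> Tuple[List[str], List[str]]:
    """Crawl top_path for basePaths; optionally keep only the listed baseNames."""
    base_paths: List[str] = []
    base_names: List[str] = []
    for dirpath, dirnames, filenames in os.walk(top_path):
        dirnames.sort()  # deterministic traversal order
        name = os.path.basename(os.path.normpath(dirpath))
        # Assumed rule: a basePath holds files named baseName.<something>
        if any(f.startswith(name + ".") for f in filenames):
            base_paths.append(dirpath)
            base_names.append(name)
    if select is not None:
        # A pre-made list of baseNames replaces manual selection.
        wanted = set(select)
        keep = [i for i, n in enumerate(base_names) if n in wanted]
        base_paths = [base_paths[i] for i in keep]
        base_names = [base_names[i] for i in keep]
    return base_paths, base_names
```

Passing something like select=["rec2", "rec5"] corresponds to providing a list of baseNames instead of picking them from a list every time.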
Output:
The default is to load a (1 x N) structure array: cellinfo(1) is the cellinfo structure from the first recording, cellinfo(2) is the cellinfo structure from the second recording, ..., and cellinfo(N) is the cellinfo structure from the Nth recording.
There's an option ('catcall',true) that will try to concatenate everything into a single structure: cellinfo.UID will have the UIDs of all the cells loaded, cellinfo.baseName will have the baseName each cell was pulled from, and cellinfo.arbitrarydataname will have whatever metric is in the cellinfo file.
This will probably break in places if you have complicated cellinfo files, but for simple metrics such as NREM rate it will be straightforward and useful.
I'll push it all soon. I'd like to have dev merged into master first so it's not part of that big dump.
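Here is a rough Python analogue of the two output modes, with lists of dicts standing in for MATLAB struct arrays. load_cellinfo_multi is a hypothetical name, and the catcall flag only mirrors the option described above. The sketch also shows where concatenation is expected to fail: on any field that is not one entry per cell, or when fields differ between recordings.

```python
from typing import Any, Dict, List, Union


def load_cellinfo_multi(per_session: List[Dict[str, Any]], base_names: List[str],
                        catcall: bool = False) -> Union[List[Dict[str, Any]], Dict[str, list]]:
    """Combine already-loaded cellinfo dicts from N recordings.

    Default: a 1 x N list (one dict per recording, tagged with its baseName).
    catcall=True: one dict whose fields are concatenated across cells.
    """
    if len(per_session) != len(base_names):
        raise ValueError("need one baseName per recording")
    if not catcall:
        return [dict(s, baseName=b) for s, b in zip(per_session, base_names)]

    fields = set(per_session[0]) if per_session else set()
    combined: Dict[str, list] = {"UID": [], "baseName": []}
    for s, b in zip(per_session, base_names):
        if set(s) != fields:
            raise ValueError(f"{b}: fields differ from the first recording")
        n_cells = len(s["UID"])
        combined["UID"].extend(s["UID"])
        combined["baseName"].extend([b] * n_cells)  # tag every cell with its recording
        for field, values in s.items():
            if field in ("UID", "baseName"):
                continue
            if not isinstance(values, list) or len(values) != n_cells:
                # "Complicated" fields are not one entry per cell, so concatenation fails.
                raise ValueError(f"{b}: field {field!r} is not one entry per cell")
            combined.setdefault(field, []).extend(values)
    return combined


recA = {"UID": [1, 2], "NREMrate": [1.5, 2.5]}
recB = {"UID": [1], "NREMrate": [0.5]}
cat = load_cellinfo_multi([recA, recB], ["recA", "recB"], catcall=True)
print(cat["baseName"])  # ['recA', 'recA', 'recB']
```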
I think this should really be done with a database perspective in mind, not based on path searches. I have some ideas about how we can implement it. I suggest that we discuss databases and related things at the next lab meeting.
2018-05-19 15:59 GMT-04:00 Dan Levenstein notifications@github.com:
I think having the results of analyses saved with the files is also useful, especially for results that are more complicated than a simple property of the cell, because I can save them with the necessary analysis metadata. I'm sure there are other ways to do the same thing, some of which are probably better. At the end of the day, this is a tool I need now, so I'm making it the way I know how.
Definitely interested to hear about ways we could use database organization for compiling/saving analysis results, and how I might be able to improve my analysis pipeline (but I won't be here for lab meeting this week).
On May 19, 2018, at 4:21 PM, Peter Petersen notifications@github.com wrote:
I completely agree. I guess I was not very clear. Post-analysis results could be kept entirely in a database, but with our current file structure, I do not believe that this is the solution we want at this point.
What I meant to say is that instead of scanning a directory of recordings, each recording session should be referred to by a unique ID, and you would define a collection by a set of IDs. This can be done very effectively with our MySQL database. I already do this: I assign a unique ID to each of my recording sessions and save this information in a central .m file, and analyses done across sessions refer to the set of IDs. I suggest that we take this a step further and have a unique identifier for all our recording sessions, independent of where they are stored. This is already the case for the sessions in the NYU share dataset folder, which are in the database, so what I suggest is to expand this to also describe the sessions at other locations. In this setup we would still save analysis results locally.
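Here is a minimal Python sketch of referring to sessions by unique ID. The registry is a plain dict standing in for the central .m file or the lab's MySQL database. resolve_collection, the example IDs and the paths are all hypothetical. A collection is just a set of IDs, and only the final step turns it into basePaths, wherever each session happens to be stored.

```python
from typing import Dict, Iterable, List


def resolve_collection(registry: Dict[int, Dict[str, str]], ids: Iterable[int]) -> List[str]:
    """Map a collection (set of session IDs) to basePaths, sorted by ID."""
    wanted = sorted(set(ids))
    missing = [i for i in wanted if i not in registry]
    if missing:
        raise KeyError(f"unknown session IDs: {missing}")
    return [registry[i]["basePath"] for i in wanted]


# Hypothetical registry entries; the storage location can differ per session.
registry = {
    1: {"basePath": "/share/datasets/animalA/day1", "location": "NYU share"},
    2: {"basePath": "/local/animalB/day3", "location": "local disk"},
    3: {"basePath": "/share/datasets/animalA/day2", "location": "NYU share"},
}
my_collection = {3, 1}
print(resolve_collection(registry, my_collection))
# ['/share/datasets/animalA/day1', '/share/datasets/animalA/day2']
```

An analysis only needs to store its ID set. If sessions move, the registry entry changes, but the collection does not.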
2018-05-19 17:00 GMT-04:00 Dan Levenstein notifications@github.com:
This all sounds beautiful. What I'm about to say is likely unnecessary, but in case it's not: I don't think we should ONLY use database-based I/O. If we do that, it immediately takes away the universality of buzcode. But having it in addition to other I/O would be awesome.
On Sun, May 20, 2018 at 5:51 PM Peter Petersen notifications@github.com wrote:
That is fine, of course. How do people refer to the individual sessions when analysing across sessions? I always index my sessions. For me it is not very different from using a database. The database would not have to be a requirement; it is just a matter of where the list of sessions is loaded from.
Peter
Mon, May 21, 2018 at 04:39, Brendon Watson notifications@github.com wrote:
I have a system that's not great, but I feel it's flexible: a list of basePaths, depending on the analysis. That way I don't need things to be in a given place, e.g. if I want to analyze them in different ways. You can select based on criteria either before or after generating the initial path list. It works fine, though I'm sure the database aficionados wouldn't like it much. I base it on a big matrix/sheet of sessions and which components each session has (sleep, behavior, cortex, hippocampus, etc.). Once it finds the sessions that match, it takes the pathnames from that chart.
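The sheet-based selection could be sketched in Python as follows. This assumes the sheet is exported as CSV with a basePath column and one 0/1 column per component. The column names and select_base_paths are hypothetical.

```python
import csv
import io
from typing import Iterable, List


def select_base_paths(sheet_csv: str, required: Iterable[str]) -> List[str]:
    """Return basePaths of sessions that have every required component (marked 1)."""
    required = list(required)
    reader = csv.DictReader(io.StringIO(sheet_csv))
    return [row["basePath"] for row in reader
            if all((row.get(c) or "").strip() == "1" for c in required)]


sheet = """basePath,sleep,behavior,cortex,hippocampus
/data/s1,1,0,1,1
/data/s2,1,1,0,1
/data/s3,0,1,1,0
"""
print(select_base_paths(sheet, ["sleep", "hippocampus"]))  # ['/data/s1', '/data/s2']
```

Selection happens before any paths are touched, so the same sheet can feed differently filtered path lists to different analyses.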
On Mon, May 21, 2018 at 7:51 AM, Peter Petersen notifications@github.com wrote: