Add traces to data extracted from Suite2P

djkapner commented 4 years ago

The segmentation labeling app needs traces. Originally, the design was for trace extraction to occur during the pre-labeling transformations, which take as input the postgres rois and segmentation_runs tables (these link to the source movie). That transformation is not implemented, so we decided a faster route is to:

[x] add a column to the rois table. The column should be named trace and should be an array of reals, or numeric/decimal (size is a concern). Update the existing table with the new column, allowing nulls, because Kat is still using some of that data.
[x] modify the ophys_segmentation repo to get the trace from Suite2P. I think the trace we should use is just F, as opposed to dF or F - Fneu. I think this is not an ideal, demixed trace, but it is very clear what it is. Every time point should be stored, so the trace can be downsampled according to the same downsample strategy applied to the video during transformation. Write F to the new trace column per-roi.
[x] run some subset of the experiments so we have some data on-hand to play with.
[x] update design doc.
[x] recreate the equivalent of segmentation_run_id=950 which has a truncated video source and can be used for fast testing.
[x] modify segmentation_labeling_app rois class to read and store the trace.
[x] modify segmentation_labeling_app transform_pipeline to downsample the trace with the same strategy chosen to downsample movie (perhaps modify downsample movie function to handle the trace array)
[x] modify transform_pipeline to output the downsampled trace to the manifest.
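
The trace downsampling step in the checklist above could look roughly like the sketch below. The thread does not say which downsample strategy the movie uses, so block averaging is an assumption here, and `downsample_trace` and `factor` are hypothetical names, not functions from segmentation_labeling_app. Plain Python lists stand in for whatever array type the pipeline actually uses.

```python
from statistics import fmean
from typing import List, Sequence


def downsample_trace(trace: Sequence[float], factor: int) -> List[float]:
    """Average consecutive, non-overlapping bins of `factor` samples.

    ASSUMPTION: block averaging stands in for whatever strategy the
    movie downsampling uses; swap in the real strategy so the trace
    and the movie stay aligned frame for frame.
    A trailing partial bin is averaged over the samples it contains.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [fmean(trace[i:i + factor]) for i in range(0, len(trace), factor)]


# Five time points, downsampled by a factor of 2.
trace = [1.0, 3.0, 5.0, 7.0, 9.0]
downsampled = downsample_trace(trace, 2)
print(downsampled)  # [2.0, 6.0, 9.0]
```

Because every time point of F is stored, the same function can be rerun with a different factor if the movie downsampling changes later.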

kschelonka commented 4 years ago

What about a separate table for traces rather than storing all these values in an array?

Something like this:

| Column | Type | Description |
| --- | --- | --- |
| id | int | Primary key |
| timestamp | float4 | Timestamp of value in trace array |
| index | int | Value of index in trace array (?) For sorting/downsampling |
| roi_id | int | Foreign key to rois table |
| trace | float4 | Value of trace at point |

djkapner commented 4 years ago

I don't see a scenario where we'd want to perform SQL-style queries within the trace. I think the trace, simply as a 1D list/array, is sufficient. Your proposal would result in a potentially enormous SQL table, also adding 4 bytes per data point (from the timestamp), doubling the size on disk.

I am concerned about size. Let's say a movie has 1e5 frames and the trace values are stored as 4-byte floats:

Per ROI: 400 kB.
Right now, we have 1.6M ROIs in the database. Let's just call it 1M.
Total trace storage: 400 GB.

I am not entirely sure which partition of aibsdc-dev-db1 the postgres databases are stored in, but only one partition is even close, and that would consume most of it. I think postgres real is the best choice (I'm worried about cornering ourselves with decimal). And, given the space concerns, I think we need to make some "which experiments" choices sooner, rather than just run all of them.
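
The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in the comment (1e5 frames, 1M ROIs, 4-byte values); `trace_storage_bytes` is a hypothetical helper. It ignores postgres row and array overhead, and for the one-row-per-point layout it counts only the extra 4-byte timestamp, not the id, index, or roi_id columns.

```python
def trace_storage_bytes(
    n_frames: int,
    n_rois: int,
    bytes_per_value: int = 4,
    extra_bytes_per_point: int = 0,
) -> int:
    """Raw payload size for storing one value per frame per ROI."""
    return n_frames * n_rois * (bytes_per_value + extra_bytes_per_point)


N_FRAMES = 100_000    # 1e5 frames per movie
N_ROIS = 1_000_000    # 1.6M ROIs in the database, rounded down to 1M

per_roi_bytes = trace_storage_bytes(N_FRAMES, 1)
array_bytes = trace_storage_bytes(N_FRAMES, N_ROIS)
# One row per time point, with a 4-byte timestamp next to each value.
table_bytes = trace_storage_bytes(N_FRAMES, N_ROIS, extra_bytes_per_point=4)

print(f"per ROI:        {per_roi_bytes / 1e3:.0f} kB")
print(f"array column:   {array_bytes / 1e9:.0f} GB")
print(f"with timestamp: {table_bytes / 1e9:.0f} GB")
```

Under these assumptions the array column comes to 400 GB and the timestamped layout to 800 GB, before any per-row overhead.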

kschelonka commented 4 years ago

Point taken for sure about using the dev db and storage limits. Where I would caution (and this is a problem with the trace h5 files too) is in having essentially unitless data. If everything is just dumped to an array, there's no metadata about how the trace actually lines up with the source video.

(Sent as an email reply to Dan Kapner's comment above, dated Fri, Apr 24, 2020, 12:12 PM.)

djkapner commented 4 years ago

Agree about unitless. I think this is a pill to swallow while we're between pipelines. We can include a validation check during transform_pipeline downsampling that the source video and trace have the same size along the first dimension. That would make sure things aren't out-of-sync and also indicate in the code that they are supposed to line up.
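
A minimal sketch of such a check, assuming the video is indexed frame-first and the trace is a 1D sequence with one value per frame. `validate_frame_counts` is a hypothetical name, not an existing function in transform_pipeline. It relies only on `len()`, which returns the size of the first dimension for nested lists and for array types that support it.

```python
from typing import Sequence


def validate_frame_counts(video: Sequence, trace: Sequence) -> None:
    """Raise if the video and trace disagree on the number of frames.

    ASSUMPTION: the first dimension of `video` is time (frames), and
    `trace` holds exactly one value per frame.
    """
    n_video = len(video)
    n_trace = len(trace)
    if n_video != n_trace:
        raise ValueError(
            f"video has {n_video} frames but trace has {n_trace} values; "
            "they are expected to line up one-to-one"
        )


# A 3-frame, 2x2-pixel video and a matching 3-point trace pass silently.
video = [[[0, 0], [0, 0]] for _ in range(3)]
validate_frame_counts(video, [0.1, 0.2, 0.3])

# A trace with a missing time point is rejected.
try:
    validate_frame_counts(video, [0.1, 0.2])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(mismatch_detected)  # True
```

Running the check before downsampling keeps the comparison on the raw frame count, where a mismatch is easiest to interpret.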

AllenInstitute / segmentation-labeling-app

Add traces to data extracted from Suite2P #56