Open djkapner opened 4 years ago
What about a separate table for traces rather than storing all these values in an array?
Something like this:
Column | Type | Description |
---|---|---|
id | int | Primary key |
timestamp | float4 | Timestamp of value in trace array |
index | int | Value of index in trace array (?) For sorting/downsampling |
roi_id | int | Foreign key to rois table |
trace | float4 | Value of trace at point |
I don't see a scenario where we'd want to perform SQL-style queries within the trace. I think the trace, simply as a 1D list/array is sufficient. Your proposal would result in a potentially enormous SQL table, adding also 4 bytes per data point (from the timestamp), doubling the size on disk.
I am concerned about size:
Let's say a movie has `1e5` frames and the trace values are stored as 4-byte floats:
Per roi: 400kB
right now, we have 1.6M ROIs in the database. Let's just call it 1M:
total trace storage: 400GB
I am not entirely sure which partition of `aibsdc-dev-db1` the postgres databases are stored in, but only 1 partition is even close, and that would consume most of it.
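The estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch using the figures quoted in this comment; the variable names are illustrative, and the row-per-sample figure is a lower bound because it counts only the extra 4-byte timestamp, not the other columns or per-row overhead.

```python
# Back-of-envelope storage estimate using the figures quoted in the comment.
BYTES_PER_FLOAT4 = 4          # a 4-byte float (postgres `real`)
frames_per_movie = 100_000    # "1e5 frames"
n_rois = 1_000_000            # "Let's just call it 1M"

# One array of 4-byte floats per ROI.
bytes_per_roi = frames_per_movie * BYTES_PER_FLOAT4
total_bytes = bytes_per_roi * n_rois

# A row-per-sample table also stores a 4-byte timestamp for every value.
# Assumption: id/index/roi_id columns and row overhead are ignored here,
# so the real table would be larger still.
total_bytes_with_timestamp = total_bytes * 2

print(f"per ROI: {bytes_per_roi / 1e3:.0f} kB")
print(f"array column, all ROIs: {total_bytes / 1e9:.0f} GB")
print(f"with per-point timestamp: {total_bytes_with_timestamp / 1e9:.0f} GB")
```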
I think postgres `real` is the best choice (worried about cornering ourselves with `decimal`). And, given the space concerns, I think we need to make some "which experiments" choices sooner, rather than just run all of them.
Point taken for sure about using the dev db and storage limits. Where I would caution (and this is a problem with the trace h5 files too) is in having essentially unitless data. If everything is just dumped to an array, there's no metadata about how the trace actually lines up with the source video.
Agree about unitless. I think this is a pill to swallow while we're between pipelines. We can include a validation check during transform_pipeline downsampling that the source video and trace have the same size of the first dimension. That would make sure things aren't out-of-sync and also indicate in the code that they are supposed to line up.
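A minimal sketch of what such a check could look like, assuming the first dimension of the video indexes frames and the trace is a 1D sequence. The function name `validate_trace_matches_video` is hypothetical and not taken from `transform_pipeline`; `len()` is used so the same check works for nested lists and for array types whose length is the size of the first axis.

```python
from typing import Sized


def validate_trace_matches_video(video: Sized, trace: Sized) -> None:
    """Raise ValueError if the trace and video disagree on frame count.

    Hypothetical helper: assumes len(video) is the number of frames
    (first dimension) and len(trace) is the number of time points.
    """
    n_video_frames = len(video)
    n_trace_points = len(trace)
    if n_video_frames != n_trace_points:
        raise ValueError(
            f"trace has {n_trace_points} points but the source video has "
            f"{n_video_frames} frames; they are expected to line up"
        )


# Usage: a 5-frame video of 2x2 frames and a 5-point trace pass silently.
video = [[[0, 0], [0, 0]] for _ in range(5)]
validate_trace_matches_video(video, [0.1, 0.2, 0.3, 0.4, 0.5])
```

Calling it before downsampling would fail fast on an out-of-sync pair and also documents in the code that the two are meant to share a time axis.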
The segmentation labeling app needs traces. Originally, the design was for trace extraction to occur during the pre-labeling transformations, which have as input the postgres `rois` and `segmentation_runs` tables (these link to the source movie). That transformation is not implemented, so we decided a faster route is to:
- Add a column to the `rois` table. This should be `trace` and should be an `Array` of reals, or numeric/decimal (size is a concern). Update the existing table with a new column, allowing nulls, because Kat is still using some of that data.
- Update the `ophys_segmentation` repo to get the trace from Suite2P. I think the trace we should use is just `F`, as opposed to `dF` or `F - Fneu`. I think this is not an ideal, demixed trace, but it is very clear what it is. Every time point should be stored, so the trace can be downsampled according to the same downsample strategy applied to the video during transformation. Write `F` to the new trace column per-roi.
- `segmentation_run_id=950` has a truncated video source and can be used for fast testing.
- Update the `segmentation_labeling_app` rois class to read and store the trace.
- Update the `segmentation_labeling_app` transform_pipeline to downsample the trace with the same strategy chosen to downsample the movie (perhaps modify the downsample movie function to handle the trace array).
- Update `transform_pipeline` to output the downsampled trace to the manifest.
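
One way to keep the trace and movie downsampling in step is to compute the frame bins once and apply them to both. The sketch below is only an illustration: the thread does not say which downsample strategy `transform_pipeline` uses, so averaging over fixed-size bins is an assumption, and `frame_bins` and `downsample_trace` are hypothetical names.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def frame_bins(n_frames: int, factor: int) -> List[Tuple[int, int]]:
    """Return (start, stop) frame index pairs, `factor` frames per bin.

    The last bin is shorter when n_frames is not a multiple of factor.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [
        (start, min(start + factor, n_frames))
        for start in range(0, n_frames, factor)
    ]


def downsample_trace(
    trace: Sequence[float], bins: Sequence[Tuple[int, int]]
) -> List[float]:
    """Average the trace within each bin (assumed strategy: bin mean)."""
    return [fmean(trace[start:stop]) for start, stop in bins]


# The same `bins` would be used to slice the movie along its first axis.
bins = frame_bins(n_frames=7, factor=3)
downsampled = downsample_trace([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], bins)
print(bins)         # [(0, 3), (3, 6), (6, 7)]
print(downsampled)  # [1.0, 4.0, 6.0]
```

Sharing the bin boundaries, rather than downsampling each array independently, means the downsampled trace written to the manifest has one value per downsampled video frame by construction.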