CellProfiler / CellProfiler-Analyst

Open-source software for exploring and analyzing large, high-dimensional image-derived data.
http://cellprofileranalyst.org

Expand well_id data model for Plate Viewer #95

Open dlogan opened 8 years ago

dlogan commented 8 years ago

(I thought there was an issue for this, but I couldn't find it. Sorry if this is a duplicate.) Wells are assumed to be named like 'A01' or 'A1', i.e. [letter][number]*. At least I think so, since other well naming schemes like [number][letter] result in an error. I have come across other schemes that I would like to use, like [number][letter], [number][number] (the new Phenix machine at Broad uses this, I think), or just a single number in cases where the data is not in wells but in some other format.

It would be helpful to expand the well_id data model to be something more generalizable like well_row, well_col, or simply X/Y so that Plate Viewer could be used without needing to alter the database metadata columns manually.
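To make the naming schemes above concrete, here is a minimal Python sketch that classifies a well name by its pattern. The function name and scheme labels are hypothetical and are not part of CellProfiler Analyst. Note that [number][number] cannot be told apart from a single number without a delimiter, which is one argument for storing row and column separately.

```python
import re

# Hypothetical scheme labels for illustration; not taken from CPA.
_PATTERNS = [
    ("letter-number", re.compile(r"^[A-Za-z]+\d+$")),  # e.g. 'A01', 'A1'
    ("number-letter", re.compile(r"^\d+[A-Za-z]+$")),  # e.g. '01A'
    ("number", re.compile(r"^\d+$")),                  # e.g. '17'
]


def classify_well_name(name):
    """Return the label of the first pattern matching the well name."""
    name = name.strip()
    for label, pattern in _PATTERNS:
        if pattern.match(name):
            return label
    return "unknown"
```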

braymp commented 8 years ago

Kinda/sorta related to https://github.com/CellProfiler/CellProfiler/issues/1677.

dlogan commented 8 years ago

Fixed by https://github.com/CellProfiler/CellProfiler/issues/1824

bethac07 commented 8 years ago

I still think this is valuable, both for a) compatibility with older versions of CP and b) more flexibility in general.

braymp commented 8 years ago

Big :+1: from me. I'm currently having to hack in the well ID manually, since the analysis tools I'm using index the well row and column as integers.

jhung0 commented 8 years ago

Ok... what would be the options in properties that you want to see? There is an option in the code to set well_id = 123...does that work?

braymp commented 8 years ago

"well_id = 123" simply assumes that the wells are indexed in sequential order without regard to rows or columns; this was invented to visualize some microarray data, IIRC.

I think a syntax like "well_id = Image_Metadata_WellRow,Image_Metadata_WellColumn" would suffice. I would actually take it further and allow for both row/col values to be integers, not just the WellColumn; the latter is the typical case for the 'A01' format, but not all scopes produce files with that nomenclature.

If you go this route, then if there's only one table col specified, e.g., "Well = Image_Metadata_Well" then you can assume the row/cols are concatenated, as it is now.
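A rough sketch of how such a value could be read, assuming the property arrives as a plain string. The helper names are hypothetical, not existing CPA functions: one column name means the row and column are concatenated (the current behavior), two mean they are stored separately.

```python
def parse_well_id_property(value):
    """Split a well_id property value into one or two column names."""
    columns = [part.strip() for part in value.split(",") if part.strip()]
    if len(columns) not in (1, 2):
        raise ValueError(
            "well_id expects one or two column names, got %d" % len(columns)
        )
    return columns


def is_split_well_id(value):
    """True when row and column are given as two separate table columns."""
    return len(parse_well_id_property(value)) == 2
```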

jhung0 commented 8 years ago

Ok, could you give examples of how the actual properties file would look? Currently it's

well_format = A01
well_id = well

or

well_format = 123
well_id = well

bethac07 commented 8 years ago

What about something like

well_format = RowCol
well_id = Image_Metadata_WellRow,Image_Metadata_WellColumn

jhung0 commented 8 years ago

What I'm not understanding is this Image metadata wellrow etc. You want to write those strings into the properties file or are they supposed to stand for something else? I feel like I'm missing something...

braymp commented 8 years ago

I think well_id having one or two fields would be enough. I.e., if

well_id = <col1>

then the full well ID is specified in the single column. But if it's:

well_id = <col1>,<col2>

then it's separated.

I would change the well_format in a similar way. So for example,

well_id = <col1>
well_format = A01

specifies the full well in the single-character/two-digit format (current behavior, default), whereas

well_id = <col1>,<col2>
well_format = A01

would be separated row and col metadata, with col1 being the single character and col2 being the digits (possibly left-padded with 0's), and

well_id = <col1>,<col2>
well_format = 123

would be separated row and col metadata, with both col1 and col2 being digits. The last case

well_id = <col1>
well_format = 123

would behave as before: specifies the full well as a single integer.
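The four combinations above could be interpreted roughly as follows. This is only a sketch under assumptions: the function name, the (row, col) return convention, and the use of None for the row in the single-integer case are all hypothetical, and the values are assumed to have already been read from the table column(s) named in well_id.

```python
import re


def interpret_well(values, well_format):
    """Map one or two well_id column values to a (row, col) pair.

    Hypothetical convention: row is a letter for 'A01', an int for '123',
    and None when a single integer identifies the whole well.
    """
    values = [str(v).strip() for v in values]
    if well_format == "A01":
        if len(values) == 1:
            # Full well in one column, e.g. 'B03'.
            match = re.match(r"^([A-Za-z])(\d+)$", values[0])
            if match is None:
                raise ValueError("not an A01-style well: %r" % values[0])
            return match.group(1).upper(), int(match.group(2))
        if len(values) == 2:
            # Row letter and (possibly zero-padded) column digits.
            row, col = values
            if not re.match(r"^[A-Za-z]$", row):
                raise ValueError("row must be a single letter: %r" % row)
            return row.upper(), int(col)
    elif well_format == "123":
        if len(values) == 1:
            # Single sequential integer, no row/column structure.
            return None, int(values[0])
        if len(values) == 2:
            # Row and column both given as integers.
            return int(values[0]), int(values[1])
    raise ValueError(
        "unsupported combination: %d value(s), format %r"
        % (len(values), well_format)
    )
```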

braymp commented 8 years ago

"What I'm not understanding is this Image metadata wellrow etc. You want to write those strings into the properties file or are they supposed to stand for something else? I feel like I'm missing something..."

At this point, ExportToDatabase is not set up to write multiple entries to this particular field, so the user would need to do so by hand (which I think is fine). If the user does so, we need CPA to interpret this format and then do the right thing.

jhung0 commented 8 years ago

Do you have an example? There are some database commands, and I'm not sure how 2 well ids would be handled...Sorry I really know very little about handling plates.

bethac07 commented 8 years ago

Poking around in the source code, it seems like in many places it'd be a pain to change the well_id (you'd have to change all the SQL call functions, etc.). Given that you'll always have Metadata_Well when you create Metadata_WellRow and Metadata_WellColumn (CP automatically generates it), maybe it's a better idea to leave well_id alone and create a separate property called well_rowcol_names or something that's only used by PlateMapViewer. Something like this:

well_id = Image_Metadata_Well
well_rowcol_names = Image_Metadata_WellRow, Image_Metadata_WellColumn

Then, in the platemappanel.py init function, check whether well_rowcol_names is set; if so, use those columns for self.row_labels and self.col_labels, otherwise parse as currently.

Does this a) make sense and b) solve everyone's needs? I don't trust my coding skills enough to try to implement it myself but it seems like these are the pieces you'd need.
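The check described above might look something like this minimal sketch. It assumes the properties are available as a plain dict; the function name and the returned dict layout are hypothetical and do not reflect how platemappanel.py is actually written.

```python
def resolve_plate_map_columns(props):
    """Decide which table columns the plate map should use.

    props is assumed to be a plain dict of property names to strings;
    well_rowcol_names is the proposed (not yet existing) property.
    """
    rowcol = props.get("well_rowcol_names")
    if rowcol:
        names = [name.strip() for name in rowcol.split(",")]
        if len(names) != 2 or not all(names):
            raise ValueError(
                "well_rowcol_names expects exactly two column names"
            )
        # Separate row and column metadata for labeling the plate map.
        return {"mode": "separate", "row": names[0], "col": names[1]}
    # Fall back to the current behavior: parse the combined well_id column.
    return {"mode": "combined", "well": props["well_id"]}
```

Leaving well_id untouched in this way means the existing SQL-related code would keep working, and only the plate map would need to know about the extra property.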