broadinstitute / ml4h


Support multi-dimensional TMAPs in output #384

Open paolodi opened 4 years ago

paolodi commented 4 years ago

What

Multi-dimensional TMAPs (e.g., operations on raw data) are currently second-class citizens in execution modes that are expected to produce outputs. For example, inference with segmentation models produces either summary statistics of segmentation quality (via infer_with_pixels) or PNGs stripped of the metadata required for interpretation and reuse (via plot_predictions). The explore mode bypasses multi-dimensional TMAPs altogether.

Why

The ability to easily reuse evaluated TMAPs (via inference or explore) is one of the key features of ML4H. It has already allowed us to perform "extrapolation" tasks, where ML is used to infer a learned "rare" feature on an extended dataset (e.g., liver fat from standard MRI, LV mass and HRR from resting ECGs). So far, we have fully supported only scalar features, exchanged as CSV files; ongoing work on segmentation and parameterization requires extending this support to more complex multi-dimensional data.

How

Allow outputs in more sophisticated file formats (HDF5 as a start) that can handle multi-dimensional (semi-)structured data. TMAPs contain enough information to interpret the data and guide its storage. Because producing multi-dimensional outputs is not always needed (and is potentially slow), this behavior should be enabled only through optional command-line flags.
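
To make the proposal concrete, here is a rough sketch of what an opt-in HDF5 output path could look like. Everything named here is an assumption for illustration and not existing ML4H code: `TensorSpec` is a hypothetical stand-in for the TMAP fields needed for storage, `--save_tensor_outputs` is a made-up flag name, and the `/<sample_id>/<tmap name>` layout is just one possible choice. The only real library calls are `h5py.File`, `create_dataset`, and dataset `attrs`.

```python
import argparse
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for the TMAP fields needed to store a tensor."""
    name: str
    shape: Tuple[int, ...]
    channel_map: Dict[str, int]  # e.g. segmentation label -> channel index


def dataset_attrs(spec: TensorSpec) -> Dict[str, str]:
    """Build the metadata stored next to the tensor so it stays interpretable."""
    return {
        "tmap_name": spec.name,
        # Stored as a JSON string, which HDF5 attributes handle without special dtypes.
        "channel_map": json.dumps(spec.channel_map, sort_keys=True),
    }


def write_prediction(path: str, sample_id: str, spec: TensorSpec, values) -> None:
    """Append one predicted tensor to an HDF5 file at /<sample_id>/<tmap name>."""
    import h5py  # imported lazily: only needed when the optional flag is set

    with h5py.File(path, "a") as hd5:
        # Intermediate groups (the sample ID) are created automatically.
        dataset = hd5.create_dataset(f"{sample_id}/{spec.name}", data=values)
        for key, value in dataset_attrs(spec).items():
            dataset.attrs[key] = value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag: off by default, so existing CSV-only runs are unaffected.
    parser.add_argument("--save_tensor_outputs", action="store_true",
                        help="Also write multi-dimensional predictions to HDF5.")
    return parser
```

In an inference loop, the call to `write_prediction` would sit behind a check on `args.save_tensor_outputs`, so the slower tensor output is produced only on request.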

Acceptance Criteria

paolodi commented 3 years ago

@lucidtronix @ndiamant, this is the structured TMAP issue I mentioned before. Please feel free to comment if you have any ideas or suggestions!

I will definitely ask for your help along the way, especially if we want this to work with autoencoders...