MathCancer / PhysiCell

PhysiCell: Scientist end users should use latest release! Developers please fork the development branch and submit PRs to the dev branch. Thanks!
http://PhysiCell.org

Enhancement: MCDS output.xml label carries additional information about the column data type #270

Open elmbeech opened 1 month ago

elmbeech commented 1 month ago

For downstream analysis, it would often help to know the intended data type of each column in the output matrix. PhysiCell currently writes every value as a float (double).

The suggestion is to add an attribute, dtype (data type), to each label, specifying whether the column is meant to be str (categorical), bool (categorical), int (numerical), or float (numerical). It could look something like this:

<simplified_data type="matlab" source="PhysiCell" data_version="2">
    <labels>
        <label index="0" size="1" units="none" dtype="str">ID</label>
        <label index="1" size="3" units="microns" dtype="float">position</label>
        [...]
        <label index="5" size="1" units="none" dtype="str">cell_type</label>
        <label index="6" size="1" units="none" dtype="str">cycle_model</label>
        [...]
        <label index="21" size="1" units="none" dtype="int">number_of_nuclei</label>
        [...]
        <label index="27" size="1" units="none" dtype="bool">dead</label>
        [...]
    </labels>
    <filename>output00000064_cells.mat</filename>
</simplified_data>
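
To illustrate how a downstream tool might consume this, here is a minimal Python sketch using only the standard library. It is an assumption about how a parser could work, not existing PhysiCell or loader code: the names `read_column_dtypes`, `cast_value`, and `CASTS` are hypothetical, as is the `legacy_column` label, and the XML fragment is a shortened version of the proposal written without commas between attributes so that it parses. Labels without a dtype attribute fall back to float, which matches the current behaviour where everything is written as a double.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical fragment following the proposed format.
XML = """<simplified_data type="matlab" source="PhysiCell" data_version="2">
    <labels>
        <label index="0" size="1" units="none" dtype="str">ID</label>
        <label index="1" size="3" units="microns" dtype="float">position</label>
        <label index="21" size="1" units="none" dtype="int">number_of_nuclei</label>
        <label index="27" size="1" units="none" dtype="bool">dead</label>
        <label index="28" size="1" units="none">legacy_column</label>
    </labels>
</simplified_data>"""

# Hypothetical casts from the stored double to the declared type.
# "str" only stringifies the integer code here; mapping codes to
# names (e.g. cell type names) would need extra information.
CASTS = {
    "str": lambda v: str(int(round(v))),
    "bool": lambda v: v != 0.0,
    "int": lambda v: int(round(v)),
    "float": float,
}


def read_column_dtypes(xml_text):
    """Return {label name: dtype}, defaulting to 'float' if dtype is absent."""
    root = ET.fromstring(xml_text)
    dtypes = {}
    for label in root.iter("label"):
        dtypes[label.text.strip()] = label.get("dtype", "float")
    return dtypes


def cast_value(value, dtype):
    """Cast one stored double to the type declared for its column."""
    return CASTS[dtype](value)


dtypes = read_column_dtypes(XML)
for name, dtype in dtypes.items():
    print(name, dtype)
```

Defaulting to float keeps output files written before such a change readable by the same code.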

Thank you!

elmbeech commented 1 month ago

Currently, these are the non-float columns:

integer:

boolean:

string: