DC-analysis / dclab

Python library for the post-measurement analysis of real-time deformability cytometry (RT-DC) data sets
https://dclab.readthedocs.io

New issue with reading in inertia ratio features #234

Closed felix-r closed 1 year ago

felix-r commented 1 year ago

Hi, I have a new problem with reading inertia ratio features from exported .rtdc files. The features inert_ratio_raw and inert_ratio_cvx are present in the files but are not listed in ds.features. This does not affect all files: whether the features are available after loading with dclab seems to depend on when the file was created. For newer files it works; for older files it does not. I am currently using dclab 0.52.0 with Python 3.11. Here is some example code to illustrate what I mean (T: = guck_division2):

import dclab

file1 = r"T:\Members\Felix\analysis_data\RTDC\20221206_Felix_hyper_channel_HL60_LatB\wc60_Lc500\DMSO\M001_data.rtdc"
file2 = r"T:\Members\Felix\analysis_data\RTDC\20230124_Felix_blood_hyper_channels_h30\wc60_lc500\whole_blood\filtered_data\SO2-export_0_whole_blood.rtdc"
file3 = r"T:\Members\Felix\analysis_data\RTDC\20230317_Felix_RBC_hyper_channel_h5um\wc60_lc500\M001_data.rtdc"
file4 = r"T:\Members\Felix\analysis_data\RTDC\20230322_Felix_beads_deformation_outside_hyper\S4\inlet\M001_data.rtdc"
file5 = r"T:\Members\Felix\analysis_data\RTDC\20230406_Felix_HL60_LatB_hyper_channels\100nM_LatB\M001_data.rtdc"

for file in [file1, file2, file3, file4, file5]:
    ds = dclab.new_dataset(file)
    date = ds.config['experiment']['date']
    inert_bool = 'inert_ratio_raw' in ds.features
    print(f"Inertia ratio loaded for {date}: {inert_bool}")
Out:
  Inertia ratio loaded for 2022-12-06: False
  Inertia ratio loaded for 2023-01-24: False
  Inertia ratio loaded for 2023-03-17: False
  Inertia ratio loaded for 2023-03-22: True
  Inertia ratio loaded for 2023-04-06: True

In the original files, the features are available (Y: = HSM):

file1 = r"Y:\Data\RTDC\20221206_Felix_hyper_channel_HL60_LatB\wc60_Lc500\DMSO\M001_data.rtdc"
file2 = r"Y:\Data\RTDC\20230124_Felix_blood_hyper_channels_h30\wc60_lc500\whole_blood\M001_data.rtdc"
file3 = r"Y:\Data\RTDC\20230317_Felix_RBC_hyper_channel_h5um\wc60_lc500\M001_data.rtdc"
file4 = r"Y:\Data\RTDC\20230322_Felix_beads_deformation_outside_hyper\S4\inlet\M001_data.rtdc"
file5 = r"Y:\Data\RTDC\20230406_Felix_HL60_LatB_hyper_channels\100nM_LatB\M001_data.rtdc"

for file in [file1, file2, file3, file4, file5]:
    ds = dclab.new_dataset(file)
    date = ds.config['experiment']['date']
    inert_bool = 'inert_ratio_raw' in ds.features
    print(f"Inertia ratio loaded for {date}: {inert_bool}")
Out:
  Inertia ratio loaded for 2022-12-06: True
  Inertia ratio loaded for 2023-01-24: True
  Inertia ratio loaded for 2023-03-17: True
  Inertia ratio loaded for 2023-03-22: True
  Inertia ratio loaded for 2023-04-06: True

So I guess it is connected to how these files were exported. I cannot recall which versions of dclab I used to export them. My export code looks like this:

export_features = list(set(tr_ds.dataset.features_innate)
                       & set(tr_ds.dataset.features_scalar))
if 'inert_ratio_raw' not in export_features:
    export_features += ['inert_ratio_raw', 'inert_ratio_cvx']
dataset.export.hdf5(path=path, features=export_features,
                    filtered=True,
                    override=True,
                    compression=None,
                    skip_checks=True)

Also, all the other features are there. It only seems to affect inertia ratio. Any idea how to fix this?

felix-r commented 1 year ago

I would also like to point out that it is quite critical for me to fix this quickly because I need to re-analyze a lot of old experiments right now before I can go on with new stuff!

paulmueller commented 1 year ago

The feature data are in the HDF5 file, but dclab is not loading them, because these features are identified as defective. From the dates you pasted, it seems like this is related to the inertia ratio check methods implemented in dclab 0.48.4. So the reason you are experiencing this is the fix for #212.
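
To confirm this for a given file, one can compare what is stored in the HDF5 "events" group with what dclab exposes after loading. The following is only a minimal sketch, not part of dclab: the helper names hidden_features and inspect_rtdc are made up for illustration, and it assumes that h5py and dclab are installed.

```python
def hidden_features(stored, loaded):
    """Return the features that are stored in the file but were not loaded."""
    return sorted(set(stored) - set(loaded))


def inspect_rtdc(path):
    """Print stored-but-hidden features and the software version of a file."""
    # Local imports, so that hidden_features() can be used without them.
    import dclab
    import h5py

    with h5py.File(path, "r") as h5:
        # In .rtdc files, feature data live in the "events" group.
        stored = list(h5["events"].keys())
        version = h5.attrs.get("setup:software version")

    ds = dclab.new_dataset(path)
    print("software version:", version)
    print("stored but not loaded:", hidden_features(stored, ds.features))
```

If inspect_rtdc lists inert_ratio_raw or inert_ratio_cvx for an affected file, the data are physically present and are only being hidden on load.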

I would say you have three options now:

  1. If you are 100% certain that your exported inertia ratio values are correct (#212 does not apply), you can bypass the defective feature check by setting the dclab version to 0.48.4 in the "software version" metadata key.
  2. Recompute the inertia ratio from the original data and copy it to the new files.
  3. Downgrade dclab to 0.48.2 (not recommended)

felix-r commented 1 year ago

OK, thanks. I'll try option 1, because option 2 is what I am trying to avoid. Also, I don't want to downgrade dclab.

felix-r commented 1 year ago

Option 1 worked 🎉

For anyone encountering this issue in the future, this code overwrites the "software version" value in the metadata of the .rtdc file:

import h5py
with h5py.File("path\\to\\file.rtdc", 'r+') as h5:
    # '2.3.4.54' was the ShapeIn version used in my case. Should be changed accordingly
    h5.attrs.modify("setup:software version", "2.3.4.54 | dclab 0.48.4")
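
A slightly more defensive variant reads the existing value instead of hard-coding the ShapeIn version, and keeps a backup copy of the file. This is only a sketch under assumptions: the function names are hypothetical, the " | " separator is inferred from the string above, and it should only be used if you are sure that #212 does not apply to your data.

```python
def with_dclab_version(current, dclab_version="0.48.4"):
    """Return the version string with its dclab entry set to dclab_version."""
    if isinstance(current, bytes):
        # h5py may return attribute values as bytes.
        current = current.decode("utf-8")
    # Assumption: entries are separated by "|", e.g. "2.3.4.54 | dclab 0.47.0".
    parts = [p.strip() for p in current.split("|")]
    # Drop empty entries and any existing dclab entry, keep everything else.
    kept = [p for p in parts if p and not p.startswith("dclab ")]
    return " | ".join(kept + [f"dclab {dclab_version}"])


def patch_software_version(path, dclab_version="0.48.4"):
    """Back up the file, then rewrite its "setup:software version" attribute."""
    import shutil

    import h5py

    shutil.copy2(path, str(path) + ".bak")  # keep the untouched original
    key = "setup:software version"
    with h5py.File(path, "r+") as h5:
        h5.attrs[key] = with_dclab_version(h5.attrs[key], dclab_version)
```

Replacing an existing dclab entry rather than appending a second one keeps the attribute in the same shape as the string used above.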

paulmueller commented 1 year ago

OK, great. For anyone who encounters this issue without understanding what #212 means: please do not apply this code blindly. Your data might be wrong. -> closing