DC-analysis / dclab

Python library for the post-measurement analysis of real-time deformability cytometry (RT-DC) data sets
https://dclab.readthedocs.io

Introduce upstream features via "basins" #222

Closed by paulmueller 10 months ago

paulmueller commented 1 year ago

When I run dclab-condense on a dataset, it would be nice if I could still access the image data as long as the original file is still around. This would be particularly interesting if I download the condensed data from DCOR and open it in Shape-Out, and the file contains information that links back to the dataset on DCOR.

An .rtdc file could have multiple "basins" (upstream locations) from which it could access features. If we have data on S3 (#213), then this could become very efficient.
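
To make the idea concrete, here is a minimal sketch of the lookup behaviour described above: a dataset first checks its own features and then asks each basin in turn. This is not dclab code. The `Basin` and `Dataset` classes, the loader callables, and the example feature values are all hypothetical and only illustrate the fallback order.

```python
from typing import Callable, Dict, List, Optional, Sequence


class Basin:
    """Hypothetical upstream location that may provide feature data."""

    def __init__(self, name: str,
                 loader: Callable[[], Optional[Dict[str, List]]]):
        self.name = name
        # The loader returns a feature dict, or None if the
        # upstream location is not available (e.g. file was moved).
        self._loader = loader
        self._data: Optional[Dict[str, List]] = None
        self._tried = False

    def get(self, feature: str) -> Optional[List]:
        # Open the upstream location lazily and only once
        if not self._tried:
            self._data = self._loader()
            self._tried = True
        if self._data is None:
            return None
        return self._data.get(feature)


class Dataset:
    """Hypothetical dataset with local features and upstream basins."""

    def __init__(self, features: Dict[str, List],
                 basins: Sequence[Basin] = ()):
        self.features = dict(features)
        self.basins = list(basins)

    def __getitem__(self, feature: str) -> List:
        # Local features take precedence over basin features
        if feature in self.features:
            return self.features[feature]
        # Otherwise, ask each basin in the order it was registered
        for basin in self.basins:
            data = basin.get(feature)
            if data is not None:
                return data
        raise KeyError(feature)


# The "original" file still contains the image data ...
original = {"deform": [0.01, 0.02], "image": ["img0", "img1"]}
# ... while the condensed file only holds scalar features.
condensed = Dataset(
    features={"deform": [0.01, 0.02]},
    basins=[
        Basin("moved-file", lambda: None),      # no longer available
        Basin("original", lambda: original),    # still around
    ],
)
```

Accessing `condensed["image"]` skips the unavailable basin and returns the data from the second one, while `condensed["deform"]` never touches a basin at all.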

paulmueller commented 1 year ago

A partial implementation for local files, referenced via relative or absolute paths, is available in dclab 0.51.0.

paulmueller commented 1 year ago

In version 0.53.0, there is a new RTDCWriter.store_basin method to write basins to files, and S3-based basins are supported.

paulmueller commented 1 year ago

In version 0.54.0, it is possible to store the features a basin provides in the JSON string. Also, the performance of S3 basins was improved.
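
As an illustration of why a feature list in the JSON string helps, the sketch below decides from the record alone whether a basin is worth opening. The key names (`name`, `type`, `format`, `paths`, `features`), the file name, and the `basin_may_provide` helper are assumptions made for this example and may not match what dclab actually writes.

```python
import json

# Hypothetical basin record; key names and values are
# illustrative assumptions, not the dclab file format.
basin_record = {
    "name": "original measurement",
    "type": "file",
    "format": "hdf5",
    "paths": ["measurement.rtdc"],
    "features": ["image", "mask"],
}

# The record is serialized to a JSON string for storage
basin_json = json.dumps(basin_record)


def basin_may_provide(basin_json: str, feature: str) -> bool:
    """Check a basin's JSON record before opening the basin itself."""
    record = json.loads(basin_json)
    features = record.get("features")
    if features is None:
        # No feature list stored: the basin has to be opened
        # to find out what it contains.
        return True
    return feature in features
```

With a feature list present, a request for a feature the basin does not list can be answered without opening a file or making a network request, which matters most for remote basins.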

paulmueller commented 10 months ago

Since version 0.55.3, the HTTP format is the default basin when accessing data from DCOR.

paulmueller commented 10 months ago

In version 0.56.0, DCOR basins are available. Nested basins are now also supported.
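
Nested basins mean that a basin can itself refer to further basins. A rough sketch of such a lookup, assuming a plain dictionary of datasets keyed by name, is shown below. The `find_feature` function, the dataset names, and the feature values are hypothetical, and the guard against circular references is a design choice of this example, not a statement about how dclab handles them.

```python
from typing import Dict, List, Optional, Set


def find_feature(name: str, feature: str, datasets: Dict[str, dict],
                 _seen: Optional[Set[str]] = None) -> Optional[List]:
    """Depth-first search for a feature through nested basins."""
    seen = set() if _seen is None else _seen
    # Skip basins that are unavailable or were already visited
    if name in seen or name not in datasets:
        return None
    seen.add(name)
    entry = datasets[name]
    # Features stored in the dataset itself take precedence
    if feature in entry["features"]:
        return entry["features"][feature]
    # Otherwise, descend into each upstream basin
    for upstream in entry["basins"]:
        data = find_feature(upstream, feature, datasets, seen)
        if data is not None:
            return data
    return None


# Hypothetical chain: filtered -> condensed -> raw
datasets = {
    "raw": {"features": {"image": ["img0", "img1"]}, "basins": []},
    "condensed": {"features": {"deform": [0.01, 0.02]},
                  "basins": ["raw"]},
    "filtered": {"features": {}, "basins": ["condensed", "missing"]},
}
```

Here `find_feature("filtered", "image", datasets)` reaches the image data two levels up, and the unavailable basin `"missing"` is skipped.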

paulmueller commented 10 months ago

With the implementation of DCOR basins, I believe we can close things here on the dclab side. In general, dclab-condense should not write basin data, because file names are subject to change and there is no real use case.