databricks-industry-solutions / pixels

Facilitates simple, large-scale processing of HLS medical images, documents, and zip files. Previously at https://github.com/dmoore247/pixels
https://databricks-industry-solutions.github.io/pixels/

Add transformer to perform image patching #50

Open dmoore247 opened 8 months ago

dmoore247 commented 8 months ago

Is your feature request related to a problem? Please describe. We need a modular, clear, concise method for adding image manipulation code. Some image feature engineering pipelines take a larger image and slice it into patches (e.g. 128x128) that can be fed to deep learning models or used to compute embeddings.
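To make the patching step concrete, here is a minimal sketch of how the patch grid for one image could be computed. It is not code from pixels: the function name `patch_boxes`, the `stride` parameter, and the choice to drop partial patches at the right and bottom edges are all assumptions made for illustration.

```python
from typing import Iterator, Optional, Tuple

# (left, top, right, bottom); right and bottom are exclusive
Box = Tuple[int, int, int, int]


def patch_boxes(
    width: int, height: int, patch: int = 128, stride: Optional[int] = None
) -> Iterator[Box]:
    """Yield the bounding boxes of square patches covering an image.

    Hypothetical helper. Assumption: partial patches at the right and
    bottom edges are dropped rather than padded.
    """
    if patch <= 0:
        raise ValueError("patch must be positive")
    # Default to non-overlapping patches when no stride is given.
    stride = patch if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield (left, top, left + patch, top + patch)


boxes = list(patch_boxes(256, 256))
print(len(boxes), boxes[0])
```

Each box uses the (left, upper, right, lower) ordering, so it could be passed to something like Pillow's `Image.crop` or used to slice an array as `img[top:bottom, left:right]`.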

Describe the solution you'd like See the whole slide image processing accelerator; encapsulate its patching process into a compact transformer.
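A rough sketch of the shape such a transformer might take is below. It deliberately runs on plain Python dicts rather than a Spark DataFrame so it can be tried without a cluster. `PatchTransformer`, its `transform` method, and the `width`, `height`, `box`, and `error` fields are hypothetical names, not part of the pixels API. The idea is one input row per image, one output row per patch, with failures recorded per row instead of aborting the job.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class PatchTransformer:
    """Hypothetical transformer: expands one image row into one row per patch."""

    patch: int = 128  # patch edge length in pixels

    def _boxes(self, width: int, height: int) -> Iterator[Box]:
        # Assumption: non-overlapping patches, partial edge patches dropped.
        for top in range(0, height - self.patch + 1, self.patch):
            for left in range(0, width - self.patch + 1, self.patch):
                yield (left, top, left + self.patch, top + self.patch)

    def transform(self, rows: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        for row in rows:
            try:
                boxes = list(self._boxes(int(row["width"]), int(row["height"])))
            except (KeyError, TypeError, ValueError) as exc:
                # Record the failure on the row so it can be tracked downstream.
                yield {**row, "box": None, "error": f"{type(exc).__name__}: {exc}"}
                continue
            for box in boxes:
                yield {**row, "box": box, "error": None}


rows = [
    {"path": "a.svs", "width": 256, "height": 128},
    {"path": "bad.svs", "width": "n/a", "height": 128},
]
for out in PatchTransformer(patch=128).transform(rows):
    print(out["path"], out["box"], out["error"])
```

In a Spark setting the same logic would presumably live behind a DataFrame-in, DataFrame-out `transform`, with the per-row error column kept so failed files can be filtered and retried.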

Describe alternatives you've considered Most user code will focus on looping over every file, reading the file, and writing the patches out. This is problematic because:

  1. It doesn't scale
  2. It incurs a lot of extra IO
  3. It lacks error handling and robust tracking
  4. The user spends a lot of effort optimizing IO rather than on the unique aspects of the Deep Learning / Embedding work.

Additional context