PUTvision / qgis-plugin-deepness

Deepness is a remote sensing plugin that enables deep learning inference in QGIS
https://www.buymeacoffee.com/deepness
Apache License 2.0
99 stars · 23 forks

Clarification regarding image preprocessing #135

Closed mg515 closed 6 months ago

mg515 commented 7 months ago

Hey everyone! First, thanks for the nice plugin, it was pretty easy to set up and use with our custom onnx model.

I wanted to ask about the preprocessing pipeline, since it is not mentioned in the documentation, nor are there any options for it in the UI. I would assume that, at minimum, the image gets translated into a [0, 1] tensor with [c, h, w] shape by dividing the pixel values by 255 and transposing the dimensions. Is there also an ImageNet normalization applied afterwards, i.e. using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]?
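The normalization asked about here can be sketched as follows. This is a minimal NumPy illustration of the assumed steps (scale to [0, 1], ImageNet standardization, HWC to CHW), not the plugin's actual code:

```python
import numpy as np

# ImageNet statistics (per RGB channel)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Scale a (H, W, 3) uint8 image to [0, 1], apply ImageNet
    normalization, and transpose to (3, H, W)."""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # -> [0, 1]
    x = (x - MEAN) / STD                             # per-channel standardization
    return x.transpose(2, 0, 1)                      # HWC -> CHW
```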

It would be nice for someone to comment on this and shed some light on what is currently done to the image before it is fed into the ONNX graph. Thanks!

Update: I was able to find the normalization code myself here: src/deepness/processing/models/recognition.py. Still, a confirmation would be nice, thanks!

bartoszptak commented 7 months ago

Hi! Thanks for your message!

First, I would like to clarify one thing: the normalization you found in src/deepness/processing/models/recognition.py was a mistake, and we have removed it.

Additionally, we released a new version of the plugin with some improvements today. By default, input images are only normalized from the 0-255 range to 0-1. In the newest version, however, we added two metaparameters, standardization_mean and standardization_std, which enable additional scaling.

Here are the processing steps:

  1. Based on Tile size [px] and Resolution [cm/px], the map_processor generates image tiles, which are, for example, (1, 256, 256, 3) for an RGB raster layer and batch_size=1.
  2. Then, some pre-processing steps are performed:
    • limit_channels_number - the number of channels is limited to the model's requirements (without this operation, the number of image channels and model channels would generally have to be equal)
    • normalize_values_to_01 - the input image is converted from np.uint8 to np.float32 and divided by 255
    • standardize_values - image values are standardized using y = (x - mean) / std (if not set otherwise in the metadata, then mean=0, std=1)
    • transpose_nhwc_to_nchw - the input array is transposed from channels-last to channels-first, for example (1, 256, 256, 3) -> (1, 3, 256, 256)
  3. The input array is forwarded to the inference session, and the results are post-processed differently depending on the model type.
  4. The results are remapped back onto the QGIS layer.
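The pre-processing in step 2 can be sketched roughly like this. It is a NumPy illustration of the steps as described; the comments mirror the names listed above, but the actual plugin code may differ:

```python
import numpy as np

def preprocess_tile(tile_nhwc: np.ndarray,
                    model_channels: int,
                    mean: float = 0.0,
                    std: float = 1.0) -> np.ndarray:
    """Apply the described pre-processing to a (N, H, W, C) uint8 tile batch."""
    x = tile_nhwc[..., :model_channels]    # limit_channels_number
    x = x.astype(np.float32) / 255.0       # normalize_values_to_01
    x = (x - mean) / std                   # standardize_values (defaults are a no-op)
    return x.transpose(0, 3, 1, 2)         # transpose_nhwc_to_nchw
```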

To sum up, the best way to add other preprocessing steps is either to modify the plugin locally at point 2 or to add additional processing layers at the beginning of the ONNX model. To be honest, there are so many ways to preprocess data that it would be hard to cover them all, so we do not plan to add any new specific steps to the plugin.
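A third option avoids extra layers altogether: fold the standardization into the model's first linear/convolutional layer, since conv((x - mean) / std) equals a convolution with rescaled weights and a shifted bias. Below is a small NumPy sketch of that identity for a 1x1 convolution (illustrative only; `fold_standardization` is a hypothetical helper, not part of Deepness):

```python
import numpy as np

def fold_standardization(weight, bias, mean, std):
    """Fold y = weight @ ((x - mean) / std) + bias into new parameters.

    weight: (out_c, in_c) for a 1x1 convolution, bias: (out_c,),
    mean/std: per-input-channel arrays of shape (in_c,).
    """
    w_new = weight / std          # scale each input channel's weights
    b_new = bias - w_new @ mean   # absorb the mean shift into the bias
    return w_new, b_new

# Per pixel: weight @ ((x - mean) / std) + bias == w_new @ x + b_new
```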

bartoszptak commented 7 months ago

And if you have any other questions, feel free to ask!

mg515 commented 7 months ago

> And if you have any other questions, feel free to ask!

That would be all regarding the preprocessing steps, thanks for the clear answer.

An additional question popped up, though: what is the best way to manage dependencies for QGIS plugins? For our development setup we have no issues, as we create and manage the Docker environment ourselves. However, when handing out a model, we are facing certain issues, like a missing Python installation on the host system, and further missing dependencies (onnxruntime, opencv, etc.), which are cumbersome to manage, especially on Windows or macOS machines. Is there a recommendation from your side we can follow? Is there any way to use virtual environments at all, via venv or conda?

przemyslaw-aszkowski commented 7 months ago

We changed the way requirements are being installed in version 0.6.1. We hope it will help.
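For context, a common pattern for this problem in QGIS plugins is to install missing packages into the Python environment QGIS itself runs on, at plugin load time. A hedged sketch of that pattern (`ensure_package` is a hypothetical helper, not part of Deepness, and assumes `sys.executable` points at a usable Python interpreter):

```python
import importlib
import subprocess
import sys
from typing import Optional

def ensure_package(module_name: str, pip_name: Optional[str] = None):
    """Import module_name, installing it with pip into the current
    (QGIS-bundled) Python environment if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Install with the same interpreter the plugin runs on, so the
        # package lands in an environment QGIS can actually import from.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        return importlib.import_module(module_name)
```

For example, `ensure_package("cv2", pip_name="opencv-python-headless")` would import OpenCV, installing it first if needed.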