bruesba opened 5 years ago
Hi @bruesba ,
Pretty sure this is because we haven't yet implemented using .pngs as data sources. That's on the to-do list, but we haven't gotten to it; it looks like it would essentially require transforming the imageio.core.util.Array object into a numpy array or torch tensor. Alternatively (until this is implemented), you should be able to convert your source data to TIFFs and be on your way.
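In the meantime, the conversion is straightforward with Pillow and numpy (a minimal sketch; the `png_to_tiff` helper and paths are illustrative, not part of solaris):

```python
import numpy as np
from PIL import Image


def png_to_tiff(png_path, tiff_path):
    """Load a PNG, coerce it to a plain numpy array, and re-save it as a TIFF."""
    arr = np.asarray(Image.open(png_path))  # plain ndarray, sidesteps imageio.core.util.Array
    Image.fromarray(arr).save(tiff_path, format="TIFF")
    return arr
```

PNG and (uncompressed) TIFF are both lossless for 8-bit data, so the pixel values should round-trip unchanged.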
-Nick
Hey Nick,
You're right, after converting my input data to .tiff, training works as expected. I thought I'd trained on .pngs before, but I must have mixed something up! Inference from .pngs is already possible, isn't it? Thank you for your hard work!
@bruesba,
We haven't explicitly implemented it yet, but if you try it and it works, let me know ;) a lot of the back-end stuff that we use (e.g. scikit-image and opencv) can work with .pngs; it just depends on whether the subtle differences in I/O between formats cause problems, such as the one you raised here.
-N
Inferring from a .png using the pre-trained XDXD network does seem to work (i.e. it does not throw an error), but the output is not binary, as it presumably should be. Could this explain the vague output reported further down the thread in issue #212?
Here are the outputs of the same photo in .tif and in .png (from my own dataset; the .png was in RGB rather than BGR but the point still stands). Is the preds_to_binary operation automatically carried out on .tifs by the Inferer? If so, what bg_threshold is used?
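For reference, flipping RGB to BGR is just a reversal of the channel axis (a numpy sketch, assuming an H x W x C array layout):

```python
import numpy as np


def rgb_to_bgr(arr):
    """Reverse the channel axis of an H x W x C image array (RGB <-> BGR)."""
    return arr[..., ::-1]
```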
Good to know that .pngs work.
The preds_to_binary operation doesn't get run automatically; the goal was to leave raw probability (or probability-like) outputs in case users want to do something interesting with the inference outputs (such as combining probability estimates from different models in an ensembling step). Binarization runs during polygonization of outputs (see sol.vector.mask.mask_to_poly_geojson(), which currently defaults to setting the threshold at 0). Alternatively, you can directly binarize outputs with sol.vector.mask.preds_to_binary().
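Conceptually, that binarization step is just thresholding the raw output array (a minimal numpy sketch of the idea, not solaris's actual implementation; the `binarize` helper and its defaults are illustrative):

```python
import numpy as np


def binarize(pred_arr, bg_threshold=0):
    """Pixels above the threshold become foreground (255); the rest become background (0)."""
    return (np.asarray(pred_arr) > bg_threshold).astype(np.uint8) * 255
```

With a threshold of 0, any strictly positive raw output is treated as foreground.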
I'm going to close this because .png support requests are already covered by #184.
I'm not sure I understand - you're saying binarisation does run automatically in the polygonisation phase of inference, right? Wouldn't a default threshold of 0 then return an all-white image, i.e. one where every pixel is labelled a building with sufficient certainty? Or is binarisation not carried out automatically when Inferer is called, so a greyscale output is expected? I'm confused because the image below is the output I get when I run Inferer on a .tif, and I'm not sure at what level of confidence buildings are highlighted. (Please forgive the print-screen.)
By contrast, a non-binarised image is returned when I attempt to infer from a .png (which fits your comment about raw probability better, if I understand correctly):
Is this behaviour intended? What threshold is used in the former output? To be clear, my code consists only of:
```python
import solaris as sol

pre_trained = r'path/to/config'
config = sol.utils.config.parse(pre_trained)
inferer = sol.nets.infer.Inferer(config)
inference_data = sol.nets.infer.get_infer_df(config)
inferer(inference_data)
```
Best, Blue
Huh. So Inferer.infer() doesn't binarize at all - that's all done later. It looks to me like the predictions you're getting out of the TIF-formatted image are just better than the predictions you're getting out of the PNG (higher confidence, i.e. a bigger difference between background and foreground pixels). You can see a few pixels around some edges in the TIF output that you pasted here that don't look quite as bright - those may in fact be something other than 0 or 1, unless I'm mistaken.
I'm assuming those two outputs were generated from the same trained model? If so, my guess would be that when the PNG gets loaded in, the values in the array produced are different - maybe they're scaled differently - and since the model was trained on TIF-formatted inputs, it doesn't do as well with the PNG inputs. You could check this by loading both the TIF and PNG inputs with sol.utils.io.imread() and checking to see if the values in the arrays produced are in fact different. That's just a guess though...
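That check boils down to comparing the dtype and value range of the two arrays once loaded (a hedged sketch; the `summarize` helper is illustrative, and in solaris you'd obtain the arrays with sol.utils.io.imread()):

```python
import numpy as np


def summarize(arr):
    """Return the dtype and value range of a loaded image array."""
    arr = np.asarray(arr)
    return str(arr.dtype), float(arr.min()), float(arr.max())

# A uint8 PNG in [0, 255] sitting next to a float TIFF in [0.0, 1.0] would
# explain why a model trained on one format does poorly on the other.
```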
Regardless, this is something we need to clarify in the documentation, so thanks for bringing it to our attention.
Any idea how the following error can be avoided? I'm running into it while attempting to train the XDXD model. I haven't seen it before, and I'm using .pngs for both the image and mask batches as usual.
The masks are in 8-bit greyscale (PIL 'L' mode), since 3-dimensional B/W throws: