CosmiQ / solaris

CosmiQ Works Geospatial Machine Learning Analysis Toolkit
https://solaris.readthedocs.io
Apache License 2.0

[FEATURE]: Inference documentation #212

Open bruesba opened 5 years ago

bruesba commented 5 years ago

Thank you for your work on this project; it looks extremely promising. To a non-expert like myself, though, several basic Solaris functions remain unclear. The CLI inference tutorial looks simple enough, but running the line specified in https://solaris.readthedocs.io/en/latest/tutorials/notebooks/cli_ml_pipeline.html: $ solaris_run_ml [path_to_your_config] in the command prompt returns a CommandNotFoundException. Is there a specific shell in which it should be run? Or is there a specific directory from which the commands work?
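As a sanity check (plain Python, not a Solaris function), this is a quick way to see whether the console script was installed on my PATH at all; if it prints None, the entry point isn't visible from the shell:

import shutil

# solaris_run_ml is installed as a console script, so it should be found on
# PATH from any directory if the installation completed successfully.
print(shutil.which("solaris_run_ml"))  # prints the script's location, or None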

The lines of code in https://medium.com/the-downlinq/announcing-solaris-an-open-source-python-library-for-analyzing-overhead-imagery-with-machine-48c1489c29f7, which describe the inference workflow for Python:

import solaris as sol
config = sol.utils.config.parse('/path/to/config/file.yml')
inferer = sol.nets.infer.Inferer(config)
inference_data = sol.nets.infer.get_infer_df(config)
inferer(inference_data)

do produce an output, but one that is difficult to interpret and vastly different from the images in the source article:

My input (attached image).

My output using XD_XD's model (attached image).

Are there any other functions that ought to be used? The article and documentation seem to suggest that the Inferer class directly leads to object detection or even segmentation (although the Python API tutorial for inference has not been published yet).

You must be busy with high-priority issues, and the project launched only recently, but I'd nonetheless like to request clearer instructions for using Solaris for inference. I would like to use the repository for building detection, but can't get it working with the current documentation.

Best

nrweir commented 5 years ago

Hi @bruesba,

Thanks for your well-documented comments; we'll have a look at why performance is poor with the pre-trained model and will clarify the documentation.

Is that image from the SpaceNet Atlanta dataset? If not, then normalization of the imagery likely needs to be adjusted for the model to work well (highlighting that we also need to document how pre-trained models in Solaris expect images to be normalized!).

Thanks for using Solaris; we'll try to address your concerns in the next release.

bruesba commented 5 years ago

That was a quick response! The input image I enclosed is not from the Atlanta dataset. I will look into image normalisation. Thank you for the suggestion!

bruesba commented 5 years ago

I should also add that I set p to 0 under inference_augmentation --> DropChannel on line 81 of xdxd_spacenet4.yml, because otherwise an error was thrown about a channel that could not be dropped because its index was out of range. My input data has shape (480, 480, 3).
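Concretely, the mismatch looks like this (plain numpy, just to illustrate the shape issue, not the actual Solaris transform; I'm assuming the config tries to drop the fourth channel):

import numpy as np

# A fourth channel can only be dropped if it exists.
four_band = np.zeros((480, 480, 4), dtype=np.uint8)
print(np.delete(four_band, 3, axis=2).shape)  # (480, 480, 3) - works

rgb_only = np.zeros((480, 480, 3), dtype=np.uint8)
# np.delete(rgb_only, 3, axis=2) raises IndexError because there is no channel
# at index 3 to drop, which is why I set p to 0 for my 3-band tiles.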

nrweir commented 5 years ago

Hi @bruesba,

That makes sense re: the DropChannel error - XD_XD's model was trained on images that were originally RGB+near-IR with the near-IR band dropped to produce a 3-channel image. Those images were then Z-scored (zero-mean and unit variance) across the entire dataset (not within each batch).
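In case it helps, "Z-scored across the entire dataset" means something like the following (illustrative numpy only; the per-channel numbers are placeholders - you'd compute them once over your whole image collection, not per image or per batch):

import numpy as np

# Hypothetical dataset-wide per-channel statistics (placeholders only).
dataset_mean = np.array([426.3, 595.2, 438.8], dtype=np.float32)
dataset_std = np.array([176.5, 210.1, 161.9], dtype=np.float32)

def zscore(image):
    # Zero-mean / unit-variance normalization using dataset-level stats.
    return (image.astype(np.float32) - dataset_mean) / dataset_std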

We'll make sure to clarify how images were processed for model training to enable inference on new data more effectively. As I think more about the approach we used, I'm realizing that it may not have been ideal for enabling inference on new imagery - I'll likely end up re-training the models that accompany solaris for a later release, so you may see some changes over time.

bruesba commented 5 years ago

Hi Nick,

Thanks for your communication and transparency. I'd like to ask a few more questions regarding inference, which I hope you have the time to answer. My aim is to conduct building detection by applying XD_XD's model to my own data (which has roughly the same properties as the original dataset) through Solaris, and eventually to fine-tune/retrain the model to detect specific roof types. I adjusted the normalisation configuration as you suggested and was able to produce the image below from the RGB image shown in my original comment.

(Attached image: prediction produced after adjusting the normalisation.)

It makes infinitely more sense than the original greyscale image that was produced before adjusting the normalisation to my own dataset's means and stds, but I'm still not sure how to interpret it. Lower pixel values indicate higher probabilities of buildings, right? But isn't the output supposed to be binary, as mentioned in https://medium.com/the-downlinq/the-spacenet-challenge-off-nadir-buildings-introducing-the-winners-b60f2b700266 ? I attempted to derive a binary mask/polygons from my predicted array above, but both functions require bg_threshold in order to be meaningful, and I have no idea how to choose that value. How does one go about picking it? My overarching question is: how are building footprints derived from Inferer output? I understand the Python API tutorials will be expanded in the next release, but I hope you have the time to point me in the right direction here.
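For reference, this is roughly what I attempted (a sketch based on my reading of the docs - the function and argument names may not be exact):

import numpy as np
import solaris as sol

pred_arr = np.load('prediction.npy')  # hypothetical path to my saved Inferer output

# Pixels below bg_threshold are treated as background; everything else is
# polygonized. 0.5 is a placeholder - this is the value I don't know how to pick.
footprints = sol.vector.mask.mask_to_poly_geojson(
    pred_arr,
    bg_threshold=0.5,
    reference_im='my_tile.tif',  # hypothetical source image, used for georeferencing
    do_transform=True
)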

Best regards, Blue

nrweir commented 5 years ago

Hi @bruesba,

Agreed, that looks a lot better. If you haven't swapped the blue and red channels in your image, you could try that and see if it helps even more - the imagery the model was trained on is in BGR channel order, which is common in raw satellite imagery, but some datasets don't follow that format.
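Swapping the channel order is a one-liner if your array is channels-last (plain numpy, just to illustrate):

import numpy as np

image = np.zeros((480, 480, 3), dtype=np.uint8)  # placeholder RGB tile
image_bgr = image[..., ::-1]  # reverse the last axis: RGB <-> BGR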

Higher values indicate higher probability of being a building. There's a known bug in the current release that converts the 0-1 range p(building) to a quasi-z-scored range (#216), which unfortunately makes it hard to interpret the actual numbers - we plan to fix that in the next release. For the moment, you may need to play around with the threshold until you find a value that works. Binarizing the numpy array with different thresholds and then visualizing with matplotlib should help.
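Something along these lines (an untested sketch; the prediction path is hypothetical) is what I mean by playing around with thresholds:

import matplotlib.pyplot as plt
import numpy as np

pred = np.load('prediction.npy')  # hypothetical path to one saved prediction

# Binarize at several candidate thresholds and eyeball which mask best matches
# the buildings in the source image. Because of #216 the values may not sit in
# the 0-1 range, so adjust the candidates as needed.
thresholds = [0.0, 0.5, 1.0, 2.0]
fig, axes = plt.subplots(1, len(thresholds), figsize=(16, 4))
for ax, t in zip(axes, thresholds):
    ax.imshow(pred > t, cmap='gray')
    ax.set_title('threshold = {}'.format(t))
    ax.axis('off')
plt.show()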

Thanks for using solaris and for your well-documented issues!