bruesba opened this issue 5 years ago
Hi @bruesba,
Thanks for your well-documented comments; we'll have a look at why performance is poor with the pre-trained model and will clarify the documentation.
Is that image from the SpaceNet Atlanta dataset? If not, the normalization of the imagery likely needs to be adjusted for the model to work well (which highlights that we also need to document how the pre-trained models in Solaris expect images to be normalized!).
Thanks for using Solaris; we'll try to address your concerns in the next release.
That was a quick response! The input image I enclosed is not from the Atlanta dataset. I will look into image normalisation. Thank you for the suggestion!
I should also add that I set p to 0 under inference_augmentation --> DropChannel on line 81 of xdxd_spacenet4.yml, because otherwise an error was thrown about a channel that could not be dropped because its index was out of range. My input data has shape (480, 480, 3).
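For reference, the same effect can presumably also be achieved by editing the parsed config before building the inferer. A sketch of the change, assuming the parsed YAML nests the augmentation parameters the way the file reads (the key path is my guess, not confirmed against the source):

import solaris as sol

# Parse the pre-trained model's config that ships with solaris.
config = sol.utils.config.parse('xdxd_spacenet4.yml')

# Assumed key path: disable DropChannel for 3-band input by zeroing its
# application probability, mirroring the edit on line 81 of the YAML file.
config['inference_augmentation']['augmentations']['DropChannel']['p'] = 0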
Hi @bruesba,
That makes sense re: the DropChannel error - XD_XD's model was trained on images that were originally RGB + near-IR, with the near-IR band dropped to produce a 3-channel image. Those images were then z-scored (zero mean and unit variance) across the entire dataset (not within each batch).
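In case it's useful, dataset-level z-scoring looks roughly like this (a minimal sketch; the per-band mean and std values below are placeholders, not the statistics the model was actually trained with):

import numpy as np

# Placeholder per-band statistics, computed over the *entire* training dataset,
# not per image or per batch. Substitute the real values for your imagery.
band_means = np.array([0.0, 0.0, 0.0])
band_stds = np.array([1.0, 1.0, 1.0])

def zscore(image):
    # Zero-mean, unit-variance normalization per band for an (H, W, C) array.
    return (image.astype(np.float32) - band_means) / band_stds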
We'll make sure to clarify how images were processed for model training to enable inference on new data more effectively. As I think more about the approach we used, I'm realizing that it may not have been ideal for enabling inference on new imagery - I'll likely end up re-training the models that accompany solaris for a later release, so you may see some changes over time.
Hi Nick,
Thanks for your communication and transparency. I'd like to ask a few more questions regarding inference, which I hope you have the time to answer. My aim is to conduct building detection by applying XDXD's model to my own data (with roughly the same properties as the original dataset) through Solaris, and eventually to fine-tune/retrain the model to detect specific roof types. I adjusted the normalisation configuration as you suggested and was able to produce the image below from the RGB image shown in my original comment.
It makes infinitely more sense than the original greyscale image that was produced before adjusting the normalisation to my own dataset's means and stds, but I'm still not sure how to interpret it. Lower pixel values indicate higher probabilities of buildings, right? But isn't the output supposed to be binary, as mentioned in https://medium.com/the-downlinq/the-spacenet-challenge-off-nadir-buildings-introducing-the-winners-b60f2b700266 ?
I attempted to derive a binary mask/polygons from my predicted array above, but both functions require bg_threshold in order to be meaningful, and I have no idea how to derive that value. How does one go about choosing it?
My overarching question is: how are building footprints derived from Inferer output? I understand the Python API tutorials will be expanded with the next release, but I hope you have the time to point me in the right direction here.
Best regards, Blue
Hi @bruesba,
Agreed, that looks a lot better. If you haven't swapped the blue and red channels in your image, you could try that and see if it helps even more - the imagery the model was trained on is in BGR channel order, which is common in raw satellite imagery, but some datasets don't follow that format.
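Swapping the channel order is a one-liner on a channels-last array, something like this (the file path is just a placeholder, and I'm assuming the image loads as an H x W x 3 array):

import skimage.io

# Load an (H, W, 3) image and reverse the channel axis: RGB -> BGR (or back).
image_rgb = skimage.io.imread('my_image.tif')  # placeholder path
image_bgr = image_rgb[:, :, ::-1]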
Higher values indicate higher probability of being a building. There's a known bug in the current release that converts the 0-1 range p(building) to a quasi-z-scored range (#216), which unfortunately makes it hard to interpret the actual numbers - we plan to fix that in the next release. For the moment, you may need to play around with the threshold until you find a value that works. Binarizing the numpy array with different thresholds and then visualizing with matplotlib should help.
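A rough sketch of that kind of threshold sweep (pred_arr and the file path are placeholders for however you saved the inference output, and the threshold values are arbitrary starting points):

import matplotlib.pyplot as plt
import numpy as np

# Placeholder: load the 2D prediction array produced by the inferer.
pred_arr = np.load('prediction.npy')

# Arbitrary thresholds to try; adjust them to the value range of your output.
thresholds = [0.0, 0.5, 1.0, 2.0]
fig, axes = plt.subplots(1, len(thresholds), figsize=(4 * len(thresholds), 4))
for ax, t in zip(axes, thresholds):
    ax.imshow(pred_arr > t, cmap='gray')  # binary building mask at this threshold
    ax.set_title(f'threshold = {t}')
    ax.axis('off')
plt.show()

Once one of these masks visually separates buildings from background, that same value should be a reasonable bg_threshold for the mask-to-polygon functions you mentioned.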
Thanks for using solaris and for your well-documented issues!
Thank you for your work on this project; it looks extremely promising. To a non-expert like myself, though, several basic Solaris functions remain unclear. The CLI inference tutorial looks simple enough, but running the line specified in https://solaris.readthedocs.io/en/latest/tutorials/notebooks/cli_ml_pipeline.html:
$ solaris_run_ml [path_to_your_config]
in the command prompt returns a CommandNotFoundException. Is there a specific shell in which it should be run? Or is there a specific directory from which the commands work? The lines of code in https://medium.com/the-downlinq/announcing-solaris-an-open-source-python-library-for-analyzing-overhead-imagery-with-machine-48c1489c29f7:
import solaris as sol
config = sol.utils.config.parse('/path/to/config/file.yml')
inferer = sol.nets.infer.Inferer(config)
inference_data = sol.nets.infer.get_infer_df(config)
inferer(inference_data)
(which describe the inference workflow for Python) do produce an output, but one that is difficult to interpret and vastly different from the images in the source article:
My input.
My output using XDXD's model.
Are there any other functions that ought to be used? The article and documentation seem to suggest that the Inferer class directly leads to object detection or even segmentation (although the Python API tutorial for inference has not been published yet).
You must be busy with high-priority issues, and the project launched very recently, but I'd nonetheless like to request clearer instructions for using Solaris for inference. I would like to use the repository for building detection but can't get it working with the current documentation.
Best