developmentseed / label-maker

Data Preparation for Satellite Machine Learning
http://devseed.com/label-maker/
MIT License

Handling objects that straddle tiles #31

Open lewfish opened 6 years ago

lewfish commented 6 years ago

As I understand label-maker, tile extents are static (they come from the mbtiles files), so with ml_type=='object-detection', objects that straddle tile boundaries get split up in the training data. Ideally, the tile bounds would be generated dynamically so that each tile contains the entire object, which should help the model learn better. However, fixing this would probably complicate the implementation (which I think is elegant, btw), and it might not be worth it if few objects straddle tiles, or if you want to be able to detect partial, clipped objects. I was just wondering if this is something you've considered.
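To get a feel for how often this happens in a given dataset, one could count the objects whose bounding box touches more than one tile. The following is a minimal sketch, not label-maker code: it assumes standard Web Mercator (slippy map) tile indexing and boxes given as (west, south, east, north) in degrees, and the function names are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_touched(bbox, zoom):
    """List every (x, y, zoom) tile that a (west, south, east, north) box overlaps."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [(x, y, zoom)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


def straddles(bbox, zoom):
    """True if the box would be split across two or more tiles."""
    return len(tiles_touched(bbox, zoom)) > 1
```

For example, `straddles((-1, 10, 1, 20), 1)` is true because the box crosses the tile edge at longitude 0. A box whose edge lies exactly on a tile boundary is counted as touching the neighbouring tile, so the count is a slight overestimate.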

drewbo commented 6 years ago

@lewfish thanks for the feedback. We've thought about it a little but don't currently have any great ideas for dealing with it. Ideally we'll gradually move away from the "rigidity" of the current tile approach and toward something that supports the case above. I think #13 will involve reading GeoTIFFs according to a tile schema, but we'll eventually want to relax that to support sliding-window reads, and then we'll have to develop a better "non-tile" schema.

Also, object detection is currently the least well-documented case (pending some help in #30), so we're happy to hear new ideas or approaches here.
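As a rough illustration of the sliding-window idea mentioned above, here is a minimal sketch that enumerates overlapping pixel windows over an image. It is not part of label-maker; the function names, the (col_off, row_off, width, height) window layout, and the choice to add a final window flush with the image edge are all assumptions made for the example.

```python
def axis_offsets(length, size, stride):
    """Start offsets along one axis so that windows cover [0, length)."""
    if length <= size:
        return [0]
    offsets = list(range(0, length - size + 1, stride))
    # Add a final window flush with the far edge if the stride left a gap.
    if offsets[-1] != length - size:
        offsets.append(length - size)
    return offsets


def sliding_windows(width, height, size, stride):
    """Yield (col_off, row_off, win_width, win_height) in row-major order."""
    for row_off in axis_offsets(height, size, stride):
        for col_off in axis_offsets(width, size, stride):
            yield (col_off, row_off, min(size, width), min(size, height))


# Usage: 512-pixel windows with a 256-pixel stride over a 1000 x 1000 image.
windows = list(sliding_windows(1000, 1000, size=512, stride=256))
```

With a stride smaller than the window size, any object no larger than `size - stride` pixels in each dimension falls entirely inside at least one window, which is the property a fixed tile grid lacks.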

jremillard commented 6 years ago

Using the QA tiles as the source data might not be ideal. In my images-to-osm project (https://github.com/jremillard/images-to-osm) I used Overpass to acquire the training data. Overpass works on every kind of OSM data, doesn't cut up objects, is zoom independent, and has a very flexible (but weird) query language. Also, restricting an ML project to a country boundary isn't flexible enough.
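For readers unfamiliar with Overpass, here is a minimal sketch of building an Overpass QL query for all ways with a given tag inside a bounding box. The helper name, the example tag, and the coordinates are hypothetical and are not taken from images-to-osm. Note that Overpass QL expects the box as (south, west, north, east).

```python
def overpass_query(bbox, key, value, timeout=25):
    """Build an Overpass QL query for ways tagged key=value inside bbox.

    bbox is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    area = f"({south},{west},{north},{east})"  # Overpass order: S, W, N, E
    return (
        f"[out:json][timeout:{timeout}];"
        f'way["{key}"="{value}"]{area};'
        "out geom;"  # return full geometry, so objects arrive unclipped
    )


# Usage with an illustrative tag and bounding box.
query = overpass_query((-71.1, 42.3, -71.0, 42.4), "leisure", "pitch")
```

The resulting string can be sent to an Overpass API endpoint as the `data` form parameter. Because each way comes back with its whole geometry, training chips can then be cut around the object rather than along a tile grid.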