Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Find good initial representation of spatial embeddings #12

Closed by yellowcap 5 months ago

yellowcap commented 10 months ago

We had fruitful discussions about how to structure embeddings, and we generally agree that we should experiment with hierarchical embeddings, testing novel model architectures with potentially big improvements in certain use cases.

Insights from the discussion so far are below.

Using XYZ tiles as input to the model

Pro

Con

Hierarchical encoding

Pro

Con

brunosan commented 10 months ago

Thanks for this. It's a good summary.

The only bit I'll add is that imposing an explicit spatial scheme enables the other "trick" that I think can help a lot (the first one being absolute anchors): shared semantics.

I really like the idea that few semantics are truly local; most things are similar at regional, continental, or even global levels. E.g. "forests" are everywhere, even when locally forests look somewhat different. If we impose these shared levels of semantics, I believe we can reap outsized benefits when we scale this up to global models.

The challenge is that when you share part of an embedding with other locations, you might induce a lot of noise, especially in those parts that are shared across all embeddings (global semantics). I believe we can overcome that problem by scaling the updates down in proportion to how widely each part is shared. This way only genuinely global semantics are learned at the global level, where many gradients pointing in the same direction add up.

Another way to explain this. I'll use a random location as an example (with z-x-y tile identifiers for clarity):

z15-x23-y562 = [ "20 float numbers" ], but that tile's grand-grand-parent is z5-x23-y67 = [ "20 float numbers" ], which is shared with 4^10 = 1,048,576 locations at z15. And that tile's ancestor in turn is z0-x0-y0 = [ "20 float numbers" ], which is shared with all 4^15 tiles at z15.
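The ancestor relationship and the sharing counts follow from standard XYZ tile arithmetic. A minimal sketch (helper names are mine, and the concrete tile indices in the comment above are illustrative, so the computed ancestor coordinates need not match them):

```python
def ancestor(z, x, y, levels_up):
    """Return the (z, x, y) of the tile `levels_up` levels above in the XYZ pyramid.

    Going up one level halves the x and y indices (integer division).
    """
    shift = 2 ** levels_up
    return z - levels_up, x // shift, y // shift

def tiles_sharing_ancestor(levels_up):
    """Each level up, one tile covers 4x more descendant tiles."""
    return 4 ** levels_up

print(ancestor(15, 23, 562, 10))   # the z5 ancestor of the z15 tile
print(tiles_sharing_ancestor(10))  # 1,048,576 z15 tiles share that z5 embedding
print(tiles_sharing_ancestor(15))  # all z15 tiles share the z0 root embedding
```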

So the Full z15-x23-y562 embedding is the concatenation of the chain:

z15-x23-y562 = [ *[ "20 float numbers"], *[z5-x23-y67], *[z0-x0-y0]]

During backpropagation we update the weights, dividing the learning rate by e.g. how many tiles share each segment:

z15-x23-y562 += gradient * learning_rate * [ *[ "20 float numbers"], (*[z5-x23-y67])/4^10, (*[z0-x0-y0])/4^15]
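The chain concatenation and the damped update can be sketched as follows (a toy NumPy version under my own assumptions; names, shapes, and the update sign follow the pseudocode above, not the Clay codebase):

```python
import numpy as np

DIM = 20  # per-segment embedding size, as in the example above

# One learnable vector per tile in the chain, all zero-initialized here.
embeddings = {
    "z15-x23-y562": np.zeros(DIM),  # unique to this tile
    "z5-x23-y67":   np.zeros(DIM),  # shared by 4**10 z15 tiles
    "z0-x0-y0":     np.zeros(DIM),  # shared by all 4**15 z15 tiles
}
# How many z15 tiles share each segment; used to damp its updates.
share_count = {"z15-x23-y562": 1, "z5-x23-y67": 4**10, "z0-x0-y0": 4**15}

def full_embedding(chain):
    """The full embedding is the concatenation of the chain's segments."""
    return np.concatenate([embeddings[key] for key in chain])

def apply_update(chain, gradient, learning_rate=0.1):
    """Update each segment, dividing the step by how widely it is shared."""
    for key, grad in zip(chain, np.split(gradient, len(chain))):
        embeddings[key] += learning_rate * grad / share_count[key]

chain = ["z15-x23-y562", "z5-x23-y67", "z0-x0-y0"]
apply_update(chain, np.ones(3 * DIM))  # toy gradient of all ones
```

With this damping, only gradients that consistently point the same way across many tiles accumulate in the shared segments, which is exactly the intended effect.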

yellowcap commented 5 months ago

We have settled on a simple sine/cosine transformation of plain lat/lon for v0.1 and v0.2.
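For reference, a minimal sketch of such an encoding (the exact normalization used in Clay v0.1/v0.2 may differ):

```python
import math

def latlon_encoding(lat, lon):
    """Map degrees to a smooth, bounded 4-vector.

    Using sin/cos of each coordinate keeps the encoding continuous where
    raw longitude jumps from +180 to -180.
    """
    lat_r, lon_r = math.radians(lat), math.radians(lon)
    return (math.sin(lat_r), math.cos(lat_r), math.sin(lon_r), math.cos(lon_r))

# Tiles on either side of the antimeridian get nearly identical encodings.
east = latlon_encoding(0.0, 179.999)
west = latlon_encoding(0.0, -179.999)
```

Unlike the hierarchical scheme discussed above, this gives every sample a globally consistent, fixed-size location feature with no shared learnable state.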