Usage question - Githubissues

I am interested in evaluating the text2earth model for text-to-image retrieval and want to compare it to CLIP-based models.

My assumption was that text2earth is a text encoder that maps text into the same embedding space as the Clay image embeddings. If so, I expected to be able to do the following:

1. Use the Clay v1 model to create embeddings for some chips.
2. Find a text2earth model compatible with the v1 model.
3. Use it to embed natural language text queries such as "running track" or "house with swimming pool".
4. Compute similarity scores between the text embedding and the chip embeddings.
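
To make the last step concrete, here is a minimal sketch of the similarity ranking, assuming the text and chip embeddings really do share a space (which is exactly what I am unsure about). It does not call any Clay or text2earth API: the vectors are made-up placeholders, and `cosine_similarity` and `rank_chips` are hypothetical helper names used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_chips(text_embedding, chip_embeddings, top_k=3):
    """Return (chip_index, score) pairs, highest similarity first."""
    scores = [
        (index, cosine_similarity(text_embedding, chip))
        for index, chip in enumerate(chip_embeddings)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Placeholder vectors standing in for real embeddings (hypothetical values).
# In practice these would come from the image encoder (chips) and a
# compatible text encoder (query), if one exists.
chip_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
text_embedding = [1.0, 0.1, 0.0]

ranking = rank_chips(text_embedding, chip_embeddings, top_k=2)
print(ranking)
```

Cosine similarity is used here because it is the usual choice for CLIP-style retrieval; whether it is the right metric for these embeddings is also an assumption.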

But I am a little confused by the example notebooks (such as this one).

Questions:

1. Is the workflow described above currently supported?
2. Is there a text2earth model compatible with the v1 model?

Clay-foundation / earth-text

Usage question #28