Open Smith42 opened 6 months ago
I now have some code that can extract embeddings from AstroPT, and equivalent code is available in AstroCLIP. The next thing to do is get the code ready to ingest galaxy image/spectrum pairs from here, ETA ~a few hours.
Some thoughts about embedding space metrics:
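As one illustration of what such a metric could look like (this is an assumption for discussion, not something already agreed in this thread), here is a minimal sketch of top-k cross-modal retrieval accuracy using cosine similarity. It assumes the image and spectrum embeddings have already been projected into a shared space of the same dimension, and that row i of each array belongs to the same galaxy. The function name `top_k_retrieval_accuracy` is hypothetical.

```python
import numpy as np


def top_k_retrieval_accuracy(image_emb, spectrum_emb, k=1):
    """Fraction of images whose paired spectrum is among the k most
    cosine-similar spectra. Assumes row i of both arrays is the same galaxy."""
    image_emb = np.asarray(image_emb, dtype=float)
    spectrum_emb = np.asarray(spectrum_emb, dtype=float)
    if image_emb.shape != spectrum_emb.shape:
        raise ValueError("both embedding arrays must have the same shape")

    # L2-normalise rows so the dot product equals cosine similarity.
    a = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    b = spectrum_emb / np.linalg.norm(spectrum_emb, axis=1, keepdims=True)
    sim = a @ b.T  # sim[i, j] = cosine(image i, spectrum j)

    # Indices of the k most similar spectra for each image.
    top_k = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(len(a))[:, None]
    return float((top_k == targets).any(axis=1).mean())
```

A score near 1 would mean paired embeddings sit close together; swapping the two arguments gives the spectrum-to-image direction.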
FYI to download Francois' AstroCLIP dataset I use the following code:
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="EiffL/AstroCLIP",
repo_type="dataset",
local_dir="./",
local_dir_use_symlinks=False,
cache_dir="/raid/data/cache",
)
The parquet files can then be loaded with pandas.
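A minimal sketch of that loading step, assuming the parquet files sit somewhere under the `local_dir` used for the download. The exact directory layout and column names are not given here, so the glob pattern and the helper name `load_parquets` are assumptions. `pd.read_parquet` needs a parquet engine such as `pyarrow` installed.

```python
from pathlib import Path

import pandas as pd


def load_parquets(root, pattern="**/*.parquet"):
    """Read every parquet file found under `root` into one DataFrame.

    The glob pattern is an assumption about the on-disk layout.
    """
    files = sorted(Path(root).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} under {root}")
    # Concatenate all shards; this loads everything into memory at once.
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

For example, `df = load_parquets("./")` would pick up everything downloaded by the snippet above. If the full dataset does not fit in memory, reading one file at a time is the safer option.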
Some foundation model benchmark papers that are knocking about in Earth Observation for inspo:
Benchmarking embedding alignment with GPTs and CLIPs
Contacts: Mike (S), Marc
Participants:
Goals and deliverable
Resources needed
We'd probably need a fair number of GPUs for pretraining, plus some enthusiastic people to help get this working.
Rough checklist