esgf2-us / esg_fastapi

Transition API design for Phase II #11

Open lliming opened 2 months ago

lliming commented 2 months ago

In Phase II, the consolidated index will NOT include file-level entries in the Globus Search index. Clients will not need to inspect additional index entries to learn about individual files in a dataset. Instead, each dataset entry will contain a new file manifest field: a file list including pathnames and checksums for each file in the dataset. Clients can use that manifest to request individual files via HTTP/S.
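To make the proposal concrete, here is a rough sketch of what a dataset entry with a file manifest might look like and how a client could turn it into per-file HTTPS requests. Everything named here is an assumption for illustration: the field names (`file_manifest`, `path`, `checksum`, `base_url`), the checksum format, and the example host are hypothetical and are not taken from the actual Phase II schema.

```python
from urllib.parse import quote

# Hypothetical dataset entry. The field names and layout are assumptions,
# not the real Phase II index schema.
dataset_entry = {
    "id": "example.dataset.v1",
    "base_url": "https://data.example.org/files/",
    "file_manifest": [
        {"path": "example/dataset/v1/tas_0001.nc", "checksum": "sha256:aaaa"},
        {"path": "example/dataset/v1/tas_0002.nc", "checksum": "sha256:bbbb"},
    ],
}


def file_urls(entry):
    """Return (url, checksum) pairs for every file listed in the manifest."""
    base = entry["base_url"].rstrip("/")
    return [
        # quote() percent-encodes unsafe characters but keeps "/" separators.
        (f"{base}/{quote(item['path'])}", item["checksum"])
        for item in entry["file_manifest"]
    ]


urls = file_urls(dataset_entry)
```

With a layout like this, a client needs only the single dataset entry to locate and verify each file, which is the behavior described above.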

How will the transition API manage this change? Will the transition API mimic file-level entries when they're removed from the Globus Search indices? Or...?

bstrdsmkr commented 2 months ago

@lliming what's the driver for not including file-level entries? I think this has significant downsides that will be hard to overcome. For example, if a dataset has 4 million files, doesn't that yield a 4-million-element array in the single dataset document? That's a pretty big transfer if you just want some metadata about the dataset. Or am I misunderstanding?
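A back-of-the-envelope way to size that concern is sketched below. It assumes each manifest entry is a small JSON object with a path and a SHA-256 hex checksum; the sample path, the field names, and the helper `estimate_manifest_bytes` are all hypothetical, and the real entry size would depend on the actual schema.

```python
import json

# Hypothetical manifest entry: a path plus a 64-character SHA-256 hex digest.
sample_entry = {
    "path": "example/dataset/v1/tas_day_model_run_19000101-19001231.nc",
    "checksum": "0" * 64,
}


def estimate_manifest_bytes(n_files, entry):
    """Estimate the serialized size of a manifest holding n_files entries."""
    # Compact JSON encoding of one entry, plus one byte for the separating comma.
    per_entry = len(json.dumps(entry, separators=(",", ":")).encode("utf-8")) + 1
    return n_files * per_entry


size_bytes = estimate_manifest_bytes(4_000_000, sample_entry)
print(f"approx. {size_bytes / 1e6:.0f} MB for 4 million files")
```

Under the SHA-256 assumption, the checksum text alone accounts for 4,000,000 × 64 bytes = 256 MB before any paths or JSON punctuation are counted.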