lliming opened 2 months ago
@lliming what's the driver for not including file-level entries? I think this has significant downsides that will be hard to overcome. For example, if a dataset has 4 million files, doesn't that yield a 4-million-element array in the single dataset document? That's a pretty big transfer if you just want some metadata about the dataset. Or am I misunderstanding?
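To put a rough number on the 4-million-file case, here is a back-of-envelope sketch. It assumes each per-file record is a JSON object with a `path` and a `checksum`, a 36-character path, and a 64-character SHA-256 hex digest. The field names, path shape, and checksum algorithm are all illustrative assumptions, not the actual index schema.

```python
import json

# Hypothetical per-file record; field names, path layout, and the SHA-256
# hex checksum length are assumptions for estimation only.
def sample_entry(i: int) -> dict:
    return {
        "path": f"data/dataset-0001/v1/file_{i:07d}.nc",  # 36 characters
        "checksum": "0" * 64,  # placeholder for a 64-char SHA-256 hex digest
    }

def estimate_manifest_bytes(n_files: int, sample_size: int = 1000) -> int:
    """Extrapolate the serialized JSON size of an n_files-entry file list."""
    sample = [sample_entry(i) for i in range(sample_size)]
    per_entry = len(json.dumps(sample).encode("utf-8")) / sample_size
    return int(per_entry * n_files)

size = estimate_manifest_bytes(4_000_000)
print(f"~{size / 1e6:.0f} MB")  # prints ~520 MB with these assumed sizes
```

Under these assumptions the file list alone is roughly half a gigabyte of uncompressed JSON, which gives a sense of scale for fetching one dataset document just to read its metadata. Longer real paths would push the number higher.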
In Phase II, the consolidated Globus Search index will NOT include file-level entries. Clients will therefore not need to inspect additional index entries to learn about individual files in a dataset. Instead, each dataset entry will contain a new file manifest field listing the pathname and checksum of every file in the dataset. Clients can use that manifest to request individual files over HTTP(S).
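A minimal client-side sketch of that workflow: read the manifest from a dataset entry, build an HTTPS URL for each file, download it, and verify its checksum. The field names (`file_manifest`, `path`, `checksum`), the use of SHA-256, and the base URL are hypothetical; the proposal does not specify the schema, checksum algorithm, or how a file's download URL is derived.

```python
import hashlib
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Hypothetical dataset entry; field names and SHA-256 are assumptions.
dataset_entry = {
    "id": "example.dataset.v1",
    "file_manifest": [
        {"path": "v1/file_0001.nc", "checksum": hashlib.sha256(b"demo-1").hexdigest()},
        {"path": "v1/file_0002.nc", "checksum": hashlib.sha256(b"demo-2").hexdigest()},
    ],
}

def http_fetch(url: str) -> bytes:
    """Download one file over HTTP(S) with the standard library."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def download_files(entry: dict, base_url: str,
                   fetch: Callable[[str], bytes] = http_fetch) -> Iterator[tuple[str, bytes]]:
    """Yield (path, content) for each manifest file after checking its checksum.

    base_url is a hypothetical data-server prefix and must end with "/".
    """
    for item in entry["file_manifest"]:
        url = urllib.parse.urljoin(base_url, urllib.parse.quote(item["path"]))
        data = fetch(url)
        if hashlib.sha256(data).hexdigest() != item["checksum"]:
            raise ValueError(f"checksum mismatch for {item['path']}")
        yield item["path"], data
```

Passing `fetch` as a parameter keeps the manifest logic separate from the transport, so a client could swap in a session with retries or authentication without touching the verification step.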
How will the transition API manage this change? Will the transition API mimic file-level entries when they're removed from the Globus Search indices? Or...?
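If the transition API did mimic file-level entries, one way it could do so is to expand each dataset's manifest into synthetic per-file records on the fly. The sketch below is purely hypothetical: the record fields (`type`, `dataset_id`, `id`, `path`, `checksum`) and the `|`-joined id format are illustrative, not the shape of the current file-level entries.

```python
from typing import Iterator

def synthesize_file_entries(dataset_entry: dict) -> Iterator[dict]:
    """Expand a dataset's file manifest into file-level-style records.

    All field names here are hypothetical stand-ins for whatever the
    pre-Phase II file-level entries actually contained.
    """
    for item in dataset_entry.get("file_manifest", []):
        yield {
            "type": "File",
            "dataset_id": dataset_entry["id"],
            "id": f"{dataset_entry['id']}|{item['path']}",
            "path": item["path"],
            "checksum": item["checksum"],
        }
```

A generator avoids building a second multi-million-element list, but it does not reduce the transfer cost: the shim would still have to fetch the full dataset document, manifest included, before it could emit any file record.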