coiled / benchmarks

BSD 3-Clause "New" or "Revised" License

[TPC-H] Configure temporary directory in `duckdb` for out-of-core processing #1510

Open hendrikmakait opened 6 months ago

hendrikmakait commented 6 months ago

Closes #1509

ntabris commented 6 months ago

This would be a temp directory on the VM? If so, I'd recommend doing something under /scratch so it uses NVMe if present.

hendrikmakait commented 6 months ago

> This would be a temp directory on the VM? If so, I'd recommend doing something under /scratch so it uses NVMe if present.

Good point, I'll adjust the code to check if we're running on the cloud. I suppose we should create the temporary directory within the workspace of the Dask worker that runs the coiled.function?

jrbourbeau commented 6 months ago

> I'd recommend doing something under /scratch so it uses NVMe if present

TIL, thanks @ntabris. Is there a way I can figure out when NVMe is present? Is it only specific instance types?

ntabris commented 6 months ago

> I suppose we should create the temporary directory within the workspace of the Dask worker that runs the coiled.function?

Yeah, that would work—dask worker workspace is already set to be in /scratch.
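For illustration, the approach discussed above could be sketched roughly as follows. This is a minimal sketch, not the code from this PR: `pick_spill_directory`, `configure_duckdb`, and the `duckdb-tmp` subdirectory name are hypothetical, and the fallback order (worker workspace, then `/scratch`, then the system temp directory) is an assumption. The only DuckDB feature relied on is the `temp_directory` setting.

```python
import os
import tempfile
from pathlib import Path


def pick_spill_directory(worker_dir=None, scratch="/scratch"):
    """Choose and create a directory for DuckDB's out-of-core spill files.

    Hypothetical helper. Preference order (an assumption, not from the PR):
    1. the Dask worker's workspace, if the caller passes one,
    2. /scratch, if it exists and is writable,
    3. the system temp directory.
    """
    if worker_dir is not None:
        base = Path(worker_dir)
    elif os.path.isdir(scratch) and os.access(scratch, os.W_OK):
        base = Path(scratch)
    else:
        base = Path(tempfile.gettempdir())
    spill = base / "duckdb-tmp"  # hypothetical subdirectory name
    spill.mkdir(parents=True, exist_ok=True)
    return spill


def configure_duckdb(con, spill_dir):
    """Point an open DuckDB connection at the spill directory."""
    # `temp_directory` is the DuckDB setting for on-disk spill location.
    con.execute(f"SET temp_directory = '{spill_dir}'")
```

Inside a function running on a Dask worker, the workspace could presumably be obtained with `distributed.get_worker().local_directory` and passed as `worker_dir`. When running locally, omitting `worker_dir` falls back to `/scratch` or the system temp directory.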

> Is there a way I can figure out when NVMe is present? Is it only specific instance types?

It's specific instance types on AWS: in particular, instance types with `d` in the family name, such as m6id or g4dn (plus a few others without the `d`, such as g5, which also has NVMe).
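The naming rule described above can be turned into a rough check on the instance type string. This is only a heuristic sketch: `family_has_local_nvme` and `EXTRA_NVME_FAMILIES` are hypothetical names, the exception set contains only the `g5` family mentioned in this thread, and other families with local NVMe but no `d` in the name would be missed.

```python
import re

# Families with local NVMe that do not carry a "d" in their name.
# Only g5 is listed because it is the one named in the thread; this set
# is an assumption and is not exhaustive.
EXTRA_NVME_FAMILIES = {"g5"}


def family_has_local_nvme(instance_type):
    """Guess from an AWS instance type name whether local NVMe is present.

    Heuristic: split off the family (the part before the dot), then look
    for a "d" among the attribute letters that follow the generation number.
    """
    family = instance_type.split(".", 1)[0].lower()
    if family in EXTRA_NVME_FAMILIES:
        return True
    # e.g. "m6id" -> series "m", generation "6", attributes "id"
    match = re.fullmatch(r"[a-z]+\d+([a-z-]*)", family)
    return bool(match) and "d" in match.group(1)
```

For example, `family_has_local_nvme("m6id.large")` and `family_has_local_nvme("g4dn.xlarge")` return `True`, while `family_has_local_nvme("m6i.large")` returns `False`. A name-based guess like this can drift as new instance families are released, so checking at runtime whether `/scratch` exists may be a more robust signal.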