Closed: cfangplus closed this 2 years ago
Here are more details and my guess, which still needs checking. The additional stream time metric mentioned above comes from the GpuRowToColumnar node. Although CACHE TABLE persists the table data in DRAM, its data format is still row-oriented. Since a columnar format is better suited to GPU processing, RAPIDS needs an operation to convert rows to columns, and that is what GpuRowToColumnar does. Parquet, on the other hand, is a columnar storage format, so the former case (scanning Parquet directly) needs no row-to-column conversion, that is, no GpuRowToColumnar node. So the former case is faster. Right?
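To make the row-to-column conversion described above concrete, here is a toy sketch in plain Python. It is only an illustration of the data-layout change, not the RAPIDS implementation; the function name `rows_to_columns` and the sample schema are made up for this example.

```python
def rows_to_columns(rows, column_names):
    """Transpose a list of row tuples into a dict of per-column lists.

    Hypothetical helper: it only illustrates the layout change from
    row-oriented to columnar data, nothing GPU-specific.
    """
    columns = {name: [] for name in column_names}
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError("row width does not match the schema")
        # Every value of every row is touched once, so the cost grows
        # with rows * columns.
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns


# Row-oriented input: one tuple per record (illustrative data).
rows = [(1, "a", 0.5), (2, "b", 1.5), (3, "c", 2.5)]
cols = rows_to_columns(rows, ["id", "tag", "score"])
print(cols["id"])
```

Because every value has to be visited once, a conversion like this adds work proportional to the table size, which is consistent with an extra per-task cost showing up when the input is row-oriented.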
Hi @cfangplus, have you tried PCBS (ParquetCachedBatchSerializer)?
Could you share the physical plans for the GPU runs before and after the "cache"?
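For reference, PCBS is normally switched on through Spark's cache serializer setting. The sketch below renders that setting as `--conf` flags for `spark-shell` or `spark-submit`. It assumes the RAPIDS plugin jar is already on the classpath, and the helper name `conf_flags` is hypothetical.

```python
# Setting that selects the Parquet cached batch serializer (PCBS).
# This is a static Spark SQL setting, so it has to be supplied when the
# application starts; it cannot be changed with SET in a running session.
PCBS_CONF = {
    "spark.sql.cache.serializer": "com.nvidia.spark.ParquetCachedBatchSerializer",
}


def conf_flags(conf):
    """Hypothetical helper: flatten a dict into ['--conf', 'key=value', ...]."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


print(" ".join(conf_flags(PCBS_CONF)))
```

After restarting with this flag, running `CACHE TABLE` again and comparing the `EXPLAIN` output of the query before and after shows whether the plan changed.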
Yes, after I enabled PCBS, the physical plan does differ from the one without PCBS: the GpuRowToColumnar node after Scan In-memory table disappeared. Thanks @viadea
Hi,
I ran a SQL query that contains four stages. The first stage scans the Parquet files and prepares shuffle write data for the next stage, and its mean task time is about 4s. To reduce the time spent scanning from disk to GPU, I used CACHE TABLE to cache the data before running the SQL. The first stage is now supposed to transfer data from DRAM to the GPU, which I thought would be faster. However, it is not. Comparing the run details in the SQL tab, I found a new stream time metric that amounts to 3.6s and brings the mean task time to 6s. Why? I find that hard to believe. Does the former run not include stream time?
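A quick back-of-the-envelope check on the numbers reported here, as a small Python sketch. It assumes the stream time is purely additive to the rest of the task, which may not hold if the metric overlaps with other work; the variable names are illustrative.

```python
# Timings from the post, kept in tenths of a second so the arithmetic is exact.
mean_task_parquet = 40    # 4 s: first stage scanning Parquet directly
mean_task_cached = 60     # 6 s: first stage reading the cached table
stream_time_cached = 36   # 3.6 s: the new "stream time" metric

# Assumption: stream time adds to the remaining work instead of overlapping it.
rest_of_task_cached = mean_task_cached - stream_time_cached
slowdown = mean_task_cached - mean_task_parquet

print(rest_of_task_cached / 10)  # seconds left once stream time is removed
print(slowdown / 10)             # seconds lost compared with the Parquet run
```

Under that assumption, the cached run spends 2.4s outside the stream time step, which is less than the 4s Parquet run, so the 2s regression would come entirely from the stream time step.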