Open vchemla opened 3 months ago
Hi,
In our case, we would like to read a big CSV file compressed in .gz format.
We would like to use the `read_csv` function like this:
`ctx.read_csv('myfile.csv.gz', file_extension=".csv.gz", delimiter=';', has_header=True, schema_infer_max_records=0, file_compression_type='gzip')`
However, the decompression is not parallelized: `pigz` decompresses the same file in 54 seconds, whereas `read_csv` takes 800 seconds.
It would be great if you could take a look...
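Until gzip decoding is parallelized, one workaround is to decompress the archive up front (ideally with `pigz`, which is multi-threaded) and point `read_csv` at the plain file, so the reader can split it into byte ranges and parse in parallel. Below is a minimal standard-library sketch; the helper name `decompress_to_tmp` is ours, and for real speed you would shell out to `pigz -d` instead of the single-threaded `gzip` module:

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_tmp(gz_path: str) -> str:
    """Decompress a .gz file to a temporary plain file and return its path.

    With the uncompressed CSV on disk, the reader can scan it in
    parallel, which a gzip stream prevents (it must be read serially).
    """
    fd, out_path = tempfile.mkstemp(suffix=".csv")
    with gzip.open(gz_path, "rb") as src, os.fdopen(fd, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# The read would then look like (hypothetical session context `ctx`):
#   plain = decompress_to_tmp('myfile.csv.gz')
#   ctx.read_csv(plain, delimiter=';', has_header=True)
```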