-
ParquetOutputFormat should support custom OutputCommitter.
There is a need to bypass Hadoop's current behavior of writing output data under the **_temporary** folder. Especially with AWS S3, there can…
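For reference, Hadoop's mapreduce API already makes the committer a pluggable class; a minimal sketch of the kind of "direct" committer this issue asks to support (the class name and no-op behavior are illustrative, not part of parquet-mr) might look like:

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Hypothetical "direct" committer: tasks write straight to the final
// output path, so nothing is staged under _temporary and no rename
// (an expensive copy on S3) is needed at commit time.
public class DirectOutputCommitter extends OutputCommitter {

    @Override
    public void setupJob(JobContext context) throws IOException {
        // No _temporary staging directory to create.
    }

    @Override
    public void setupTask(TaskAttemptContext context) throws IOException {
        // Nothing to set up per task attempt.
    }

    @Override
    public boolean needsTaskCommit(TaskAttemptContext context) throws IOException {
        return false; // output is already in its final location
    }

    @Override
    public void commitTask(TaskAttemptContext context) throws IOException {
        // Nothing to move into place.
    }

    @Override
    public void abortTask(TaskAttemptContext context) throws IOException {
        // Trade-off: without staging, partial output from a failed
        // attempt cannot be cleaned up here.
    }
}
```

The abortTask comment shows the cost of skipping _temporary: you give up the atomic publish that staging-plus-rename provides, which is exactly the trade-off a custom committer would let each deployment make for itself.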
-
When using Tez as the Hive execution engine, an INSERT statement returns successfully but no rows are inserted. The same statement works under the MapReduce engine.
Tez somehow does not seem to call the OutputCommitter at a…
-
When writing a Dask dataframe to an output format, all files are created eagerly and written as soon as they can be. This can cause incomplete datasets to be written to the underlying filesystem in case s…
-
We're trying to write some 14B rows (about 3.6 TB) to Parquet files. When our ETL job finishes, it throws the exception below, and the job status is "died in job commit".
2015-05-14 09:24:28,158 F…
-
### Apache Iceberg version
1.3.1
### Query engine
Hive
### Please describe the bug 🐞
* version: hive-3.1.3 iceberg-1.3.1 kerberos-1.15.1 hadoop-3.3.6
* user: hadoop
* hive.server2.transport.mode: b…
-
Hi Ali,
I am using Sqoop to export data from MaxCompute to Postgres.
```
./odps-sqoop/bin/sqoop export --connect jdbc:postgresql://localhost:5432/replication_db --table dim_wmp_cabinet \
--use…
```
-
Right now, Dask uses the various fsspec implementations for writing. Since each task writes one file, the creation of any chunks on a remote server and the finalising of the file happen serially in a s…
-
We have a Cloudera CDH4 cluster and ran into a compatibility issue with Faunus: CDH4 is based on Hadoop 2.x instead of Hadoop 1.x.
The Mapper.Context constructor signature changed, which causes a NoSuch…
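For context, the usual workaround for this class of breakage (a NoSuchMethodError at runtime when the binary was compiled against the other Hadoop line) is to stop linking against one constructor signature and resolve it reflectively instead; a rough sketch of that pattern (not Faunus's actual fix, all names hypothetical) follows:

```java
import java.lang.reflect.Constructor;

// Illustrative Hadoop 1.x/2.x compatibility shim: look up a constructor
// by argument count at runtime, so the same jar can instantiate classes
// such as Mapper.Context whichever signature the cluster provides.
public final class ContextFactory {

    private ContextFactory() {
    }

    public static Object newInstance(String className, Object... args) throws Exception {
        for (Constructor<?> c : Class.forName(className).getConstructors()) {
            if (c.getParameterTypes().length == args.length) {
                // First public constructor matching the argument count;
                // a real shim would also check parameter types.
                return c.newInstance(args);
            }
        }
        throw new NoSuchMethodException(
                className + " has no " + args.length + "-argument constructor");
    }
}
```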
-
See Esri/gis-tools-for-hadoop#83
-
Hi,
I collected all prerequisites (fsimage, audit log) and prepared a local environment (accompanying HDFS, separate YARN manager) according to the Dynamometer README, and tried to start the workload scripts…