The VERBATIM_TO_IDENTIFIER stage runs everything on Spark (YARN), even for tiny datasets such as this one.
We should either fix the config so that the cutoff for distributed processing is something reasonable (e.g. 1M records or >1 GB uncompressed size), or rework this stage so that it doesn't require distributed computing.
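
To make the first option concrete, here is a minimal sketch of a size-based switch. It is not the pipeline's actual code: `RunnerThresholds`, `choose_runner`, and the `"STANDALONE"` / `"DISTRIBUTED"` labels are hypothetical names, and the defaults are just the example figures from above, not agreed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerThresholds:
    """Hypothetical cutoffs; defaults mirror the example figures in this issue."""
    max_standalone_records: int = 1_000_000   # 1M records
    max_standalone_bytes: int = 1024 ** 3     # 1 GB uncompressed


def choose_runner(record_count: int,
                  uncompressed_bytes: int,
                  thresholds: RunnerThresholds = RunnerThresholds()) -> str:
    """Return "DISTRIBUTED" only when the dataset reaches either cutoff.

    Assumption: 1M records or more, or strictly more than 1 GB uncompressed,
    is what justifies a Spark/YARN job; anything smaller runs locally.
    """
    too_many_records = record_count >= thresholds.max_standalone_records
    too_large = uncompressed_bytes > thresholds.max_standalone_bytes
    if too_many_records or too_large:
        return "DISTRIBUTED"
    return "STANDALONE"


if __name__ == "__main__":
    # A tiny dataset (a few hundred records, a few KB) should stay local.
    print(choose_runner(record_count=500, uncompressed_bytes=10_000))
```

Keeping both cutoffs in one config object would let them be tuned without touching the stage itself. Whether the comparison on the record count should be inclusive is a detail to settle when the real values are chosen.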