Azure / spark-cdm-connector

TextParsingException for large columns. Unable to set univocity parser settings. #77

Closed. drinkingbird closed this issue 2 years ago.

drinkingbird commented 3 years ago

Attempting to load from a CDM entity containing embedded emails with embedded images. Setting the option .option("maxCharsPerCol", -1) or .option("maxCharsPerColumn", -1), or a large numeric value such as .option("maxCharsPerCol", 10000000), still fails with:

com.univocity.parsers.common.TextParsingException: Length of parsed input (500001) exceeds the maximum number of characters defined in your parser settings (500000)

Please expose an option to update the parser settings.
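For reference, a minimal sketch (Scala) of the kind of read being described; the storage account, manifest path, and entity name below are hypothetical placeholders, not values from this issue:

```scala
// Sketch of the failing read (Scala). Storage account, manifest path and
// entity name are hypothetical placeholders.
val df = spark.read
  .format("com.microsoft.cdm")
  .option("storage", "mystorage.dfs.core.windows.net")
  .option("manifestPath", "container/root/default.manifest.cdm.json")
  .option("entity", "EmailInteraction")
  // The setting the reporter tried to pass through; at the time of this issue
  // the connector did not forward it to the underlying univocity parser.
  .option("maxCharsPerColumn", -1)
  .load()
```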

bissont commented 3 years ago

Thanks for finding the issue. This is now fixed and will be included in the next release.

sergeik66 commented 2 years ago

@bissont Do you have a date for the next release? Thank you.

srichetar commented 2 years ago

This issue is now fixed. Please use the latest release.

drinkingbird commented 2 years ago

Hi @srichetar, was there meant to be a release yesterday? The current release is still the March release. Are you suggesting I rebuild the library from the master branch?

bissont commented 2 years ago

We have pushed the Spark 3 source code to this repository (in the master branch), so you can now build the uber jar yourself. The problem reported in this issue is resolved there. However, if you are using Databricks, only app-registration authentication currently works (we are working with Databricks on the credential-passthrough issue). For Synapse, the jars are already included in the VHD by default.

Assuming you are using Databricks, you can test it out with the jar under artifacts: https://github.com/Azure/spark-cdm-connector/blob/master/artifacts/spark-cdm-connector-spark3-assembly-databricks-cred-passthrough-not-working-1.19.2.jar

We haven't released the jar officially because of the issue with credential passthrough.
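For anyone testing that jar on Databricks, a hedged sketch of a read using app-registration (service principal) authentication; the appId/appKey/tenantId option names follow the connector's documented read options, and every value shown is a placeholder:

```scala
// Sketch: reading via app-registration (service principal) auth on Databricks.
// All IDs, secrets and paths are placeholders.
val df = spark.read
  .format("com.microsoft.cdm")
  .option("storage", "mystorage.dfs.core.windows.net")
  .option("manifestPath", "container/root/default.manifest.cdm.json")
  .option("entity", "EmailInteraction")
  .option("appId", "<application-client-id>")
  .option("appKey", dbutils.secrets.get("my-scope", "cdm-app-key")) // Databricks secret scope
  .option("tenantId", "<directory-tenant-id>")
  .load()
```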

drinkingbird commented 2 years ago

Excellent. Thanks for the update @bissont