apache / hop

Hop Orchestration Platform
https://hop.apache.org/
Apache License 2.0

[Bug]: Execution Information transform fails on LogLevel #2312

Open bamaer opened 1 year ago

bamaer commented 1 year ago

Apache Hop version?

SNAPSHOT-20230211

Java version?

openjdk version "11.0.17" 2022-10-18

Operating system

Linux

What happened?

2023/02/11 14:47:40 - exec-info-read - ERROR: Error handling writing final pipeline state to location (non-fatal)
2023/02/11 14:47:40 - exec-info-read - ERROR: org.apache.hop.core.exception.HopException: 
2023/02/11 14:47:40 - exec-info-read - Error storing execution data
2023/02/11 14:47:40 - exec-info-read - logLevel String : There was a data type error: the data type of org.apache.hop.core.logging.LogLevel object [BASIC] does not correspond to value meta [String] (through reference chain: org.apache.hop.execution.ExecutionData["rowsBinaryGzipBase64Encoded"])
2023/02/11 14:47:40 - exec-info-read - 
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.execution.local.FileExecutionInfoLocation.registerData(FileExecutionInfoLocation.java:256)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.engines.local.LocalPipelineEngine.stopTransformExecutionInfoTimer(LocalPipelineEngine.java:486)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.engines.local.LocalPipelineEngine.pipelineCompleted(LocalPipelineEngine.java:465)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.Pipeline.firePipelineExecutionFinishedListeners(Pipeline.java:1343)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.Pipeline.lambda$startThreads$0(Pipeline.java:1142)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.transform.BaseTransform.fireTransformFinishedListeners(BaseTransform.java:2779)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.transform.BaseTransform.markStop(BaseTransform.java:2768)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:142)
2023/02/11 14:47:40 - exec-info-read -  at java.base/java.lang.Thread.run(Thread.java:829)
2023/02/11 14:47:40 - exec-info-read - Caused by: com.fasterxml.jackson.databind.JsonMappingException: logLevel String : There was a data type error: the data type of org.apache.hop.core.logging.LogLevel object [BASIC] does not correspond to value meta [String] (through reference chain: org.apache.hop.execution.ExecutionData["rowsBinaryGzipBase64Encoded"])
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:316)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:782)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:178)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1572)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ObjectWriter._writeValueAndClose(ObjectWriter.java:1273)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:1098)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.execution.local.FileExecutionInfoLocation.registerData(FileExecutionInfoLocation.java:253)
2023/02/11 14:47:40 - exec-info-read -  ... 8 more
2023/02/11 14:47:40 - exec-info-read - Caused by: java.lang.RuntimeException: logLevel String : There was a data type error: the data type of org.apache.hop.core.logging.LogLevel object [BASIC] does not correspond to value meta [String]
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.core.row.value.ValueMetaBase.writeData(ValueMetaBase.java:3105)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.core.row.RowMeta.writeData(RowMeta.java:717)
2023/02/11 14:47:40 - exec-info-read -  at org.apache.hop.execution.ExecutionData.getRowsBinaryGzipBase64Encoded(ExecutionData.java:190)
2023/02/11 14:47:40 - exec-info-read -  at jdk.internal.reflect.GeneratedMethodAccessor283.invoke(Unknown Source)
2023/02/11 14:47:40 - exec-info-read -  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2023/02/11 14:47:40 - exec-info-read -  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:689)
2023/02/11 14:47:40 - exec-info-read -  at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:774)
2023/02/11 14:47:40 - exec-info-read -  ... 15 more
2023/02/11 14:47:40 - exec-info-read - Execution finished on a local pipeline engine with run configuration 'local'

Issue Priority

Priority: 1

Issue Component

Component: Transforms

bamaer commented 1 year ago

Can't reproduce with any execution information data generated after 2023-01-20, so I assume this was caused by an issue that has since been fixed. Closing for now.

bamaer commented 1 year ago

Closed too soon. This doesn't cause the pipeline to fail (anymore), but the errors still appear in the logs.

vdwals commented 9 months ago

Hello everyone,

I'm experiencing the same issue. In a complex workflow involving multiple pipeline executions, each pipeline concludes with a non-fatal error message appearing in the logs.

Regrettably, I encountered the same issue with the "Sort by" transformation in one instance. It seems that rows were being stored in temporary files, and upon retrieval, this error message would appear in the logs, causing the transformation to fail. As far as I understand, the issue stems from a String field being parsed as a long by Jackson, resulting in a mismatch with the expected meta type string.

While I can tolerate this error at the end of a pipeline execution, a failure during sorting is highly frustrating. I will work on creating a reproducible example with less data and provide it as soon as possible.

Thank you.

2024/02/06 11:51:25 - sku_patch - Caused by: java.lang.RuntimeException: value String : There was a data type error: the data type of java.lang.Long object [5157] does not correspond to value meta [String]
2024/02/06 11:51:25 - sku_patch -  at org.apache.hop.core.row.value.ValueMetaBase.writeData(ValueMetaBase.java:3105)
2024/02/06 11:51:25 - sku_patch -  at org.apache.hop.core.row.RowMeta.writeData(RowMeta.java:717)
2024/02/06 11:51:25 - sku_patch -  at org.apache.hop.execution.ExecutionData.getRowsBinaryGzipBase64Encoded(ExecutionData.java:190)
2024/02/06 11:51:25 - sku_patch -  at jdk.internal.reflect.GeneratedMethodAccessor232.invoke(Unknown Source)
2024/02/06 11:51:25 - sku_patch -  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2024/02/06 11:51:25 - sku_patch -  at java.base/java.lang.reflect.Method.invoke(Method.java:568)
2024/02/06 11:51:25 - sku_patch -  at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:688)
2024/02/06 11:51:25 - sku_patch -  at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:772)
2024/02/06 11:51:25 - sku_patch -  ... 15 more
2024/02/06 11:51:25 - sku_patch - Execution finished on a local pipeline engine with run configuration 'local'

vdwals commented 9 months ago

Hello again,

I have now stumbled across a similar error message when previewing the result of the attached dataflow. The problem is that in the bigger pipeline, where I need this database join, the preview does not show me why the number of rows is not increasing as expected from the database lookup, possibly because of this error message.

mini-example-issue-2123.zip

I'm working on a Mac M1 with Java openjdk version "17.0.3" 2022-04-19

flotho commented 2 months ago

Hi,

I'm experiencing almost the same issue with the following environment:

HOP_AUDIT_FOLDER=./audit
HOP_AUTO_CREATE_CONFIG=Y
HOP_CONFIG_FOLDER=
HOP_METADATA_FOLDER=config/projects/default/metadata,/home/florent.thomas/DEV/_ProjetsHOP//metadata
HOP_PLATFORM_OS=Linux
HOP_PLATFORM_RUNTIME=GUI
HOP_PLUGIN_BASE_FOLDERS=
HOP_REDIRECT_STDERR=
HOP_REDIRECT_STDOUT=
HOP_SHARED_JDBC_FOLDERS=
HOP_SIMPLE_STACK_TRACES=
file.encoding=UTF-8
java.class.path=lib/core/jsonrpc4j-1.6.jar:lib/core/javax.servlet-api-3.1.0.jar:lib/core/jetty-servlets-9.4.41.v20210516.jar:lib/core/jetty-client-9.4.49.v20220914.jar:lib/core/osgi.core-6.0.0.jar:lib/core/stax2-api-4.2.1.jar:lib/core/odoo-java-api-3.3.5.jar:lib/core/jackson-jaxrs-base-2.15.0.jar:lib/core/hk2-locator-2.6.1.jar:lib/core/commons-codec-1.15.jar:lib/core/commons-vfs2-2.9.0.jar:lib/core/commons-collections4-4.4.jar:lib/core/avro-1.11.3.jar:lib/core/jsch-0.1.55.jar:lib/core/jackson-core-2.15.0.jar:lib/core/jersey-jetty-connector-2.38.jar:lib/core/jersey-client-2.38.jar:lib/core/commons-math3-3.6.1.jar:lib/core/commons-validator-1.7.jar:lib/core/hop-plugins-static-schema-2.9.0.jar:lib/core/ognl-3.3.4.jar:lib/core/okio-3.9.0.jar:lib/core/webservices-api-2.3.1.jar:lib/core/xmlgraphics-commons-2.7.jar:lib/core/ws-commons-util-1.0.2.jar:lib/core/batik-util-1.17.jar:lib/core/jetty-servlet-9.4.41.v20210516.jar:lib/core/xercesImpl-2.12.2.jar:lib/core/slf4j-nop-2.0.4.jar:lib/core/asm-9.3.jar:lib/core/jetty-xml-9.4.41.v20210516.jar:lib/core/jakarta.activation-2.0.1.jar:lib/core/httpclient-4.5.13.jar:lib/core/batik-svggen-1.17.jar:lib/core/spark-avro_2.11-4.0.0.jar:lib/core/batik-constants-1.17.jar:lib/core/hk2-utils-2.6.1.jar:lib/core/commons-beanutils-1.9.4.jar:lib/core/jetty-io-9.4.41.v20210516.jar:lib/core/xmlrpc-client-3.1.3.jar:lib/core/kotlin-stdlib-1.4.10.jar:lib/core/jackson-dataformat-avro-2.15.0.jar:lib/core/rhino-1.7.14.jar:lib/core/jetty-http-9.4.41.v20210516.jar:lib/core/hop-ui-2.9.0.jar:lib/core/commons-pool-1.5.7.jar:lib/core/batik-dom-1.17.jar:lib/core/batik-svg-dom-1.17.jar:lib/core/encoder-1.2.jar:lib/core/commons-cli-1.2.jar:lib/core/guava-32.1.2-jre.jar:lib/core/hk2-api-2.6.1.jar:lib/core/sshlib-2.2.23.jar:lib/core/javassist-3.28.0-GA.jar:lib/core/hop-engine-2.9.0.jar:lib/core/batik-css-1.17.jar:lib/core/xmlrpc-common-3.1.3.jar:lib/core/jackson-annotations-2.15.0.jar:lib/core/commons-compress-1.26.0.jar:lib/core/slf4j-api-2.0.4.jar:lib/core/comm
ons-lang3-3.12.0.jar:lib/core/jakarta.mail-2.0.1.jar:lib/core/batik-i18n-1.17.jar:lib/core/commons-configuration-1.10.jar:lib/core/batik-transcoder-1.17.jar:lib/core/jersey-container-servlet-core-2.38.jar:lib/core/failureaccess-1.0.1.jar:lib/core/commons-lang-2.6.jar:lib/core/javax.annotation-api-1.3.2.jar:lib/core/jackson-databind-2.15.0.jar:lib/core/blueprints-core-2.6.0.jar:lib/core/batik-codec-1.17.jar:lib/core/commons-io-2.14.0.jar:lib/core/flexjson-2.1.jar:lib/core/hop-core-2.9.0.jar:lib/core/jetty-webapp-9.4.41.v20210516.jar:lib/core/commons-httpclient-3.1.jar:lib/core/xml-apis-ext-1.3.04.jar:lib/core/batik-xml-1.17.jar:lib/core/hadoop-hdfs-client-3.3.6.jar:lib/core/jetty-server-9.4.41.v20210516.jar:lib/core/jackson-jaxrs-json-provider-2.15.0.jar:lib/core/jakarta.ws.rs-api-2.1.6.jar:lib/core/commons-dbcp-1.4.jar:lib/core/jersey-hk2-2.38.jar:lib/core/commons-logging-1.1.3.jar:lib/core/jandex-3.1.6.jar:lib/core/org.eclipse.core.commands-3.9.600.jar:lib/core/woodstox-core-6.4.0.jar:lib/core/jersey-common-2.38.jar:lib/core/jersey-server-2.38.jar:lib/core/batik-ext-1.17.jar:lib/core/batik-parser-1.17.jar:lib/core/httpcore-4.4.15.jar:lib/core/jersey-container-servlet-2.38.jar:lib/core/snakeyaml-2.0.jar:lib/core/hop-ui-rcp-2.9.0.jar:lib/core/snappy-java-1.1.10.5.jar:lib/core/batik-bridge-1.17.jar:lib/core/batik-gvt-1.17.jar:lib/core/jetty-util-9.4.41.v20210516.jar:lib/core/picocli-4.6.3.jar:lib/core/batik-anim-1.17.jar:lib/core/okhttp-4.12.0.jar:lib/core/kotlin-stdlib-common-1.4.10.jar:lib/core/jetty-security-9.4.41.v20210516.jar:lib/core/json-simple-1.1.1.jar:lib/core/jakarta.inject-2.6.1.jar:lib/core/org.eclipse.equinox.common-3.10.600.jar:lib/core/gson-2.10.jar:lib/core/jetty-jaas-9.4.41.v20210516.jar:lib/core/aopalliance-repackaged-2.6.1.jar:lib/core/tyrus-standalone-client-1.13.1.jar:lib/core/batik-script-1.17.jar:lib/core/commons-net-3.9.0.jar:lib/core/batik-awt-util-1.17.jar:lib/beam/beam-vendor-grpc-1_60_1-0.2.jar:lib/beam/opencensus-contrib-http-util-0.31.1
.jar:lib/beam/jsr305-3.0.2.jar:lib/beam/commons-codec-1.15.jar:lib/beam/grpc-core-1.60.1.jar:lib/beam/avro-1.11.3.jar:lib/beam/protobuf-java-3.23.2.jar:lib/beam/netty-resolver-dns-4.1.89.Final.jar:lib/beam/netty-transport-4.1.89.Final.jar:lib/beam/beam-sdks-java-core-2.56.0.jar:lib/beam/minlog-1.3.1.jar:lib/beam/grpc-netty-1.60.1.jar:lib/beam/error_prone_annotations-2.10.0.jar:lib/beam/auto-value-annotations-1.8.2.jar:lib/beam/netty-all-4.1.89.Final.jar:lib/beam/kryo-shaded-4.0.2.jar:lib/beam/httpclient-4.5.13.jar:lib/beam/grpc-protobuf-1.60.1.jar:lib/beam/akka-slf4j_2.11-2.5.32.jar:lib/beam/metrics-core-4.2.12.jar:lib/beam/google-http-client-gson-1.42.3.jar:lib/beam/log4j-core-2.19.0.jar:lib/beam/akka-protobuf_2.11-2.5.32.jar:lib/beam/netty-handler-ssl-ocsp-4.1.89.Final.jar:lib/beam/netty-codec-stomp-4.1.89.Final.jar:lib/beam/zookeeper-3.8.4.jar:lib/beam/json4s-ast_2.12-3.7.0-M11.jar:lib/beam/grpc-netty-shaded-1.60.1.jar:lib/beam/netty-codec-http-4.1.89.Final.jar:lib/beam/guava-32.1.2-jre.jar:lib/beam/json4s-core_2.12-3.7.0-M11.jar:lib/beam/grpc-context-1.60.1.jar:lib/beam/netty-codec-4.1.89.Final.jar:lib/beam/netty-transport-classes-epoll-4.1.89.Final.jar:lib/beam/netty-resolver-4.1.89.Final.jar:lib/beam/netty-codec-mqtt-4.1.89.Final.jar:lib/beam/proto-google-common-protos-2.14.0.jar:lib/beam/netty-transport-classes-kqueue-4.1.89.Final.jar:lib/beam/google-api-client-2.2.0.jar:lib/beam/jackson-annotations-2.15.0.jar:lib/beam/hamcrest-2.1.jar:lib/beam/netty-resolver-dns-classes-macos-4.1.89.Final.jar:lib/beam/beam-vendor-bytebuddy-1_11_0-0.1.jar:lib/beam/commons-compress-1.26.0.jar:lib/beam/netty-transport-udt-4.1.89.Final.jar:lib/beam/grpc-protobuf-lite-1.60.1.jar:lib/beam/commons-lang3-3.9.jar:lib/beam/netty-codec-redis-4.1.89.Final.jar:lib/beam/google-auth-library-credentials-1.12.1.jar:lib/beam/slf4j-api-2.0.4.jar:lib/beam/opencensus-api-0.31.1.jar:lib/beam/hop-engine-beam-2.9.0.jar:lib/beam/scala-library-2.12.17.jar:lib/beam/grpc-auth-1.60.1.jar:lib/beam/joda-t
ime-2.12.1.jar:lib/beam/grpc-stub-1.60.1.jar:lib/beam/classgraph-4.8.162.jar:lib/beam/zstd-jni-1.5.2-2.jar:lib/beam/grpc-alts-1.60.1.jar:lib/beam/netty-buffer-4.1.89.Final.jar:lib/beam/protobuf-java-util-3.23.2.jar:lib/beam/beam-vendor-guava-32_1_2-jre-0.1.jar:lib/beam/netty-codec-socks-4.1.89.Final.jar:lib/beam/netty-codec-haproxy-4.1.89.Final.jar:lib/beam/netty-handler-proxy-4.1.89.Final.jar:lib/beam/log4j-api-2.19.0.jar:lib/beam/json4s-jackson_2.12-3.7.0-M11.jar:lib/beam/grpc-grpclb-1.60.1.jar:lib/beam/commons-io-2.15.1.jar:lib/beam/metrics-logback-4.2.12.jar:lib/beam/byte-buddy-1.12.18.jar:lib/beam/commons-logging-1.1.3.jar:lib/beam/netty-codec-smtp-4.1.89.Final.jar:lib/beam/beam-runners-direct-java-2.56.0.jar:lib/beam/woodstox-core-6.4.0.jar:lib/beam/kryo-5.3.0.jar:lib/beam/netty-codec-http2-4.1.89.Final.jar:lib/beam/httpcore-4.4.15.jar:lib/beam/beam-model-pipeline-2.56.0.jar:lib/beam/json4s-scalap_2.12-3.7.0-M11.jar:lib/beam/grpc-api-1.60.1.jar:lib/beam/netty-common-4.1.89.Final.jar:lib/beam/netty-transport-rxtx-4.1.89.Final.jar:lib/beam/curator-client-5.4.0.jar:lib/beam/netty-transport-native-unix-common-4.1.89.Final.jar:lib/beam/antlr4-runtime-4.7.jar:lib/beam/netty-transport-sctp-4.1.89.Final.jar:lib/beam/netty-codec-dns-4.1.89.Final.jar:lib/beam/perfmark-api-0.26.0.jar:lib/beam/curator-recipes-5.4.0.jar:lib/beam/metrics-jvm-4.2.12.jar:lib/beam/compress-lzf-1.1.jar:lib/beam/google-auth-library-oauth2-http-1.12.1.jar:lib/beam/beam-model-job-management-2.56.0.jar:lib/beam/conscrypt-openjdk-uber-2.5.2.jar:lib/beam/json-simple-1.1.1.jar:lib/beam/gson-2.10.jar:lib/beam/curator-framework-5.4.0.jar:lib/beam/google-http-client-1.42.3.jar:lib/beam/netty-handler-4.1.89.Final.jar:lib/beam/netty-codec-memcache-4.1.89.Final.jar:lib/beam/paranamer-2.8.jar:lib/beam/netty-codec-xml-4.1.89.Final.jar:lib/swt/linux/x86_64/swt.jar
java.specification.version=11
java.version=11.0.11
java.vm.vendor=Red Hat, Inc.
os.arch=amd64
os.name=Linux
os.version=5.11.22-100.fc32.x86_64

I have a complex workflow with a Select values transform just before a sort, and it failed with the following error:

code String<binary-string> : There was a data type error: the data type of java.lang.String object [003136] does not correspond to value meta [String<binary-string>] (through reference chain: org.apache.hop.execution.ExecutionData["rowsBinaryGzipBase64Encoded"])

Whichever cast I use in the CSV extraction, it fails.

hansva commented 2 months ago

I took a look at this a while back and I know where the issue is coming from. We currently store references to the row data as the sample data. When a data type is changed (using Select values), this causes a mismatch between the object type and the metadata when we try to store the sample rows.

The implementation will have to be changed a bit to do a deep copy of the sample data. This will use a bit more memory, but the casting errors should go away.

FYI: these errors are related to the execution information perspective and will not cause a pipeline to fail.
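
To illustrate the reference-versus-copy explanation above, here is a minimal, self-contained sketch in plain Java. It is not Hop's actual code: the class `SampleRowDemo` and its methods are hypothetical, and it assumes a downstream step that converts a field in place on the same `Object[]` row that was sampled. Under that assumption, a sample stored by reference ends up holding a `Long` where the sample's metadata still says String, while a copied sample keeps the original `String`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Apache Hop.
class SampleRowDemo {

    // Mimics the kind of check that fails in the logs:
    // a field described as String must hold a String (or null).
    public static boolean matchesStringMeta(Object value) {
        return value == null || value instanceof String;
    }

    // Stand-in for a downstream step that changes a field's type in place.
    public static void convertFirstFieldToLong(Object[] row) {
        row[0] = Long.valueOf((String) row[0]);
    }

    // Samples a row (by reference or as a copy), runs the downstream step,
    // and returns the first field of the stored sample.
    public static Object sampleThenConvert(boolean copyRow) {
        Object[] row = {"5157"};
        List<Object[]> samples = new ArrayList<>();
        // Copying the array is enough here because String and Long are immutable;
        // mutable field values (for example byte[] or Date) would need a real deep copy.
        samples.add(copyRow ? Arrays.copyOf(row, row.length) : row);
        convertFirstFieldToLong(row);
        return samples.get(0)[0];
    }

    public static void main(String[] args) {
        Object byReference = sampleThenConvert(false);
        Object byCopy = sampleThenConvert(true);
        System.out.println("by reference: " + byReference.getClass().getSimpleName()
                + ", matches String meta: " + matchesStringMeta(byReference));
        System.out.println("by copy: " + byCopy.getClass().getSimpleName()
                + ", matches String meta: " + matchesStringMeta(byCopy));
    }
}
```

In this sketch the sample stored by reference fails the String check after the conversion, while the copied sample still passes it. The price is one extra array per sampled row, which matches the extra memory usage mentioned above.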

flotho commented 2 months ago

Wow, many thanks for this answer.

hansva commented 2 months ago

.take-issue