github-actions[bot] opened 8 months ago
It first failed on https://github.com/apache/beam/actions/runs/8210266873.
The failed task is :sdks:python:test-suites:portable:py38:portableWordCountSparkRunnerBatch.
Traceback:
INFO:apache_beam.utils.subprocess_server:Starting service with ('java' '-jar' '/runner/_work/beam/beam/runners/spark/3/job-server/build/libs/beam-runners-spark-3-job-server-2.56.0-SNAPSHOT.jar' '--spark-master-url' 'local[4]' '--artifacts-dir' '/tmp/beam-temp8q8022zi/artifactsg6e8usou' '--job-port' '56313' '--artifact-port' '0' '--expansion-port' '0')
INFO:apache_beam.utils.subprocess_server:Error: A JNI error has occurred, please check your installation and try again
INFO:apache_beam.utils.subprocess_server:Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/beam/vendor/grpc/v1p60p1/io/grpc/BindableService
INFO:apache_beam.utils.subprocess_server: at java.lang.ClassLoader.defineClass1(Native Method)
INFO:apache_beam.utils.subprocess_server: at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
INFO:apache_beam.utils.subprocess_server: at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
INFO:apache_beam.utils.subprocess_server: at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
INFO:apache_beam.utils.subprocess_server: at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
INFO:apache_beam.utils.subprocess_server: at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
INFO:apache_beam.utils.subprocess_server: at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
INFO:apache_beam.utils.subprocess_server: at java.security.AccessController.doPrivileged(Native Method)
INFO:apache_beam.utils.subprocess_server: at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
INFO:apache_beam.utils.subprocess_server: at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
INFO:apache_beam.utils.subprocess_server: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
INFO:apache_beam.utils.subprocess_server: at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
INFO:apache_beam.utils.subprocess_server: at java.lang.Class.getDeclaredMethods0(Native Method)
INFO:apache_beam.utils.subprocess_server: at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
INFO:apache_beam.utils.subprocess_server: at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
INFO:apache_beam.utils.subprocess_server: at java.lang.Class.getMethod0(Class.java:3018)
INFO:apache_beam.utils.subprocess_server: at java.lang.Class.getMethod(Class.java:1784)
INFO:apache_beam.utils.subprocess_server: at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:670)
INFO:apache_beam.utils.subprocess_server: at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:652)
INFO:apache_beam.utils.subprocess_server:Caused by: java.lang.ClassNotFoundException: org.apache.beam.vendor.grpc.v1p60p1.io.grpc.BindableService
INFO:apache_beam.utils.subprocess_server: at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
INFO:apache_beam.utils.subprocess_server: at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
INFO:apache_beam.utils.subprocess_server: at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
INFO:apache_beam.utils.subprocess_server: at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
INFO:apache_beam.utils.subprocess_server: ... 19 more
ERROR:apache_beam.utils.subprocess_server:Started job service with ('java', '-jar', '/runner/_work/beam/beam/runners/spark/3/job-server/build/libs/beam-runners-spark-3-job-server-2.56.0-SNAPSHOT.jar', '--spark-master-url', 'local[4]', '--artifacts-dir', '/tmp/beam-temp8q8022zi/artifactsg6e8usou', '--job-port', '56313', '--artifact-port', '0', '--expansion-port', '0')
ERROR:apache_beam.utils.subprocess_server:Error bringing up service
Traceback (most recent call last):
File "/runner/_work/beam/beam/sdks/python/apache_beam/utils/subprocess_server.py", line 175, in start
raise RuntimeError(
RuntimeError: Service failed to start up with error 1
Traceback (most recent call last):
File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/runner/_work/beam/beam/sdks/python/apache_beam/examples/wordcount.py", line 111, in <module>
run()
File "/runner/_work/beam/beam/sdks/python/apache_beam/examples/wordcount.py", line 106, in run
output | 'Write' >> WriteToText(known_args.output)
File "/runner/_work/beam/beam/sdks/python/apache_beam/pipeline.py", line 612, in __exit__
self.result = self.run()
File "/runner/_work/beam/beam/sdks/python/apache_beam/pipeline.py", line 586, in run
return self.runner.run_pipeline(self, self._options)
File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/runner.py", line 192, in run_pipeline
return self.run_portable_pipeline(
File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/portable_runner.py", line 381, in run_portable_pipeline
job_service_handle = self.create_job_service(options)
File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/portable_runner.py", line 296, in create_job_service
return self.create_job_service_handle(server.start(), options)
File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/job_server.py", line 81, in start
self._endpoint = self._job_server.start()
File "/runner/_work/beam/beam/sdks/python/apache_beam/runners/portability/job_server.py", line 110, in start
return self._server.start()
File "/runner/_work/beam/beam/sdks/python/apache_beam/utils/subprocess_server.py", line 175, in start
raise RuntimeError(
RuntimeError: Service failed to start up with error 1
> Task :sdks:python:test-suites:portable:py38:portableWordCountSparkRunnerBatch FAILED
Adding the owner of the commit at which the post-commit job first failed: @damccorm
I think we can pretty comfortably rule out that change; it touched the YAML SDK, which is unrelated to portableWordCountSparkRunnerBatch. Note that this workflow runs on a schedule, not per commit, though none of the commits in that window look particularly suspicious.
I see. It was red for the last two weeks and flaky before that too.
Permared right now
Only sorta - each component job is actually not permared - e.g. there are 2 successes here, https://github.com/apache/beam/actions/runs/8873798546
The whole workflow is permared just because our flake percentage is so high
Yea, let's work out how to get top-level signal.
The lowest and highest Python versions (3.8, 3.11) run more tests than the middle ones (3.9, 3.10); it could be those extra tests or tasks that are permared.
It could make sense to find a way to get separate top-level signal per Python version, assuming we can share everything necessary so the versions don't get out of sync.
Yeah, we used to have this for Jenkins where each Python PostCommit had its own task
The Vertex AI package version issue (we do not import this directly, so it should be fine):
/runner/_work/beam/beam/build/gradleenv/-1734967050/lib/python3.9/site-packages/vertexai/preview/developer/__init__.py:33: DeprecationWarning:
After May 30, 2024, importing any code below will result in an error.
Please verify that you are explicitly pinning to a version of `google-cloud-aiplatform`
(e.g., google-cloud-aiplatform==[1.32.0, 1.49.0]) if you need to continue using this
library.

from vertexai.preview import (
    init,
    remote,
    VertexModel,
    register,
    from_pretrained,
    developer,
    hyperparameter_tuning,
    tabular_models,
)
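We don't import vertexai.preview ourselves, so no action should be needed. If we ever do need to act on it, a minimal sketch of the check the warning asks for (the [1.32.0, 1.49.0] range is quoted from the message above; `packaging` being available in the test env is an assumption) could be:

```python
from importlib.metadata import version

from packaging.version import Version

installed = Version(version("google-cloud-aiplatform"))
# Range quoted from the deprecation warning above.
assert Version("1.32.0") <= installed <= Version("1.49.0"), f"unexpected pin: {installed}"
print(f"google-cloud-aiplatform {installed} is inside the suggested range")
```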
A new flaky test in py39; this is related to https://github.com/apache/beam/issues/29617:
https://ge.apache.org/s/hb7syztoolfhu/console-log?page=17
=================================== FAILURES ===================================
_______________ BigQueryQueryToTableIT.test_big_query_legacy_sql _______________
[gw3] linux -- Python 3.9.19 /runner/_work/beam/beam/build/gradleenv/1398941893/bin/python3.9

self = <apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT testMethod=test_big_query_legacy_sql>

    @pytest.mark.it_postcommit
    def test_big_query_legacy_sql(self):
      verify_query = DIALECT_OUTPUT_VERIFY_QUERY % self.output_table
      expected_checksum = test_utils.compute_hash(DIALECT_OUTPUT_EXPECTED)
      pipeline_verifiers = [
          PipelineStateMatcher(),
          BigqueryMatcher(
              project=self.project,
              query=verify_query,
              checksum=expected_checksum)
      ]

      extra_opts = {
          'query': LEGACY_QUERY,
          'output': self.output_table,
          'output_schema': DIALECT_OUTPUT_SCHEMA,
          'use_standard_sql': False,
          'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION_MS,
          'on_success_matcher': all_of(*pipeline_verifiers),
      }
      options = self.test_pipeline.get_full_options_as_args(**extra_opts)
>     big_query_query_to_table_pipeline.run_bq_pipeline(options)

apache_beam/io/gcp/big_query_query_to_table_it_test.py:178:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
apache_beam/io/gcp/big_query_query_to_table_pipeline.py:103: in run_bq_pipeline
    result = p.run()
apache_beam/testing/test_pipeline.py:115: in run
    result = super().run(
apache_beam/pipeline.py:560: in run
    return Pipeline.from_runner_api(
apache_beam/pipeline.py:587: in run
    return self.runner.run_pipeline(self, self._options)
apache_beam/runners/direct/test_direct_runner.py:42: in run_pipeline
    self.result = super().run_pipeline(pipeline, options)
apache_beam/runners/direct/direct_runner.py:117: in run_pipeline
    from apache_beam.runners.portability.fn_api_runner import fn_runner
apache_beam/runners/portability/fn_api_runner/__init__.py:18: in <module>
    from apache_beam.runners.portability.fn_api_runner.fn_runner import FnApiRunner
apache_beam/runners/portability/fn_api_runner/fn_runner.py:68: in <module>
    from apache_beam.runners.portability.fn_api_runner import execution
apache_beam/runners/portability/fn_api_runner/execution.py:62: in <module>
    from apache_beam.runners.portability.fn_api_runner import translations
apache_beam/runners/portability/fn_api_runner/translations.py:55: in <module>
    from apache_beam.runners.worker import bundle_processor
apache_beam/runners/worker/bundle_processor.py:69: in <module>
    from apache_beam.runners.worker import operations
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   KeyError: '__pyx_vtable__'

apache_beam/runners/worker/operations.py:1: KeyError
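For what it's worth, a KeyError: '__pyx_vtable__' raised while importing apache_beam/runners/worker/operations.py is the usual signature of a stale Cython build (an old compiled .so sitting next to newer sources). A minimal sketch to spot it in a given build environment (module names are taken from the tracebacks above; whether the environment is actually cythonized at all is an assumption):

```python
import importlib

# Importing the cythonized worker modules directly reproduces the failure mode:
# a stale compiled extension raises KeyError: '__pyx_vtable__' or the
# "Counter size changed" ValueError seen in the logs.
for mod in (
    "apache_beam.runners.worker.operations",
    "apache_beam.runners.worker.opcounters",
    "apache_beam.utils.counters",
):
    try:
        importlib.import_module(mod)
        print(f"{mod}: OK")
    except (KeyError, ValueError) as exc:
        print(f"{mod}: stale extension? {type(exc).__name__}: {exc}")
```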
Last three runs are green now.
Close this for now.
Great. Thanks @liferoad
Reopening since the workflow is still flaky
New error:
==================================== ERRORS ====================================
________________ ERROR at setup of ReadTests.test_native_source ________________
[gw5] linux -- Python 3.9.19 /runner/_work/beam/beam/build/gradleenv/1398941893/bin/python3.9

self = <apache_beam.io.gcp.bigquery_tools.BigQueryWrapper object at 0x7f248f59baf0>
project_id = 'apache-beam-testing'
dataset_id = 'python_read_table_17178042710ffd3b', location = None
labels = None

    @retry.with_exponential_backoff(
        num_retries=MAX_RETRIES,
        retry_filter=retry.retry_on_server_errors_and_timeout_filter)
    def get_or_create_dataset(
        self, project_id, dataset_id, location=None, labels=None):
      # Check if dataset already exists otherwise create it
      try:
>       dataset = self.client.datasets.Get(
            bigquery.BigqueryDatasetsGetRequest(
                projectId=project_id, datasetId=dataset_id))

apache_beam/io/gcp/bigquery_tools.py:809:
I looked at a couple flakes and could not discern if they represented anything that should be release blocking, so I am moving this to the next release milestone.
Green for last two days.
Reopening since the workflow is still flaky
_______ ERROR collecting apache_beam/runners/worker/log_handler_test.py ________
apache_beam/runners/worker/log_handler_test.py:34: in <module>
    from apache_beam.runners.worker import bundle_processor
apache_beam/runners/worker/bundle_processor.py:69: in <module>
    from apache_beam.runners.worker import operations
apache_beam/runners/worker/operations.py:1: in init apache_beam.runners.worker.operations
    ???
E   KeyError: '__pyx_vtable__'
________ ERROR collecting apache_beam/runners/worker/opcounters_test.py ________
apache_beam/runners/worker/opcounters_test.py:27: in <module>
    from apache_beam.runners.worker import opcounters
apache_beam/runners/worker/opcounters.py:1: in init apache_beam.runners.worker.opcounters
    ???
E   ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
=========================== short test summary info ============================
ERROR apache_beam/dataframe/transforms_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/dataframe/transforms_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/render_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/render_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/trivial_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/trivial_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/dataflow/dataflow_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/dataflow/dataflow_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/interactive_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/interactive_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/utils_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/interactive/utils_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/flink_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/local_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/local_job_service_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/portable_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/portable_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/samza_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/samza_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_java_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_java_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/spark_uber_jar_job_server_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/fn_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/fn_runner_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/translations_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/translations_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/bundle_processor_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/log_handler_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/opcounters_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/runners/portability/fn_api_runner/trigger_manager_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/bundle_processor_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/log_handler_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/opcounters_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/runners/worker/sdk_worker_main_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sdk_worker_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sideinputs_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/runners/worker/sdk_worker_main_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sdk_worker_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/runners/worker/sideinputs_test.py - ValueError: apache_beam.utils.counters.Counter size changed, may indicate binary incompatibility. Expected 56 from C header, got 32 from PyObject
ERROR apache_beam/testing/load_tests/microbenchmarks_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/transforms/combinefn_lifecycle_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/testing/load_tests/microbenchmarks_test.py - KeyError: '__pyx_vtable__'
ERROR apache_beam/transforms/combinefn_lifecycle_test.py - KeyError: '__pyx_vtable__'
No Cython issues in recent runs, just a number of flakes for tests with external connections (GCSIO, RRIO) that aren't consistent across Python versions or across runs.
Currently the Python 3.12 Dataflow suite has two tests failing consistently:
apache_beam/ml/inference/sklearn_inference_it_test.py::SklearnInference::test_sklearn_mnist_classification
apache_beam/ml/inference/sklearn_inference_it_test.py::SklearnInference::test_sklearn_mnist_classification_large_model
Error:
subprocess.CalledProcessError: Command '['/runner/_work/beam/beam/build/gradleenv/2050596100/bin/python3.12', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/tmp/tmpoq1ebvgy/tmp_requirements.txt', '--exists-action', 'i', '--no-deps', '--implementation', 'cp', '--abi', 'cp312', '--platform', 'manylinux2014_x86_64']' returned non-zero exit status 1.
Error compiling Cython file:
sklearn/utils/_vector_sentinel.pyx:31:9: Previous declaration is here
i.e., sklearn cannot be installed from source with the available Cython.
This happened as early as https://github.com/apache/beam/commits/5b2bfe96f83a5631c3a8d5c3b92a0f695ffe2d7d.
We need to bump the sklearn requirements here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/inference/sklearn_examples_requirements.txt
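Before bumping, it might be worth confirming the new pin has a prebuilt cp312 wheel, so the Dataflow staging step never falls back to a source build. A rough sketch (the candidate version below is hypothetical, not the actual pin in sklearn_examples_requirements.txt):

```python
import subprocess
import sys

candidate = "scikit-learn==1.5.1"  # hypothetical bump target, not the actual pin
cmd = [
    sys.executable, "-m", "pip", "download", candidate,
    "--dest", "/tmp/sklearn-wheel-check",
    "--no-deps", "--only-binary", ":all:",  # fail fast instead of attempting a source build
    "--implementation", "cp", "--abi", "cp312",
    "--platform", "manylinux2014_x86_64",
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode == 0:
    print(f"{candidate}: cp312 wheel available")
else:
    print(f"{candidate}: no cp312 wheel\n{result.stderr.strip()}")
```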
Reopening since the workflow is still flaky
Reopening since the workflow is still flaky
2024-08-30T07:28:39.6571287Z     if setup_options.setup_file is not None:
2024-08-30T07:28:39.6571763Z       if not os.path.isfile(setup_options.setup_file):
2024-08-30T07:28:39.6572227Z >       raise RuntimeError(
2024-08-30T07:28:39.6572923Z             'The file %s cannot be found. It was specified in the '
2024-08-30T07:28:39.6573578Z             '--setup_file command line option.' % setup_options.setup_file)
2024-08-30T07:28:39.6574970Z E   RuntimeError: The file /runner/_work/beam/beam/sdks/python/apache_beam/examples/complete/juliaset/src/setup.py cannot be found. It was specified in the --setup_file command line option.
Currently failing test:
gradlew :sdks:python:test-suites:portable:py312:portableLocalRunnerJuliaSetWithSetupPy
This is red again - https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python.yml?query=branch%3Amaster
It looks like there are currently 2 issues:
@jrmccluskey would you mind taking a look at these?
Failure in the 3.9 postcommit is apache_beam/examples/fastavro_it_test.py::FastavroIT::test_avro_it, will dive deeper into that shortly
The problem in the TensorRT container is that we seem to have two different versions of CUDA installed, one at version 11.8 and the other at 12.1 (we want everything at 12.1)
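A quick way to confirm the duplicate-toolkit suspicion inside the container (a sketch; it only looks at the standard /usr/local/cuda-* install locations and whatever nvcc is on PATH):

```python
import glob
import subprocess

# Two entries here would confirm the 11.8 + 12.1 mix.
print(sorted(glob.glob("/usr/local/cuda-*")))
try:
    # Version reported by whichever toolkit is first on PATH.
    print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
except FileNotFoundError:
    print("nvcc not on PATH")
```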
Looks like after sickbaying TensorRT tests, there are still failures. https://ge.apache.org/s/27igat7sfmcsu/console-log/task/:sdks:python:test-suites:portable:py310:portableWordCountSparkRunnerBatch?anchor=60&page=1 is an example, it looks like we're failing because we're missing a class in the spark runner.
@Abacn would you mind taking a look? Its unclear why this is happening now, but I'm guessing it may be related to https://github.com/apache/beam/pull/32976 (and maybe some caching kept it from showing up?)
It's a bad Gradle cache. I cannot reproduce it locally on the master branch, and I also inspected the expansion jar.
For some reason, the Gradle cache for shadowJar has been breaking more frequently recently.
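One way to tell a badly cached shadowJar from a classpath problem is to check whether the relocated gRPC classes actually made it into the shaded job-server jar. A minimal sketch (jar path copied from the failing invocation at the top of this issue; the vendored class name comes from the NoClassDefFoundError):

```python
import zipfile

# Path copied from the failing job-server invocation above.
jar = ("/runner/_work/beam/beam/runners/spark/3/job-server/build/libs/"
       "beam-runners-spark-3-job-server-2.56.0-SNAPSHOT.jar")
# The class the launcher could not find, as a jar entry name.
wanted = "org/apache/beam/vendor/grpc/v1p60p1/io/grpc/BindableService.class"

with zipfile.ZipFile(jar) as jf:
    present = wanted in set(jf.namelist())
print(f"{wanted}: {'present' if present else 'MISSING'}")
```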
It started to fail again last week (on Friday), with the distroless Python SDK PR: https://github.com/apache/beam/commit/81f35ab62298a2ec9fadeded82461b363b6401db (@damondouglas)
#21 [distroless 5/6] COPY --from=base /usr/lib/python3.9 /usr/lib/python3.9
#21 ERROR: failed to calculate checksum of ref 21e0551f-9179-41a9-b6c7-d487e40b7288::4b5lek0fokkw0omzyb94t5h7y: "/usr/lib/python3.9": not found
There is no /usr/lib/python3.9 in the python:3.9-bookworm image. I can only see the python3 and python3.11 folders there, so I think we may need to copy the python3 one.
$ docker run -it python:3.9-bookworm bash
root@b730cccba5a8:/# ls -d /usr/lib/python*
/usr/lib/python3 /usr/lib/python3.11
root@b730cccba5a8:/# ls -d /usr/local/lib/python*
/usr/local/lib/python3.11 /usr/local/lib/python3.9
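For what it's worth, asking the interpreter inside the image where its stdlib lives points at the directory the COPY presumably needs to target (a sketch; in python:3.9-bookworm, CPython is installed under /usr/local):

```python
import sysconfig

# In python:3.9-bookworm this prints /usr/local/lib/python3.9, not /usr/lib/python3.9.
print(sysconfig.get_paths()["stdlib"])
```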
@damondouglas , could you confirm that?
@shunping I think Damon is on vacation, if there is a quick fix please go ahead and apply it, otherwise could you please revert and we can try again when Damon is back/after the 2.61.0 release
cc/ @Abacn
sg, will see if the fix I have in mind can work.
OK, I took another look at this. The test started to fail at 11/06 6:32PM (https://github.com/apache/beam/actions/runs/11713650994); the last successful run was at 11/06 12:33PM (https://github.com/apache/beam/actions/runs/11708854671). There are two commits in this time interval:
The Kafka error message is shown below:
FAILED apache_beam/io/external/xlang_kafkaio_it_test.py::CrossLanguageKafkaIOTest::test_local_kafkaio_populated_key - RuntimeError: Pipeline BeamApp-runner-1111115329-514dd26a_03822608-80d0-4037-bc13-11d632204f46 failed in state FAILED: java.lang.RuntimeException: Error received from SDK harness for instruction 3: org.apache.beam.sdk.util.UserCodeException: java.io.IOException: KafkaWriter : failed to send 1 records (since last report)
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
at org.apache.beam.sdk.io.kafka.KafkaWriter$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForParDo(FnApiDoFnRunner.java:810)
at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:348)
at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:275)
at org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1837)
at org.apache.beam.fn.harness.FnApiDoFnRunner.access$3100(FnApiDoFnRunner.java:145)
at org.apache.beam.fn.harness.FnApiDoFnRunner$NonWindowObservingProcessBundleContext.output(FnApiDoFnRunner.java:2695)
at org.apache.beam.sdk.transforms.MapElements$2.processElement(MapElements.java:151)
at org.apache.beam.sdk.transforms.MapElements$2$DoFnInvoker.invokeProcessElement(Unknown Source)
at org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForParDo(FnApiDoFnRunner.java:810)
at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:348)
at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:275)
at org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:213)
at org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.multiplexElements(BeamFnDataInboundObserver.java:172)
at org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.awaitCompletion(BeamFnDataInboundObserver.java:136)
at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:550)
at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:150)
at org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:115)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at org.apache.beam.sdk.util.UnboundedScheduledExecutorService$ScheduledFutureTask.run(UnboundedScheduledExecutorService.java:163)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: KafkaWriter : failed to send 1 records (since last report)
at org.apache.beam.sdk.io.kafka.KafkaWriter.checkForFailures(KafkaWriter.java:183)
at org.apache.beam.sdk.io.kafka.KafkaWriter.processElement(KafkaWriter.java:66)
Caused by: org.apache.kafka.common.errors.TimeoutException: Topic xlang_kafkaio_test_populated_key_e9df3a07-037f-45a1-afde-7cea599f9570 not present in metadata after 60000 ms.
@Abacn , could you check this and see if we need to roll it back?
Thanks for taking care of it. I am +1 for rollback. The first distroless PR was expected to be a no-op for the 2.61.0 release. Good to know it broke something before the release cut.
The PostCommit Python is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python.yml?query=is%3Afailure+branch%3Amaster to see the logs.