Closed daw3rd closed 5 months ago
Tools/ingest2parquet
Testing ingest2parquet shows lots of ERROR messages, but the test does not fail.
There are lots of error messages, yet the run of ingest2parquet_local.py does not fail. Perhaps these can be changed to WARNINGS?
cd tools/ingest2parquet
make venv
make test-src
gets
...
Executing: python src/ingest2parquet_local.py
13:02:37 INFO - data factory data_ is using local data accessinput_folder - /home/dawood/git/fm-data-engineering/tools/ingest2parquet/test-data/input output_folder - /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output
13:02:37 INFO - data factory data_ max_files -1, n_sample -1
13:02:37 INFO - data factory data_ Not using data sets, checkpointing False, max files -1, random samples -1, files to use ['.zip']
Number of files is 2
filepath /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/utils/lang_extensions.json
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x80 in position 11: invalid start byte
skipping environments-master/cfortunes/diebenkorn_notes.dat Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc3 in position 7: invalid continuation byte
skipping environments-master/cfortunes/obliquestrategies.dat Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfc in position 10: invalid start byte
skipping application-java/lib/application-java.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe5 in position 14: invalid continuation byte
skipping application-java/lib/fabric-gateway-java-2.1.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf9 in position 10: invalid start byte
skipping application-java/lib/fabric-sdk-java-2.1.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-protobuf-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe1 in position 10: invalid continuation byte
skipping application-java/lib/protobuf-java-util-3.10.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xaa in position 11: invalid start byte
skipping application-java/lib/api-common-1.9.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xba in position 25: invalid start byte
skipping environments-master/commands/grel Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode bytes in position 40-41: invalid continuation byte
skipping environments-master/commands/ldid Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xb7 in position 10: invalid start byte
skipping application-java/lib/milagro-crypto-java-0.4.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-stub-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-netty-1.23.0.jar Error: No contents decoded
output_file_name /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output/https___github.com_00000o1_environments_archive_refs_heads_master.parquet
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-core-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-protobuf-lite-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-api-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfe in position 50: invalid start byte
skipping application-java/lib/guava-29.0-jre.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfe in position 50: invalid start byte
skipping application-java/lib/failureaccess-1.0.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf3 in position 50: invalid continuation byte
skipping application-java/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe7 in position 12: invalid continuation byte
skipping application-java/lib/perfmark-api-0.17.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf0 in position 10: invalid continuation byte
skipping application-java/lib/jsr305-3.0.2.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xac in position 10: invalid start byte
skipping application-java/lib/checker-qual-2.11.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x82 in position 12: invalid start byte
skipping application-java/lib/error_prone_annotations-2.3.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x99 in position 53: invalid start byte
skipping application-java/lib/j2objc-annotations-1.3.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf6 in position 10: invalid start byte
skipping application-java/lib/cloudant-client-2.19.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9d in position 89: invalid start byte
skipping application-java/lib/netty-tcnative-boringssl-static-2.0.30.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa8 in position 10: invalid start byte
skipping application-java/lib/netty-codec-http2-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa7 in position 10: invalid start byte
skipping application-java/lib/protobuf-java-3.10.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9c in position 11: invalid start byte
skipping application-java/lib/bcpkix-jdk15on-1.62.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc5 in position 10: invalid continuation byte
skipping application-java/lib/httpclient-4.5.12.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/commons-logging-1.2.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xaa in position 14: invalid start byte
skipping application-java/lib/commons-cli-1.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xca in position 14: invalid continuation byte
skipping application-java/lib/commons-compress-1.20.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xf5 in position 10: invalid start byte
skipping application-java/lib/cloudant-http-2.19.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xcf in position 15: invalid continuation byte
skipping application-java/lib/commons-io-2.6.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc5 in position 10: invalid continuation byte
skipping application-java/lib/apache-log4j-extras-1.2.17.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa6 in position 12: invalid start byte
skipping application-java/lib/log4j-1.2.17.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfd in position 10: invalid start byte
skipping application-java/lib/futures-extra-4.2.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xb3 in position 10: invalid start byte
skipping application-java/lib/javax.json-1.1.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xe7 in position 10: invalid continuation byte
skipping application-java/lib/snakeyaml-1.26.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x97 in position 10: invalid start byte
skipping application-java/lib/jaxb-api-2.3.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xc9 in position 10: invalid continuation byte
skipping application-java/lib/javax.annotation-api-1.3.2.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/gson-2.8.5.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xaa in position 10: invalid start byte
skipping application-java/lib/commons-codec-1.11.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xfb in position 10: invalid start byte
skipping application-java/lib/netty-handler-proxy-4.1.38.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x88 in position 11: invalid start byte
skipping application-java/lib/proto-google-common-protos-1.12.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x88 in position 10: invalid start byte
skipping application-java/lib/netty-codec-http-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
skipping application-java/lib/netty-handler-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xb2 in position 10: invalid start byte
skipping application-java/lib/netty-codec-socks-4.1.38.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
skipping application-java/lib/netty-codec-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
skipping application-java/lib/netty-transport-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
skipping application-java/lib/netty-buffer-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x96 in position 12: invalid start byte
skipping application-java/lib/netty-resolver-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xcc in position 10: invalid continuation byte
skipping application-java/lib/netty-common-4.1.49.Final.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9b in position 11: invalid start byte
skipping application-java/lib/bcprov-jdk15on-1.62.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x94 in position 16: invalid start byte
skipping application-java/lib/httpcore-4.4.13.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xac in position 10: invalid start byte
skipping application-java/lib/auto-value-annotations-1.7.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa5 in position 89: invalid start byte
skipping application-java/lib/commons-math3-3.6.1.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa8 in position 10: invalid start byte
skipping application-java/lib/javax.activation-api-1.2.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x9b in position 11: invalid start byte
skipping application-java/lib/annotations-4.1.1.4.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x86 in position 10: invalid start byte
skipping application-java/lib/opencensus-contrib-grpc-metrics-0.21.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x86 in position 11: invalid start byte
skipping application-java/lib/opencensus-api-0.21.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0xa1 in position 11: invalid start byte
skipping application-java/lib/grpc-context-1.23.0.jar Error: No contents decoded
13:02:37 ERROR - Error -> 'utf-8' codec can't decode byte 0x8e in position 10: invalid start byte
skipping application-java/lib/animal-sniffer-annotations-1.17.jar Error: No contents decoded
output_file_name /home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output/application-java.parquet
processing stats generated {'total_files_given': 2, 'total_files_processed': 2, 'total_files_failed_to_processed': 0, 'total_no_of_rows': 54, 'total_bytes_in_memory': 79661, 'failure_details': []}
Metadata file stored - response: {'name': '/home/dawood/git/fm-data-engineering/tools/ingest2parquet/src/../test-data/output/metadata.json', 'size': 445}
[dawood@data-engineering1 ingest2parquet]$
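For illustration, here is a minimal sketch of what the suggested change might look like. It is not the actual ingest2parquet code: the function name `decode_zip_members` and the logger name are hypothetical, and it assumes each archive member is read and decoded as UTF-8, as the messages above suggest. Members that cannot be decoded (such as the .jar and .dat files in the log) are reported at WARNING level and collected, instead of being logged as ERROR.

```python
import io
import logging
import zipfile

# Hypothetical logger name; the real tool may configure logging differently.
logger = logging.getLogger("ingest2parquet_sketch")


def decode_zip_members(zip_bytes: bytes) -> tuple[dict[str, str], list[str]]:
    """Return (decoded text keyed by member name, names of skipped members)."""
    decoded: dict[str, str] = {}
    skipped: list[str] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            try:
                decoded[info.filename] = archive.read(info).decode("utf-8")
            except UnicodeDecodeError as exc:
                # Binary members are expected in source archives, so a skipped
                # member is reported as a warning rather than an error.
                logger.warning("skipping %s: %s", info.filename, exc)
                skipped.append(info.filename)
    return decoded, skipped
```

The list of skipped names could then be reported alongside the processing stats, so that skipped binaries stay visible without reading like test failures.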
verified
Search before asking
Component
Tools/ingest2parquet
What happened + What you expected to happen
Testing ingest2parquet shows lots of ERROR messages, but the test does not fail.
Reproduction script
There are lots of error messages, yet the run of ingest2parquet_local.py does not fail. Perhaps these can be changed to WARNINGS?
gets
Anything else
No response
OS
MacOS (limited support)
Python
3.10.x
Are you willing to submit a PR?