Closed Zejnilovic closed 5 months ago
I have tested the change locally and it looks good.

Details: I commented out `kinit`, `set -e`, `get_dataset_info`, and `handle_jceks_path`, which I don't have available locally, and ran both the original and the changed script.

The original script mangled the whitespace-only `--null-value ' '` argument:
```
$ ./s3a_wrapper.sh ./run_standardization.sh --deploy-mode cluster --dataset-name Custom_datasetName --menas-auth-keytab user_authFile.krb --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --header false --delimiter U+00A6 --null-value ' '
 _    _      _                  _____           _       _
| |  | |    | |                / ____|         (_)     | |
| |__| | ___| |_ __   ___ _ __| (___   ___ _ __ _ _ __ | |_ ___
|  __  |/ _ \ | '_ \ / _ \ '__|\___ \ / __| '__| | '_ \| __/ __|
| |  | |  __/ | |_) |  __/ |   ____) | (__| |  | | |_) | |_\__ \
|_|  |_|\___|_| .__/ \___|_|  |_____/ \___|_|  |_| .__/ \__|___/
              | |                                | |
              |_|                                |_|
Enceladus's Helper Scripts version 1.0
Currently running run_standardization.sh
ERROR: Found unrecognized options passed to the script. Parameters are:
user_authFile.krb
```
while the changed script handled it correctly:
```
$ ./s3a_wrapper.sh ./run_standardization.sh --deploy-mode cluster --dataset-name Custom_datasetName --menas-auth-keytab user_authFile.krb --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --header false --delimiter U+00A6 --null-value ' '
 _    _      _                  _____           _       _
| |  | |    | |                / ____|         (_)     | |
| |__| | ___| |_ __   ___ _ __| (___   ___ _ __ _ _ __ | |_ ___
|  __  |/ _ \ | '_ \ / _ \ '__|\___ \ / __| '__| | '_ \| __/ __|
| |  | |  __/ | |_) |  __/ |   ____) | (__| |  | | |_) | |_\__ \
|_|  |_|\___|_| .__/ \___|_|  |_____/ \___|_|  |_| .__/ \__|___/
              | |                                | |
              |_|                                |_|
Enceladus's Helper Scripts version 1.0
Currently running run_standardization.sh
Dynamic Resource Allocation enabled
Using Keytab from HDFS
Command line:
/opt/spark-2.4.4/bin/spark-submit --master yarn --deploy-mode cluster --files /absolute/path/application.conf#application.conf --conf spark.logConf=true --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.sql.adaptive.enabled=true --conf spark.dynamicAllocation.maxExecutors=4 --conf spark.dynamicAllocation.minExecutors=0 --conf spark.dynamicAllocation.executorAllocationRatio=0.5 --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=134217728 --conf spark.yarn.submit.waitAppCompletion=false --conf "spark.driver.extraJavaOptions=-Dstandardized.hdfs.path=/bigdata/std/std-{0}-{1}-{2}-{3} -Dspline.mongodb.url=mongodb://localhost:27017 -Dspline.mongodb.name=spline -Dhdp.version=2.7.3 -Dconfig.file=application.conf " --conf "spark.executor.extraJavaOptions= " --class za.co.absa.enceladus.standardization.StandardizationJob enceladus-spark-jobs.jar --menas-auth-keytab user_authFile.krb --dataset-name Custom_datasetName --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --delimiter \U+00A6 --header false --null-value ' '
```
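For context, the failure mode in the original run (the whitespace-only `--null-value ' '` vanishing, so `user_authFile.krb` ended up misparsed) is the classic symptom of forwarding arguments with an unquoted `$@`. This is a minimal illustrative sketch, not the actual wrapper code, showing how word splitting swallows a whitespace-only argument:

```shell
#!/bin/sh
# Hypothetical demonstration (not the wrapper's real code): print each
# received argument on its own line, bracketed, so empty/whitespace
# arguments are visible.
print_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Simulate the relevant part of the wrapper's command line.
set -- --null-value ' ' --menas-auth-keytab user_authFile.krb

echo "unquoted \$@ (broken):"
print_args $@     # the whitespace-only ' ' is removed by word splitting,
                  # so only 3 arguments arrive

echo "quoted \"\$@\" (correct):"
print_args "$@"   # all 4 arguments, including ' ', are preserved
```

Quoting the expansion as `"$@"` is the POSIX-specified way to pass every positional parameter through verbatim, which is why the changed script keeps `--null-value ' '` intact.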
Closes #2210