AbsaOSS / enceladus

Dynamic Conformance Engine
Apache License 2.0

Update s3a_wrapper.sh to support whitespaces #2211

Closed. Zejnilovic closed this pull request 5 months ago.

Zejnilovic commented 5 months ago

Closes #2210

sonarcloud[bot] commented 5 months ago

Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

dk1844 commented 5 months ago

I have tested the change locally; it looks good. Details: I commented out kinit, set -e, get_dataset_info, and handle_jceks_path, which I don't have locally, and ran both the original and the changed script (a stub-based alternative is sketched below).
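
For anyone wanting to reproduce this without editing the script body, one hypothetical alternative (untested against the real wrapper) is to export no-op stubs for the missing helpers; exported bash functions take precedence over PATH lookups in the child script. This only works if the script does not define these helpers itself, and set -e would still need to be edited out by hand:

# Hypothetical no-op stubs for a local dry run (names taken from the
# comment above; behavior assumed).
kinit()             { :; }   # skip Kerberos ticket acquisition
get_dataset_info()  { :; }   # skip the dataset lookup
handle_jceks_path() { :; }   # skip JCEKS keystore resolution
export -f kinit get_dataset_info handle_jceks_path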

The original script got tripped up by the all-spaces null value:

$ ./s3a_wrapper.sh ./run_standardization.sh --deploy-mode cluster --dataset-name Custom_datasetName --menas-auth-keytab user_authFile.krb --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --header false --delimiter U+00A6 --null-value '        '
  _    _      _                    _____           _       _
 | |  | |    | |                  / ____|         (_)     | |
 | |__| | ___| |_ __   ___ _ __  | (___   ___ _ __ _ _ __ | |_ ___
 |  __  |/ _ \ | '_ \ / _ \ '__|  \___ \ / __| '__| | '_ \| __/ __|
 | |  | |  __/ | |_) |  __/ |     ____) | (__| |  | | |_) | |_\__ \
 |_|  |_|\___|_| .__/ \___|_|    |_____/ \___|_|  |_| .__/ \__|___/
               | |                                  | |
               |_|                                  |_|

 Enceladus's Helper Scripts version 1.0
 Currently running run_standardization.sh

ERROR: Found unrecognized options passed to the script. Parameters are:
    user_authFile.krb
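
The exact culprit line isn't quoted in this thread, but the classic cause of this failure in wrapper scripts is rebuilding the command line through a flat string and expanding it unquoted, which lets the shell word-split on the spaces inside the null value so the following options shift out of position. A minimal repro of that assumed failure mode:

#!/usr/bin/env bash
# Minimal repro of the suspected bug (an assumption; not the literal
# s3a_wrapper.sh code).
null_value='        '   # a value that is only spaces

# Unquoted expansion of a flat string word-splits on the embedded
# spaces: the null value vanishes and user_authFile.krb ends up
# as a stray, unpaired word.
cmd="--null-value $null_value --menas-auth-keytab user_authFile.krb"
printf '<%s>\n' $cmd
# <--null-value>
# <--menas-auth-keytab>
# <user_authFile.krb>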

while the changed one did not:

$ ./s3a_wrapper.sh ./run_standardization.sh --deploy-mode cluster --dataset-name Custom_datasetName --menas-auth-keytab user_authFile.krb --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --header false --delimiter U+00A6 --null-value '        '
  _    _      _                    _____           _       _
 | |  | |    | |                  / ____|         (_)     | |
 | |__| | ___| |_ __   ___ _ __  | (___   ___ _ __ _ _ __ | |_ ___
 |  __  |/ _ \ | '_ \ / _ \ '__|  \___ \ / __| '__| | '_ \| __/ __|
 | |  | |  __/ | |_) |  __/ |     ____) | (__| |  | | |_) | |_\__ \
 |_|  |_|\___|_| .__/ \___|_|    |_____/ \___|_|  |_| .__/ \__|___/
               | |                                  | |
               |_|                                  |_|

 Enceladus's Helper Scripts version 1.0
 Currently running run_standardization.sh

Dynamic Resource Allocation enabled
Using Keytab from HDFS
Command line:
/opt/spark-2.4.4/bin/spark-submit --master yarn --deploy-mode cluster --files /absolute/path/application.conf#application.conf  --conf spark.logConf=true --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.sql.adaptive.enabled=true --conf spark.dynamicAllocation.maxExecutors=4 --conf spark.dynamicAllocation.minExecutors=0 --conf spark.dynamicAllocation.executorAllocationRatio=0.5 --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=134217728 --conf spark.yarn.submit.waitAppCompletion=false --conf "spark.driver.extraJavaOptions=-Dstandardized.hdfs.path=/bigdata/std/std-{0}-{1}-{2}-{3} -Dspline.mongodb.url=mongodb://localhost:27017 -Dspline.mongodb.name=spline -Dhdp.version=2.7.3    -Dconfig.file=application.conf    " --conf "spark.executor.extraJavaOptions=  " --class za.co.absa.enceladus.standardization.StandardizationJob enceladus-spark-jobs.jar --menas-auth-keytab user_authFile.krb --dataset-name Custom_datasetName --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --delimiter \U+00A6 --header false --null-value '        '
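
Since the diff itself isn't shown in this thread, here is the general whitespace-safe pattern such a fix usually follows (a sketch under that assumption, not the actual s3a_wrapper.sh code): carry the arguments through a bash array and forward them with quoted expansion, so each argument survives as a single word no matter what whitespace it contains.

#!/usr/bin/env bash
# Sketch of a whitespace-safe pass-through wrapper (assumed pattern;
# see the actual PR diff for what s3a_wrapper.sh really does).
set -e

script="$1"; shift          # e.g. ./run_standardization.sh

args=()
for arg in "$@"; do
  # a real wrapper would rewrite some arguments here (e.g. mapping
  # s3a:// paths); this sketch forwards them unchanged
  args+=("$arg")
done

# quoted array expansion preserves embedded whitespace per argument
exec "$script" "${args[@]}"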