Open pmayd opened 1 year ago
I think the automation process should be done in at least two steps.
For the first container, the files can be found in https://storage.cloud.google.com/a4d-315220-documents/docker-a4d-data-extraction/docker-a4d-data-extraction.zip; documentation is in the readme.md file. The problem is that although it worked for me locally, it could not be deployed on GCP Cloud Run: it crashed because of "devtools". I did not try it with the latest R version and our code. A possible solution could be to install the dependencies without "devtools", or to try it on a Kubernetes cluster.
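As a rough sketch of that idea, the image could install the R packages directly from CRAN instead of pulling in "devtools" (the package names below are placeholders, not the actual dependencies of our extraction code):

```bash
# Sketch only: install R packages straight from CRAN, without devtools.
# Package names are placeholders for whatever our extraction code actually needs.
Rscript -e 'install.packages(c("dplyr", "readr"), repos = "https://cloud.r-project.org")'

# If a package only exists on GitHub, "remotes" is a much lighter alternative
# to devtools; "owner/repo" is a placeholder.
Rscript -e 'install.packages("remotes", repos = "https://cloud.r-project.org")'
Rscript -e 'remotes::install_github("owner/repo")'
```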
The second container could even be a Cloud Function, but it needs to communicate with and access our GCS bucket to get the bash script. I intend to build a container to test this approach. By keeping the bash script in our GCS bucket, we have the flexibility to adapt it and use this container only as a runtime.
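A minimal sketch of what that runtime's entrypoint could look like, assuming the script lives in our bucket and the service account has read access to it (the bucket here is just the one from the link above, and the script name is a placeholder):

```bash
#!/bin/bash
set -euo pipefail

# Placeholders: replace with the actual bucket and script name.
BUCKET="gs://a4d-315220-documents"
SCRIPT="upload_data.sh"

# Pull the latest version of the bash script from GCS at start-up,
# so the container stays a thin runtime and the logic lives in the bucket.
gsutil cp "${BUCKET}/${SCRIPT}" "/tmp/${SCRIPT}"
chmod +x "/tmp/${SCRIPT}"

# Run the script; everything it writes to stdout/stderr ends up in Cloud Logging.
"/tmp/${SCRIPT}"
```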
The Docker image template and the repository for the second step (a container that runs a bash script for the data upload and runs ...) with the instructions can be found in our bucket. I zipped it and stored it there in case you want to use it in the future. Information and the step-by-step process can be found in the readme.md file.
The zip file contains the necessary files and documentation for building and deploying the data-cleaning pipeline on GCP Cloud Run. The problem mentioned above is fixed and the pipeline now runs on Cloud Run. Since I do not have any real input data files it complains about that, but otherwise it generates the log file properly, which means there is no problem with the execution and all packages are loaded properly.
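For reference, the build and deployment boil down to something like the following; the project id, image, service, and region names are placeholders, and the exact commands are in the readme.md inside the zip:

```bash
# Sketch only: build the image with Cloud Build and push it to the registry.
gcloud builds submit --tag gcr.io/PROJECT_ID/data-cleaning-pipeline .

# Deploy the image to Cloud Run.
gcloud run deploy data-cleaning-pipeline \
  --image gcr.io/PROJECT_ID/data-cleaning-pipeline \
  --region europe-west1 \
  --no-allow-unauthenticated
```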
Please feel free to contact me if you have any questions or need any help.
Best regards.
Idea: