Closed: vp999 closed this issue 4 years ago
Yep, the `--input` and `--output` parameters are interpreted by the Beam WordCount job; they are transparent to the operator. In a prod environment you usually want remote storage for both (e.g., HDFS, GCS, S3, Azure Blob Storage).
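To illustrate "transparent to the operator": the operator just forwards the flags, and only the job gives them meaning. A minimal stdlib-only sketch of how a WordCount-style job might parse them (hypothetical, not Beam's actual option-parsing code):

```python
import argparse


def parse_job_args(argv):
    """Parse --input/--output the way a WordCount-style job might.

    The operator passes these flags through untouched; the job alone
    interprets them as filesystem paths or object-store URIs.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True,
                        help="path or URI to read, e.g. /mnt/Data/input or gs://bucket/input")
    parser.add_argument("--output", required=True,
                        help="path or URI prefix to write, e.g. /mnt/Data/output")
    return parser.parse_args(argv)


args = parse_job_args(["--input", "/mnt/Data/demo2/input",
                       "--output", "/mnt/Data/demo2/output"])
print(args.input)   # /mnt/Data/demo2/input
print(args.output)  # /mnt/Data/demo2/output
```

Because the job resolves these paths inside its own container, the mount point of that container decides where the paths actually land on the share.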
I used Azure file storage here. The problem is different: there are two containers, one for the sidecar and the other for the job, and each container has the share mounted on it. I observed that for the output parameter the sidecar container's mount is used, while for the input parameter the job container's mount is used. Kind of a mismatch in handling.
IIUC, Azure file storage is not Azure blob storage: the former behaves like a local file system while the latter is a distributed file system. Usually we want a distributed file system in prod.
I am trying to use a common FlinkCluster and run different types of jobs on the same cluster (one job at a time). Here is the setup:
1) FlinkCluster yaml for the cluster: volume Data is mapped to etlresearchfileshare
2) job.yaml for the job: volume Data is mapped to etlresearchfileshare/demo2
Parameters sent to the job: input and output
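For concreteness, the two mounts in the setup above might look like this using standard Kubernetes `azureFile` volumes and `subPath` (a sketch with assumed names and mount path `/mnt/Data`; the actual manifests and field placement in the FlinkCluster CRD may differ):

```yaml
# flinkcluster yaml: share mounted at its root
volumes:
  - name: data
    azureFile:
      secretName: azure-secret
      shareName: etlresearchfileshare
volumeMounts:
  - name: data
    mountPath: /mnt/Data

# job.yaml: same share, but mounted at the demo2 subfolder
volumeMounts:
  - name: data
    mountPath: /mnt/Data
    subPath: demo2
```

With this layout, `/mnt/Data/x` means `etlresearchfileshare/x` in a cluster container but `etlresearchfileshare/demo2/x` in the job container.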
Issue: I have observed that the input parameter for the job is correctly evaluated to etlresearchfileshare/demo2/input, but the output parameter is evaluated against the volume mounted in the FlinkCluster, i.e. as etlresearchfileshare/output. I can see the output file being created in etlresearchfileshare/output instead of etlresearchfileshare/demo2/output.
There seems to be a discrepancy in how the input and output parameters are handled. Please note that there is no etlresearchfileshare/input folder, so the input parameter is correctly evaluated as per the job yaml (i.e. etlresearchfileshare/demo2/input), but the output parameter is evaluated as per the cluster yaml (i.e. etlresearchfileshare/output/).
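This is consistent with the write happening in a container whose mount points at the share root while the read happens in a container whose mount points at demo2/. A stdlib-only simulation of how the same relative path resolves to different share locations depending on the container's mount subpath (share and folder names taken from above, the mapping itself is hypothetical):

```python
import posixpath


def share_path(mount_subpath: str, rel_path: str) -> str:
    """Map a path used under a container's /mnt/Data mount back to its
    location on the Azure file share.

    mount_subpath is the subPath of the share mounted in that container:
    "" for a container that mounts the share root, "demo2" for the job
    container in this issue.
    """
    parts = [p for p in ("etlresearchfileshare", mount_subpath, rel_path) if p]
    return posixpath.join(*parts)


# Job container (share mounted with subPath demo2) resolves the input:
print(share_path("demo2", "input"))   # etlresearchfileshare/demo2/input
# Cluster/sidecar container (share mounted at its root) writes the output:
print(share_path("", "output"))       # etlresearchfileshare/output
```

The two printed paths reproduce exactly the observed behavior: input under demo2/, output at the share root.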