Closed afoerster closed 6 years ago
This is what StreamSets uses internally for this same use case: https://github.com/clusterdock/topology_cdh
@brockn which is better, direct Docker or clusterdock topology_cdh?
I'd look at using topology_cdh. As you know, it's a ton of work to set up a CDH cluster inside Docker yourself. While I think that is a useful experience and cool to have seen done, I'd go with topology_cdh in the future. It's heavily used within StreamSets and also at Cloudera.
An advantage of using ClusterDock is that you don't have to maintain an image, which is no small thing given how many dependencies are needed.
Complete. Opening a new issue for the one pipeline that still needs a test, kudu-hdfs-parquet-sqoop.
Currently, pipelines need to be tested on a cluster. We should have the ability to test Kudu/HDFS/Impala inside a Docker container. This will speed up pipeline development and make it easier to verify bug fixes.