Open BsoBird opened 6 months ago
Hi @BsoBird,
Thanks for the suggestion. We are going to take a deeper look at this technology.
Best regards,
Clemlab Team
It seems that MR3 is not open source, so it may not be a good idea to use it.
@farshad-allahdadi Yes, it is a semi-open source product, and its core package, mr3-core, is not open source. But I don't think this is important for three reasons:
Ambari is a great platform for engaging users and promoting the latest technologies and ideas related to Big Data. I hope the Clemlab team can give these small teams more opportunities to serve customers; doing so would also reflect the Clemlab team's technological vision and professionalism in keeping up with the times. Thanks.
@BsoBird That was my personal view and concern: when I decide to use a piece of software, I need to be sure that I either have access to the source code to fix possible problems or have access to a minimum level of support from the owner of the software. In the case of MR3, how did you handle that without buying a license?
@farshad-allahdadi MR3 is currently divided into two parts. The first part is the HIVE/TEZ-related code adapted to MR3, which is open source; I was able to fix the problem by porting a patch from the HIVE/TEZ community. The second part is the core code of MR3. This part is not open source at the moment, but the community will respond with fixes for all non-license-related issues. We've been using it for 3 years and have received good feedback. For now, I think both large users on the paid version and small to medium users on the free version can get their problems solved.
@BsoBird Thank you. Is it a better alternative to LLAP/Trino/Impala in terms of response time? I've read their articles about its performance, but I'm curious about your experience and use case. Did you use it for both ad-hoc/long-running (minutes to hours) queries and interactive (sub-second) queries? If not, what other component did you have to use alongside it (any of Spark/LLAP/Trino/Impala)? Also, did you set up your cluster using Ambari (odp or bigtop), or did you use a tarball installation? I mean, is it straightforward to add MR3 to Hive in any type of installation?
-- Is it a better alternative to LLAP/Trino/Impala in terms of response time? Yes. Compared to LLAP, it has the same performance but is easier to install and configure, and it can provide better concurrency. Compared to Trino, HIVE offers more fault tolerance, and for multi-table joins it performs better than Trino. Compared to Impala, HIVE has a wider ecosystem.
--Did you use it for both ad-hoc/long-running (minutes to hours) queries and interactive (sub-second) queries? Yes. A single hive-llap/mr3 deployment handles both ad-hoc queries and batch ETL, because it provides a resource isolation solution. This lets users significantly reduce the number of additional technology stacks they need to introduce.
--Also, did you set up your cluster using Ambari (odp or bigtop) or use a tarball installation? I mean, is it straightforward to add MR3 to Hive in any type of installation? It can use the existing HMS (Hive Metastore), so we just need to deploy a HiveServer2 service.
MR3-HIVE is a novel technology that can greatly improve the efficiency of users' HIVE-SQL execution without changing the way APACHE HIVE is used, even surpassing Trino. We have used it in a large number of production environments with good results. Since many users are still struggling to find an efficient HIVE engine, I think we should promote this technology to more users so that they can benefit from it. https://www.datamonad.com/