TOSIT-IO / tdp-collection

Ansible collection to deploy the components of TDP
Apache License 2.0

hadoop-env.sh.j2 in two separate locations #770

Open PACordonnier opened 1 year ago

PACordonnier commented 1 year ago

hadoop-env.sh.j2 is stored in two locations in the repo.

I'm not sure this is necessary. From my understanding, roles/hadoop/common/templates/hadoop-env.sh.j2 is used exclusively by hadoop/client, while the HDFS roles use their own hadoop-env.sh.j2.

I think it makes sense to manage only one file, either through a symbolic link or by deleting the unused file.

rpignolet commented 1 year ago

If this file is rendered at the same location on the target machine, I think it should be templated only by hadoop_client and deleted from the hdfs role. We must check that the hadoop-env content can be rendered with the hadoop tdp_vars.

PACordonnier commented 1 year ago

That's the issue indeed.

The hdfs hadoop-env.sh.j2 uses variables such as hdfs_datanode_heapsize, which are hdfs variables, not hadoop ones, so this would cause an error in hadoop_client_config.
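One rough way to check this before merging the templates is to list the variables a template references and compare them with the variables available to the role. The sketch below is only an illustration: it uses a simple regular expression rather than Jinja2's own parser, and the template fragment and the `hadoop_vars` set are hypothetical, not copied from the repository.

```python
import re

# Captures the first identifier inside a Jinja2 expression such as
# {{ name }} or {{ name | some_filter }}.
# Deliberate simplification: {% ... %} blocks, attribute access and
# nested expressions are not analysed.
VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def referenced_vars(template_text):
    """Return the set of top-level variable names used in {{ ... }} expressions."""
    return set(VAR_PATTERN.findall(template_text))


def missing_vars(template_text, available):
    """Return the variables the template references that are not in `available`."""
    return referenced_vars(template_text) - set(available)


# Hypothetical template fragment, for illustration only.
sample_template = (
    "export JAVA_HOME={{ java_home }}\n"
    'export HDFS_DATANODE_OPTS="-Xmx{{ hdfs_datanode_heapsize }}"\n'
)

# Hypothetical set of variables visible at the hadoop level.
hadoop_vars = {"java_home", "hadoop_log_dir"}

print(sorted(missing_vars(sample_template, hadoop_vars)))
# prints ['hdfs_datanode_heapsize']
```

Any name reported here would be undefined when the template is rendered with hadoop variables alone.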

rpignolet commented 1 year ago

This is always the problem with Hadoop, which contains both HDFS and YARN.

I think hadoop-env needs to use hadoop variables from the tdp_vars so that the template can be rendered at the hadoop_client level, even though it contains the memory settings for the HDFS components.

I don't understand what the Java heap configuration is doing in hadoop-env...

Maybe we should discuss it at one of our meetings.

PACordonnier commented 1 year ago

It's getting worse: hadoop-env is actually in three locations, since it's also in roles/yarn/common/templates :scream: I will update my recent PR.

We can discuss it during a meeting. Now that I have investigated a bit further, I think having 3 files, while not elegant, could actually be an adequate solution.
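To see whether the copies have actually diverged, the files can be grouped by content hash. This is a minimal sketch, assuming it is run from a checkout where the templates live somewhere under a `roles` directory; the function name `group_copies_by_content` is made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_copies_by_content(root, filename="hadoop-env.sh.j2"):
    """Map SHA-256 digest -> list of paths (relative to root) sharing that content."""
    groups = defaultdict(list)
    # rglob searches every subdirectory of root for files with this exact name.
    for path in sorted(Path(root).rglob(filename)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.relative_to(root).as_posix())
    return dict(groups)
```

Calling `group_copies_by_content("roles")` from the repository root returns one entry per distinct version of the file. A single entry would mean the copies are identical and could be deduplicated safely. Several entries would mean each role carries its own variant, which is the case where keeping separate files may be justified.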