facebook / chef-cookbooks

Open source chef cookbooks.
Apache License 2.0
571 stars 140 forks source link

fb_helpers contains namespace collisions with official chef node objects #211

Open erikng opened 2 years ago

erikng commented 2 years ago

In chef v17, chef added a lot of built-ins and we are seeing instability with at least one of them

https://github.com/facebook/chef-cookbooks/blob/main/cookbooks/fb_helpers/libraries/node_methods.rb#L146

when we call debian? within a library function, it returns false, so we are now having to do debian? || ubuntu? to workaround this as we consume fb_helpers.

To put this nicely, is Facebook willing to stop doing this? Or at the very least, stop putting so many of these things in node and put it to a FB namespace?

jaymzh commented 2 years ago

First, I'm not sure why these would cause instability - those methods are in the DSL namespace and not on the node, and should in no way conflict.

Second, yes, once FB is fully on Chef 17, it will likely make sense for them to code-mod away from the node methods to the DSL methods, and then deprecate these and remove them, but that's not going to happen over night, so we should figure out why you're having issues with these.

erikng commented 2 years ago

We agree on the confusion around DSL/Node. We are also seeing some weird behavior with shell_out on similar helpers but didn't want to discuss two issues at once.

Both are quite confusing though.

jaymzh commented 2 years ago

I use shell_out a lot in C17 and have had no issues.

If you wanna provide a working example of how debian? doesn't work as expected when fb_helpers is defined, but not otherwise, I can certainly look into it

erikng commented 2 years ago

I need to figure out how to distill it down as the code is deep in our internal libraries. And yeah what's weird is the shell out work just fine on macOS, but not Ubuntu 20. It's very puzzling.

For now we are instead calling debian_family? which doesn't seem to hit this issue.

jaymzh commented 2 years ago

That ... would be the expected behavior. debian? won't match Ubuntu 20 but debian_family? would.

erikng commented 2 years ago

sigh

I'll close this ticket and create another one once I figure out more info on the shell out.

erikng commented 2 years ago

We should have looked at this more closely before pointing fingers so apologies about that.

erikng commented 2 years ago

So actually I want to come back to this (feel free to close if you disagree)

chef (17.8.25)> node.debian?
 => false 
chef (17.8.25)> debian?
 => true 

so this is where the confusion happened internally

let's say you did something like this in a resource

cpe_test/recipes/test.rb
file '/tmp/blah' do
  only_if { debian? }
end

It would trigger. But what if someone did something like this

cpe_test/recipes/test.rb
file '/tmp/blah' do
  only_if { node.supported_os? }
end

cpe_test/libraries/node_utils.rb
class Chef
  class Node
    def supported_os?
      debian?
    end
  end
end

There is some level of mental gymnastics here and the solution is not apparent

cpe_test/recipes/test.rb
file '/tmp/blah' do
  only_if { node.supported_os? }
end

cpe_test/libraries/node_utils.rb
class Chef
  class Node
    def supported_os?
      ChefUtils::DSL::PlatformFamily.debian?
    end
  end
end

I'm sure I could distill the issue down further, but hopefully you see the point. With the built-ins in Chef 17, the behavior is different from chef's and facebooks.

:: Edit I also figured out the issue I was seeing with shell_out. Ignore that :)

erikng commented 2 years ago

This also impacts things like

chef (17.8.25)> node.serial
[2022-02-17T12:25:48-05:00] WARN: ubuntu is not yet supported by node.serial
 => nil 

and the logic actually works just fine except debian? is what's being rejected

erikg@erikg-VMwareVirtualPlatform:~$ sudo /usr/sbin/dmidecode -s system-serial-number
VMware-56 4d ca 0e 2b de 73 27-31 4b 67 81 d1 83 4f ec
jaymzh commented 2 years ago

I think you see this , but just to be clear, if you call debian? within the node, then the debian? within the node is the one in scope. If you call debian? not within the node, then it's not. That's just ruby scoping.

I would say something like "if you want the node ones, make your helper functions in the node otherwise make a helper class you pass the node into"

But, if you want to use the equivalent of debian? in DSL, ti's actually debian_platform?. I effin' hate that they made xyz? be "family" and made xyz_platform? be the more specific one. Seems so counter-intuitive, but that's what they did.

So what's really happening here is two things:

  1. the FB versions do X = x?, X_family = x_family? while the Chef versions do X_family = x?, X = x_platform? so picking the right one as you transition is hard
  2. If you're in the node object, which x? you get changes compared to everywhere else.

Getting out of this is a bit tricky because of the way libraries are loaded. One way I can see this working would be:

  1. Move all the platform helpers to a new cookbook, and in the short-term, make fb_helpers depend on it, so the behavior doesn't change
  2. Move all the stuff in fb_helpers that uses those platform helpers to the DSL ones, using FQ paths (ChefUtils::DSL::PlatformFamily)
  3. Have some announced time we'll move away from including the fb_platformhelpers from fb_helpers. No version bump, sorry. But we can give people some time. In the meanwhile, FB will code-mod away from the node helpers, and add a linter to warn against them
  4. Stop including fb_platformhelpers.

That would work, but it's a lot of effort and I'm not sure there's bandwidth for it right now on that team. It also has a HUGE risk that if some random cookbook depends on fb_platformhelpers it can break everyone who's transitioned. This scares me.

Another option I could see (but I don't like either) is to make a new set of helpers in the node that are dumb wrappers around the DSL ones:

def dsl_debian?
  ChefUtils::DSL::PlatformFamily.debian?
end

and so on. Then everyone can code-mod over to that at their leisure. Then at some point we can drop all the existing helpers. Once we do that everyone can do s/node\.dsl_//g on their code base and they'll be using the DSL helpers. That might be a bit easier, but it requires us to keep up with those helpers at least for a while. But it's also less dangerous, I think.

erikng commented 2 years ago

The latter option is something I was thinking about but yeah, none of this is ideal :/

dafyddcrosby commented 1 year ago

Meant to chime in on this one (I thought I already had, sorry!). The wrapper method approach is probably the one we'll end up taking, though not sure about the codemod aspects of removing node. yet, as the node object does make inferring what's going on easier when doing static analysis. We might make a Cookstyle lint to catch direct (ie non-wrapper method) use of ChefUtils::DSL, so that there's a consistent way of handling this problem.

jaymzh commented 1 year ago

Well, the non-node varieties are mixed into the Recipe DSL, so they're easy to reason about. No different than file, or directory or anything else in the DSL.