RexOps / Rex

Rex, the friendly automation framework
https://www.rexify.org
716 stars 223 forks source link

file() function taking over 1.5 minutes to execute #1238

Closed sdondley closed 4 years ago

sdondley commented 4 years ago

I'm attempting to use the file() function in a module like so:

package My::RexTasks ;

use Rex -feature => ['1.3'];

sub server_connect {
  my $server = shift;
  Rex::connect(
    server => $server,
    user => 'me',
    public_key => '/Users/me/.ssh/id_rsa.pub',
    private_key => '/Users/me/.ssh/id_rsa',
    sudo => TRUE,
  );
}

sub upload_file {
  my %parameters = @_;
  my $subdomain = $parameters{subdomain} || '' ;
  my $domain = $parameters{domain};
  my $server = $parameters{server};

  my $config_file = ($subdomain ? $subdomain . '.' : '') . $domain . '.conf';
  server_connect($server);

  die 'Must pass --domain value. Aborting' if !$domain;
  file "/etc/apache2/sites-available/$config_file",
    owner => 'root',
    group => 'root',
    mode => 644,
    content => template('templates/d8_vhost.tpl',
      conf => {
          subdomain => $subdomain || '',
          domain => $domain,
      },
    );
}

It works, but it's very, very slow. It can take about 1.5 minutes before it executes. I don't have this problem when running file from within a task function in a Rexfile. Perhaps I am not using the Rex::connect function as intended? Can someone point out what I might be doing wrong?

sdondley commented 4 years ago

I've tracked the problem into the bowels of the Rex::Helper::System::info method and Rex::Hardware::get method. The modules keep looping over getting the Network, Host, Kernel, Memory, Swap properties about 4 or 5 times (apparently over a network connection) while trying to build the template that is passed to the file method.

I'm not sure if this is a bug or I'm just misusing Rex.

sdondley commented 4 years ago

OK, so 90% of the looping is generated by this block of code in Rex::Commands::File::template method.

  if ( Rex::CMDB::cmdb_active() && Rex::Config->get_register_cmdb_template ) {
    my $data = Rex::CMDB::cmdb();
    for my $key ( keys %{ $data->{value} } ) {
      if ( !exists $template_vars{$key} ) {
        $template_vars{$key} = $data->{value}->{$key};
      }
    }
  }

This block appears around line 229 in the file. I'm not sure what this code does yet, but if I comment it out, the code in OP completes in a much more reasonable amount of time.

sdondley commented 4 years ago

Digging a little deeper, the slowdown occurs in the call to Rex::CMDB::cmdb which calls Rex::CMDB::YAML::get which calls Rex::CMDB::_parse_path once for each file in @files which contains the following list of files:

[
  'cmdb/{operatingsystem}/{hostname}.yml',
  'cmdb/{operatingsystem}/default.yml',
  'cmdb/{environment}/{hostname}.yml',
  'cmdb/{environment}/default.yml',
  'cmdb/{hostname}.yml',
  'cmdb/default.yml'
]

_parse_path calls Rex::Helper::Path::parse_path which calls Rex::Commands::Gather::get_system_information which accounts for why the server is repeatedly queried for its properties.

It smells like a bug but I don't know enough about Rex to say for sure. It could still be I'm just not using it properly. But this is as far as I can go. Hopefully this information helps someone.

ferki commented 4 years ago

I'm not sure if a GitHub issue is the best place for more generic support threads, but I'd like to thank you for the detailed information, @sdondley!

While it's possible, using Rex as a library is not the intended primary usage so I'm afraid not much things are tuned or even tested for this kind of usage at this moment. My hunch would be lack of caching for those exact things you mentioned. Caching was enabled by default in v0.46.0, but I guess some parts of the whole story is not being loaded if Rex is used as a library, so the caching feature flag is not getting triggered.

I can't fully reproduce your use case locally yet, bu I think the performance parts can be helped as easy as adding Rex::Config->set_use_cache(1) to your code. It should enable caching internally all the remote system discovery and templating parts among others and that should speed up things.

However CMDB caching is not yet implemented, but that would be a good idea to add, I'll open an issue about it. It probably won't solve the performance problems in your use case anyway as the time is spent in discovering the remote system information over and over again.

If you can, please join us on IRC to further discuss these topics, but here's some more random tips about your use case:

You might also be interested to take a look at Rex::Inline for similar usage patterns. Please note that is a third-party module on CPAN (i.e. not maintained by or affiliated with RexOps), and I do not have experience with it, but it claims to be doing exactly what you are trying to do.

And you might want to add $Rex::Logger::debug = 1; for debug output, and/or obtain some profiling info to check what's taking so long (Devel::NYTProf comes to mind).

Let us know if any of those helps!

sdondley commented 4 years ago

Perfect. Rex::Config->set_use_cache(1) did the trick. Your advice is very good timing. I thought it had something to do with caching and I was trying to figure out how to enable caching but without success. Thanks!

And I had just stumbled Rex::Inline about 20 minutes ago. Funny! I will definitely check that out. I'll see if it still works with more recent versions of Rex and report back.

Perhaps a cautionary note that using Rex as part of a library is not fully supported should be added to the documentation at https://metacpan.org/pod/release/JFRIED/Rex-1.6.0/lib/Rex.pm? The POD talks about using Rex as part of a library.

ferki commented 4 years ago

I'm glad it helped! \o/

I wouldn't say this not supported categorically, but I have to admit this is the first direct "as a library" use case I have seen in the wild in the past ~7 years with Rex. Also great timing on your part, as I'm currently working on a talk about "unexpected use cases" ;)

Thanks for the additional pointers too. I agree it's a good idea to add some extra note to the POD of Rex.pm (mention "YMMV", hint about caching in the code example), and perhaps even write docs and/or an article about using "Rex as a library" (so others can start from a better point).

Those are probably best to be handled on their own issues in this or in the RexOps/rexify-website repos, but I wanted to note them down as reminders now.

sdondley commented 4 years ago

What is your recommendation regarding using Rex inline vs. running shell commands from within a perl script (e.g. `rex 'upload_file'`)? I was experimenting with doing things inline just to see if could be done. But maybe it's less headache to just throw the rex commands into a script and simply pass arguments to the tasks using the parameters feature.

My use case is wanting to set up and configure websites/apache/bind/mysql automatically.

ferki commented 4 years ago

Well, there's no silver bullet, but Rex is a framework which allows you to build whatever you think is fitting your use case the best. It's best to start simple and gradually add more and more automation and structure to a project.

In general I like a data-driven, push-style, declarative approach, so I generally end up with the following bits for mid-sized projects:

To separate logic/code and configuration/data, I use a mix of modules and CMDB. Tasks are written in an idempotent way so they can be run anytime, and second run is usually just a no-op if there were no state changes since the last run.

I'm writing modules under lib/Rex/<project>/<component>, like Rex::Company::Apache. These modules usually have a setup task for installation and a configure task for configuration (and running), and sometimes a destroy task to uninstall, and/or any component-specific helper tasks if needed.

In the Rexfile, I mostly deal with project-wide specifics, like groups, hosts, authentication info, CMDB setup, and requiring all the modules from above. Some top-level project-wide tasks may go directly here too, but sooner or later I tend add a converge task which just calls all configured bits on all hosts infrastructure-wide.

Of course smaller projects can be simpler than that, maybe just a bigger Rexfile. Bigger projects tend to be split up across multiple repos, depend on external data sources for inventory/CMDB, use pull- or hybrid-style management, and/or introduce job distributors/queues to deal with retries (in case of errors or maintenance).

mestia commented 4 years ago

That's an interesting approach. Is it mentioned anywhere in Rex docs? I have feeling that the missing/incomplete docs is the weak point of Rex.

ferki commented 4 years ago

That's an interesting approach.

I'm glad you find it interesting.

Is it mentioned anywhere in Rex docs?

No, instead we do actively "brand" Rex along those lines (framework aspect, it's just Perl, etc). A somewhat related idea is to have more like a cookbook of practical examples to show various real-life use-cases and alternatives, instead of "this is how you should do it with Rex". But if I'm the only one writing about these, then it will only reflect my opinions :upside_down_face:

I have feeling that the missing/incomplete docs is the weak point of Rex.

Well, documentation can always be better, and we're really hijacking the original topic here. I'd be happy to continue discussion on better forums (IRC?).

I'm closing this issue now, as it was more like a support request than an issue to fix anyway.