bitcoin / bitcoin

Bitcoin Core integration/staging tree
https://bitcoincore.org/en/download
MIT License
77.01k stars 35.58k forks source link

RFC: Instanced logs #30338

Open theuni opened 3 days ago

theuni commented 3 days ago

PR'ing this early as an RFC because it's a lot of work that will require constant rebasing. This is the first logical step and is mostly mechanical.

This PR adds an "instance" param to each log function and callsite. As is, there should be no functional change here.

The intention is to allow for a specific instance to be passed in each time we use the logger, rather than the implicit global as we have it now. Eventually (one subsystem/chunk at a time), this will allow us to instantiate a per-context logger, which is necessary for the kernel.

This is my second iteration. The first approach can be seen here: https://github.com/theuni/bitcoin/commit/533e75e81a18a06708bc3dfcee554d344cf1c81b. It was much cleaner and used c++20's std::source_location, however it proved to be impossible to do lazy variadic parameter evaluation that way. The approach here is rather dumb, but it should incur no performance penalty.

If there's a rough consensus on this approach, I'd like to go ahead and get this PR cleaned up and merged so that I can begin the work of converting global LogInstance()s to real instances. The first major goal would be to allow the kernel/bitcoin-chainstate to be used without the static global instance.

Note that the first commit will not compile. The second is a massive scripted-diff. If that's a problem, I can add a commit which introduces dummy macros, then switch them back after the scripted-diff.

DrahtBot commented 3 days ago

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

theuni commented 3 days ago

Ping @TheCharlatan @ajtowns @maflcko for Concept (N)ACKs

ryanofsky commented 3 days ago

@theuni I made a similar change in #29256. That approach might be less invasive because it makes the instance argument optional. So kernel code can be changed to log to a configurable instance while other code uses a global instance.

DrahtBot commented 3 days ago

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). If so, make sure to rebase on the latest commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/26668500153

DrahtBot commented 2 days ago

🐙 This pull request conflicts with the target branch and needs rebase.

ajtowns commented 1 day ago

this will allow us to instantiate a per-context logger, which is necessary for the kernel.

I don't understand where this is coming from. Libraries can export globals, eg stdin, so I don't see why you're claiming it's necessary for the kernel when it's not even necessary for libc.

TheCharlatan commented 20 hours ago

Re https://github.com/bitcoin/bitcoin/pull/30338#issuecomment-2194940918

I don't understand where this is coming from. Libraries can export globals, eg stdin, so I don't see why you're claiming it's necessary for the kernel when it's not even necessary for libc.

Being forced to eliminate all globals doesn't seem to be the claim here? As long as the globals don't carry over annoying state from previous usage, require annoying initialization, or aren't mostly safe from user misuse, their usage is fine. Next to what was brought up before about having the ability to distinguish between between the logged lines from multiple instances, there are a few other problems with the existing implementation that make its usage as a global annoying at the moment. If an instantiated logger is a red line here, we might be able to find other solutions though.

The current implementation buffers without a limit until StartLogging() is called. This forces the user to at least eventually call StartLogging(), even if they don't wish to consume logs. We could limit the buffer to some acceptably low amount, or ensure that StartLogging() is statically called when the library is loaded (similar to how the static secp context is created when bitcoin is started). This would discard previous logging messages though.

I also noticed that any logs buffered before the user applies their own user-defined options will be printed in the default format and with default categories. Re-starting logging is also awkward to do at the moment.

Logging is workable for the kernel library as is, but it would be nice to address the remaining pain points. I think having instantiated logging would be the best way to solve these issues, making it clear to the user when something will be logged, as well as giving guarantees on its format.

ajtowns commented 18 hours ago

Re #30338 (comment)

I don't understand where this is coming from. Libraries can export globals, eg stdin, so I don't see why you're claiming it's necessary for the kernel when it's not even necessary for libc.

Being forced to eliminate all globals doesn't seem to be the claim here?

The opposite of instanced parameters is globals; if instanced parameters are necessary, then globals aren't possible. The claim in the OP is that the "instanced logs" proposed here are "necessary for the kernel".

Next to what was brought up before about having the ability to distinguish between between the logged lines from multiple instances, there are a few other problems with the existing implementation that make its usage as a global annoying at the moment.

I assume that's referring to the concept of rewriting contrib/linearize from python to C++. That doesn't make sense to me as any kind of priority (and seems to conflict with other proposed benefits of instanced logging), so I'm going to assume that the "few other problems" are the real issue.

The current implementation buffers without a limit until StartLogging() is called. This forces the user to at least eventually call StartLogging(), even if they don't wish to consume logs. We could limit the buffer to some acceptably low amount, or ensure that StartLogging() is statically called when the library is loaded (similar to how the static secp context is created when bitcoin is started). This would discard previous logging messages though.

StartLogging is meant to indicate the appropriate configuration values have been applied to the logger; so calling it on library load wouldn't make sense, as far as I can see.

Limiting the buffer could make sense. We could also tweak the defaults to be more suitable for the kernel (eg, no buffering of logs prior to StartLogging() at all), and override the settings for bitcoind/etc with something like:

// logging.h
class LoggingLaunchSettings
{
public:
    template<typename Fn>
    LoggingLaunchSettings(Fn&& fn) { Logger logger = LogInstance(); fn(logger); }
};

// init.cpp
static BCLog::LoggingLaunchSettings bitcoind_logging_control{
  [](auto& logger) { logger.m_early_buffer = true; }
};

Other users of libkernel who wanted buffering could then do the same thing.

I also noticed that any logs buffered before the user applies their own user-defined options will be printed in the default format and with default categories.

Buffering debug/trace logging prior to StartLogging doesn't really make sense as far as I can see.

Fixing the formatting of buffered logs is a bit of work, but not very complicated. It doesn't seem particularly related to the kernel, though. Would you review a PR to do that?

Re-starting logging is also awkward to do at the moment.

I'm not sure what you mean? There's DisconnectTestLogger(); StartLogging(); ?

Logging is workable for the kernel library as is, but it would be nice to address the remaining pain points. I think having instantiated logging would be the best way to solve these issues, making it clear to the user when something will be logged, as well as giving guarantees on its format.

Touching every single line of logging code for a "nice to have" doesn't make much sense to me. It's nice to be able to just shove a LogInfo("XXX %s happened", thing); anywhere in the code and have it work, without having to find/add an enclosing manager object that has had a logging context passed in from init.

ryanofsky commented 17 hours ago

Touching every single line of logging code for a "nice to have" doesn't make much sense to me.

AJ I know we are disagreeing about a lot of things, and I do not fault you in any way for this, but your reactions seem very strong relative the actual scope of changes we would like to make.

We should set aside Cory's big scripted diff commit here, which I think is a proof of concept and an illustrative change, not an actual change being proposed to merge. I think the actual changes being considered would look a lot more like:

Which come from #30342 and yes, I can understand, probably you do not love. But I don't understand how they would have a discernible negative impact or why you would want to spend time arguing against them? These seem like basically trivial changes.

I am sincerely asking about this, because I do not understand what you see as being at stake here.

theuni commented 17 hours ago

Yes, @ryanofsky. This was a POC (FWIW I am actually very ok with the scripted-diff approach which touches every line, but I didn't expect it to be popular). At this point I'd prefer to throw my weight behind #30342 as it accomplishes what I want without being so invasive.

I won't add to the arguing here as I think you and @TheCharlatan have said what I would've anyway.

TheCharlatan commented 16 hours ago

Re https://github.com/bitcoin/bitcoin/pull/30338#issuecomment-2197436060

Thanks for going through this. Having something like LaunchSettings seems like a good thing.

Fixing the formatting of buffered logs is a bit of work, but not very complicated. It doesn't seem particularly related to the kernel, though. Would you review a PR to do that?

yes :)

I'm not sure what you mean? There's DisconnectTestLogger(); StartLogging(); ?

Besides the incoherent naming between the two, it also starts buffering again when calling DisconnectTestLogger(), which does not seem immediately intuitive to me. It seems like what these methods are actually doing is disconnect all callbacks and start buffering, and stop buffering and flushing the buffered messages respectively. I'm not saying this is bad, just that they should have better ergonomics that clearly convey what they do. The current state is workable as is, albeit a bit rough.

ajtowns commented 15 hours ago

Re #30338 (comment)

Thanks for going through this. Having something like LaunchSettings seems like a good thing.

Fixing the formatting of buffered logs is a bit of work, but not very complicated. It doesn't seem particularly related to the kernel, though. Would you review a PR to do that?

yes :)

Have a look at https://github.com/ajtowns/bitcoin/pull/9 and see what you think; obviously needs a little cleaning up before PRing properly. It also has a commit to add m_max_buffer_memusage so that log buffering doesn't blow up if you never call StartLogging() if you think that makes sense.

I'm not sure what you mean? There's DisconnectTestLogger(); StartLogging(); ?

Besides the incoherent naming between the two, it also starts buffering again when calling DisconnectTestLogger(), which does not seem immediately intuitive to me.

Not really sure what else you'd expect it to do? I don't really see why you'd ever "restart" logging rather than "reconfiguring" it while it's still running though, at least outside of tests.

ajtowns commented 12 hours ago

Touching every single line of logging code for a "nice to have" doesn't make much sense to me.

AJ I know we are disagreeing about a lot of things, and I do not fault you in any way for this, but your reactions seem very strong relative the actual scope of changes we would like to make.

I don't know what you want me to say here? I didn't say "THAT'S CRAZY YOU SHOULD ALL BE PUT IN AN ASYLUM"; I said "it doesn't make much sense to me". The only way to be less strong about my opinion is to be entirely silent...

Your patches are roughly how I imagine Cory's would eventually look, and they still touch every single line of logging code (in the kernel, anyway). That just seems completely unnecessary to me. The kernel library already works; I don't think logging needs major changes in order for it to not only work but work well* with the kernel.

But I don't understand how they would have a discernible negative impact

For instance, once we have LogInstance() removed from the kernel, so that you have to pass a logger down to any function that wants to log, how do we add temporary logging for development in kernel functions that don't have a manager object handy? Alternatively, I think your patches didn't touch sync.cpp despite that being obviously essential to the kernel and which does some logging in debug modes. I think passing a logger to it without just recreating the LogInstance() global would already be awkward.