Design doc for metrics collection

romanmarkunas-r3 commented 6 years ago

@gendal thanks for your comments!

1) I'd say that distinguishing between node/application level metrics is the contents of metrics and how much sense those metrics make for named level of abstraction. All application level metrics still have to be collected inside node. As the communication is peer-to-peer we cannot plug sniffers-meters in the middle (or we can, but it will still provide limited information and are huge operational pain).

2) Actually the whole document focuses on correlation of metrics collected inside node to a whole flow performance. And correlation ID between nodes is effectively session ID.

tlil commented 6 years ago

Any reason why this design doc can't be raised against the corda/corda repo?

ischasny commented 6 years ago

@tlil corda-solutions repo is for top-of-the-stack applications. I.e. the ones which might need a customisation per business network. However, I don't see any particular reason why this design can't become a part of the platform if it proves to be successful. @romanmarkunas-r3

tlil commented 6 years ago

Yep, this is something that we've been thinking about from an Experience and Production Readiness perspective as well you see, with deeper platform integration in mind. I've talked to @romanmarkunas-r3 and Dave about it. It's not for the short-term, but was thinking that putting the design in open source would be good to get more eyes on it and involve the community as well?

ischasny commented 6 years ago

@tlil corda-solutions will be open sourced (hopefully by the end of this month). solutions.corda.net and docs.corda.net will be cross-referencing each other.

tlil commented 6 years ago

@ivanr3 Okay, but we're trying to consolidate as much as possible in the open source repo, so if this is related to the platform as well, I don't see why we wouldn't just put it there to begin with. But if you think there isn't enough overlap at this stage, then that's fine. :) Just started thinking since we're planning similar/related work after GA.

@MarkOldfield What do you think?

ischasny commented 6 years ago

@tlil yeah, absolutely agree. We are purposely using the same tools as the platform team for the designs, docsite, build pipeline and etc. to be able to easily merge contents if required. I personally think this design might belong to the solutions site as the metrics collection might require some customisation from a BN to BN, so it might make sense to give network operators to give some flexibility to tweak the implementation without having to fork the platform etc. etc. The split is a bit vague at the moment. So we might end up with all the designs in a single repo in the future :)

ischasny commented 6 years ago

@romanmarkunas-r3 ^^

tlil commented 6 years ago

I see - okay. We can always move it later if needed. Thanks for the clarification, @ivanr3. I will take a closer look at this a bit later. Thanks :+1:

romanmarkunas-r3 commented 6 years ago

Here is a fork of Corda V3, that collects checkpointing duration: https://github.com/corda/corda/compare/release-V3...romanmarkunas-r3:MetricsIntegration

Please don't bother about code quality, it's a POC ;)

When using with our IOU example I added @Metered on flows and logging statements to display checkpoint length. Sample output:

[INFO ] 2018-06-21T13:51:04,319Z [Node thread-1] flow.[aadd33c2-4b5b-4782-9e74-54e01b13ddff].call - OTHER PARTY CHECKPOINTING TIME: 216 {}
[INFO ] 2018-06-21T13:51:04,319Z [Node thread-1] flow.[aadd33c2-4b5b-4782-9e74-54e01b13ddff].call - THIS NODE CHECKPOINTING TIME: 220 {}
[INFO ] 2018-06-21T13:51:04,319Z [Node thread-1] flow.[aadd33c2-4b5b-4782-9e74-54e01b13ddff].call - TOTAL FLOW TIME: 2379 {}

Modifications in CorDapp: https://github.com/corda/cordapp-example/compare/release-V3-contract-separately...romanmarkunas-r3:MetricsIntegration

dave-hudson commented 6 years ago

I couldn’t see the sample output?

From: romanmarkunas-r3 notifications@github.com Reply-To: corda/corda-solutions reply@reply.github.com Date: Thursday, 21 June 2018 at 15:09 To: corda/corda-solutions corda-solutions@noreply.github.com Cc: Dave Hudson dave.hudson@r3.com, Review requested review_requested@noreply.github.com Subject: Re: [corda/corda-solutions] Design doc for metrics collection (#4)

Here is a fork of Corda V3, that collects checkpointing duration: corda/corda@release-V3...romanmarkunas-r3:MetricsIntegrationhttps://github.com/corda/corda/compare/release-V3...romanmarkunas-r3:MetricsIntegration

Please don't bother about code quality, it's a POC ;)

When using with our IOU example I added @metered on flows and logging statements to display checkpoint length. Sample output:

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHubhttps://github.com/corda/corda-solutions/pull/4#issuecomment-399116608, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHrREWtvdTH4JkUV2fiClqyE6wdy_4ISks5t-6kJgaJpZM4UmDKP.

www.r3.com/email-disclaimer

corda / corda-solutions

Design doc for metrics collection #4