Comments from Jan Lindblad

JeanQuilbeufHuawei commented 1 year ago

Hi Benoît, draft authors, WG,

Thank you for the presentations (in several WGs) during IETF 118 and your great work around model driven telemetry. I have now read the latest version of draft-claise-opsawg-collected-data-manifest and would like to offer some comments.

This is really valuable work, and I will make sure to reference/use it in the next version of draft-lindblad-tlm-philatelist, which I generally feel fits quite nicely with this.

1) Controller-level modules

I read the abstract and intro section of some earlier version of the collected-data-manifest work already last year, but it wasn't until this week that I realized that this work is aimed at controllers. I think this fact is not mentioned in the abstract, nor anywhere in section 1 of the document. Many of the referenced modules (e.g. ietf-yang-library, ietf-subscribed-notifications) are device-level modules, which IMHO makes it easy to misunderstand the proposed architecture. I'd suggest you clearly position this work as a set of controller-level modules already in the abstract, even if this is mentioned elsewhere and can be found by reading the entire document carefully.

2) Copy pasting from device modules

I see there is quite a bit of copy+pasting from device modules in this work. I understand why, and I would have done the same thing just to get my points across, but we need to do something about this in upcoming versions.

3) YANG to Time Series Database (TSDB) mapping

Appendix A provides a sketch for a mapping from YANG to a tagged TSDB format. May I propose that we collaborate on the details of this mapping in draft-kll-yang-label-tsdb and that you refer to that document in lieu of Appendix A?

4) Configuring the collection process

A "principle" I have proposed on the IAB e-impact program mailing list is that the (sustainability) telemetry collection should be entirely controlled by configuration. It should be possible for the operators/consumers of the collected data to control (and transparently inspect) the collection process, rather than having the choices of what is and is not included hard-coded. Do you agree with this principle, and if so, would you have some thoughts about how the configuration framework I have proposed in draft-lindblad-tlm-philatelist could be merged with the platform and collection manifest?

5) Collection of the metadata

In section 5.2, it is mentioned that "We don’t focus on the timing aspect as storing both the data and their manifest in a time series database will allow the data scientists to look for the Data Manifest corresponding to the timestamp of the datapoint. In that scenario, the reliability of the collection of the Data Manifest is the same as the reliability of the data collection itself, since the Data Manifest is like any other data."

Could you elaborate a little on your exact ideas here? As I understand it, the main bulk of the data collection would be from a device to the TSDB. But the data manifest model would sit on a controller/collector, and not a device? So would the collector have a subscription on itself, or what exactly do you have in mind? Also, would this metadata collection process be granular, so that only the leafs that actually changed (e.g. period) are recorded, or would it record all data manifest values when any of them (e.g. period) changes? The example in figure 5 makes me think you might mean recording the entire manifest each time anything changes. It seems to me the data manifest is potentially rather large, and if the period changes frequently, this could amount to a lot of data in the TSDB.
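
To make the granularity question concrete, here is a minimal sketch of the "record only changed leafs" option. It assumes the manifest can be treated as a nested dictionary; the node names (`subscription`, `xpath`, `period`, `on-change`) and the helper functions are hypothetical illustrations, not taken from the draft's YANG modules.

```python
_MISSING = object()  # sentinel: leaf absent from the old manifest


def flatten(node, prefix=""):
    """Flatten a nested dict into {path: leaf value}."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def changed_leafs(old, new):
    """Return only the leafs that differ between two manifest versions.

    Added or modified leafs map to their new value; removed leafs map to None.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    changes = {}
    for path, value in new_flat.items():
        if old_flat.get(path, _MISSING) != value:
            changes[path] = value
    for path in old_flat.keys() - new_flat.keys():
        changes[path] = None
    return changes


# Hypothetical manifest content, for illustration only.
before = {"subscription": {"xpath": "/interfaces/interface/statistics",
                           "period": 1000, "on-change": False}}
after = {"subscription": {"xpath": "/interfaces/interface/statistics",
                          "period": 500, "on-change": False}}

delta = changed_leafs(before, after)
print(delta)  # {'/subscription/period': 500}
```

With this approach, a period change writes one datapoint to the TSDB instead of the whole manifest. The cost is that a reader has to replay the deltas to reconstruct the manifest that was valid at a given timestamp.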

6) Size of the platform manifest

The platform manifest includes pretty much the entire yang-library. For certain devices, this could be a large amount of data. More than 1 MB, I would guess. This data is sent to the TSDB once per system the collector is fetching data from. If that is from a few hundred devices (or many more), this metadata alone may land in the GB zone (or much more). Is there something we could do to make this scale a bit better? Maybe structuring the metadata differently could make it easier to reduce the repetition across devices that lands in the TSDB?
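
As a rough illustration of the scale, assuming about 1 MB per device, 1,000 devices would mean about 1 GB for every full snapshot. One possible way to reduce the repetition, sketched below, is to store each distinct platform manifest once under a content digest and keep only a small per-device reference. The function names and the sample fleet are hypothetical, and this is not something the draft specifies.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """Digest of a canonical JSON encoding: equal content gives an equal digest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store_manifests(per_device: dict):
    """Return (blobs, index).

    blobs: digest -> manifest, one entry per distinct manifest.
    index: device name -> digest of that device's manifest.
    """
    blobs = {}
    index = {}
    for device, manifest in per_device.items():
        digest = manifest_digest(manifest)
        blobs.setdefault(digest, manifest)
        index[device] = digest
    return blobs, index


# Hypothetical fleet: 300 devices that share only two distinct YANG libraries.
lib_a = {"module": [{"name": "ietf-interfaces", "revision": "2018-02-20"}]}
lib_b = {"module": [{"name": "ietf-interfaces", "revision": "2018-02-20"},
                    {"name": "ietf-ip", "revision": "2018-02-22"}]}
fleet = {f"device-{i}": (lib_a if i % 2 == 0 else lib_b) for i in range(300)}

blobs, index = store_manifests(fleet)
print(len(blobs), len(index))  # 2 300
```

In this sketch the large payload is stored twice rather than 300 times. How much this helps in practice depends on how many devices actually run identical software and feature sets.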

7) Wider applicability

Another of the "principles" I argued for on the e-impact mailing list was that we should collect telemetry data from existing device interfaces (available now), rather than require and wait for new ones to be implemented in real-world networks. In practice, this implies also collecting data by means other than YANG-Push. I proposed some mechanisms for dealing with the collection of both data and metadata from such non-YANG sources in draft-lindblad-tlm-philatelist. Do you think we could incorporate some of those thoughts in the work here?

Thank you again for doing all this work and for sharing with the WG.

Best Regards, /jan

JeanQuilbeufHuawei commented 9 months ago

Point 1 addressed in e3567189dcab079630a240def31c810671c7ca72

JeanQuilbeufHuawei commented 9 months ago

Point 3 and possibly 5 and 6 addressed in 4fbe280

JeanQuilbeufHuawei commented 9 months ago

Still point 7 to address (at least)

JeanQuilbeufHuawei / draft-collected-data-manifest

Comments from Jan Lindblad #46