darkskyapp / node-sarra

consume data from Environment and Climate Change Canada

v03 heads up. #9

Open petersilva opened 4 years ago

petersilva commented 4 years ago

just to let you know, for the past year or so, we have been working on a new message payload format, as a result of limitations in the current one and feedback from some international consultations. The current protocol version is identified by the topic tree that starts with v02.post. Over the next year or two, we may migrate to v03.post. Differences in messages:

https://github.com/MetPX/sarracenia/blob/master/doc/sr_postv3.7.rst

might still evolve slightly (new fields?) but we have done some important deployments of v03, and it is looking solid. no fire... nothing will be sprung on consumers suddenly, we haven't looked at any migration strategy yet, but would not want to spring it on clients all of a sudden. Figured you would want to know far ahead of time.

I can supply some alternate data streams if you want a sample.

ghost commented 4 years ago

Thank you! Is v03.post available for use right away, or should we simply get a branch ready for testing in the future?

ghost commented 4 years ago

In theory we can release this such that v03 is supported behind a flag to start, and eventually that flag becomes the default and finally that v02 is dropped.
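A minimal sketch of that staged rollout, assuming three release stages and a per-client flag. The names `Stage` and `topic_prefix` are hypothetical and not part of node-sarra; only the `v02.post` and `v03.post` prefixes come from this thread.

```python
from enum import Enum


class Stage(Enum):
    OPT_IN = 1       # v03 is used only when the flag is set
    DEFAULT_V03 = 2  # v03 is used unless the flag turns it off
    V03_ONLY = 3     # v02 support has been dropped


def topic_prefix(stage, use_v03=None):
    """Return the topic prefix a client should subscribe to."""
    if stage is Stage.V03_ONLY:
        # The flag no longer has any effect once v02 is gone.
        return "v03.post"
    if use_v03 is None:
        # No explicit flag: fall back to the default for this stage.
        use_v03 = stage is Stage.DEFAULT_V03
    return "v03.post" if use_v03 else "v02.post"
```

Moving from one stage to the next then only changes the default, so clients that set the flag explicitly keep their behaviour until v02 is removed.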

ghost commented 4 years ago

I actually think I have most of this put together already. One question: what is the contentType of the new message format? (I assume it is application/json or application/json; charset=utf-8 or the like, but would like to confirm.)

I have a listener running on v03.post.# right now and I think all of my questions will be answered once I see a message come through :)
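For reference, a rough sketch of how an AMQP topic binding such as v03.post.# selects messages: `*` stands for exactly one dot-separated word and `#` for zero or more. In practice the broker does this matching; the helper `topic_matches` and the sample routing keys are hypothetical and only illustrate the rule.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(p):
            # Pattern exhausted: the key must be exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            # '*' matches any single word; otherwise words must be equal.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

With this rule, a binding on v03.post.# would not pick up anything published under v02.post, so the two versions can be consumed side by side with separate bindings.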

petersilva commented 4 years ago

if you connect to:

broker amqps://anonymous:anonymous@hpfx.collab.science.gc.ca exchange xs_pas037_wmosketch_public topic v03.post.#

You can get a sample v03.post feed. You can use it to confirm that v03 works, but we won't be posting anywhere else for a while (until v03 is fully gelled.) note that that feed includes embedding, which is a significant change from before.

This feed is extremely experimental and may change at any time. it is being used to work with colleagues in the WMO to develop next gen WMO data exchange protocols.

( https://github.com/MetPX/wmo_mesh )

petersilva commented 4 years ago

oh... the content type? In the C version we explicitly set text/plain, in the python one there is no explicit setting, I'm guessing text/plain is the default, so basically we aren't using it. Do you think we should use a more specific content-type?

petersilva commented 4 years ago

note:

https://stackoverflow.com/questions/477816/what-is-the-correct-json-content-type

so if we went that way, it would be application/json ...

petersilva commented 4 years ago

another question... this protocol is fairly modern, so it is assumed as utf-8 in the spec. JSON is often utf-8. UTF-8 is kind of natural these days as a default charset (already the default on HTML5) so I don't think it is necessary to specify, it should be the default, and if someone wants to use something else, they should be the ones to use charset. I'm thinking about this in the context that I send millions of messages per day, so adding charset adds megabytes (12 bytes per message) of traffic per day, for no real benefit. but the question of content-type... what would be the benefit of application/json ?

ghost commented 4 years ago

Thanks for the test feed—I'll point to it and try to capture a message!

Regarding content type: if the message body is JSON, I'd say it's a best practice to set application/json. (This is mostly because it keeps the protocol as intuitive as possible. In our case, it's also handy, since it lets us use the same code to handle v02 and v03, since they return different message formats :) )

That said, if you're not using the field, I'll update our code to ignore it.

Noting the charset in the content type is by no means required and if you're concerned about bandwidth per message, omitting it is reasonable (especially for us-ascii, ISO-8859-1/windows-1252, or utf-8). (In fact, if the number of "why am I getting JSON parse errors" emails to our developer support mailbox is any indication, a lot of people don't pay attention to the content type even when told to explicitly!)
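To illustrate what I mean by using the same code for both versions, here is a rough sketch that picks a parser from the content type. It assumes a v02 body is a space-separated line of timestamp, base URL and relative path, and that a v03 body is a JSON object with `pubTime`, `baseUrl` and `relPath` fields; both layouts are assumptions for illustration, and `parse_post` is a hypothetical helper, not our actual code.

```python
import json


def parse_post(content_type, body):
    """Parse a post message body (bytes) into a small common dict."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    text = body.decode("utf-8")

    if media_type == "application/json":
        # Assumed v03 layout: a JSON object carrying these three fields.
        msg = json.loads(text)
        return {
            "version": "v03",
            "pubTime": msg["pubTime"],
            "baseUrl": msg["baseUrl"],
            "relPath": msg["relPath"],
        }

    # Assumed v02 layout: "timestamp base_url relative_path".
    pub_time, base_url, rel_path = text.split(" ", 2)
    return {
        "version": "v02",
        "pubTime": pub_time,
        "baseUrl": base_url,
        "relPath": rel_path,
    }
```

This only works if the broker actually sets the content type; as long as v03 messages arrive as text/plain, the version has to come from somewhere else, for example the topic prefix.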

ghost commented 4 years ago

I haven't seen any messages pushed yet, so I can't verify, but in any case we have an experimental ironwallaby/v03 branch which should work.

petersilva commented 4 years ago

oops... the feed was down. it is back up now.

petersilva commented 4 years ago

OK, changed the content_type to application/json in master, will take a few weeks to get into a release, and perhaps a few months to get to production. At some point the messages will just start showing up with the right content_type.

ghost commented 4 years ago

Thank you! My branch seems to work now. The only issue I ran into was that the time format changed subtly (the addition of a T character to delimit dates and times). This was easy to fix, of course, but the change was so subtle that I didn't notice it in my scan of the documentation.
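In case it helps anyone else, a small sketch of a parser that tolerates both forms. It assumes the timestamps look like YYYYMMDDHHMMSS.fff in v02 and YYYYMMDDTHHMMSS.fff in v03 and that they are in UTC; the exact layout and the helper name `parse_pub_time` are assumptions, not something taken from the spec.

```python
from datetime import datetime, timezone


def parse_pub_time(stamp):
    """Parse a publication timestamp with or without the 'T' delimiter."""
    # Remove the single 'T' between date and time, if present.
    compact = stamp.replace("T", "", 1)
    whole, _, frac = compact.partition(".")
    parsed = datetime.strptime(whole, "%Y%m%d%H%M%S")
    # Pad or truncate the fractional part to microseconds.
    micros = int((frac + "000000")[:6]) if frac else 0
    # UTC is an assumption here.
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)
```

Both spellings of the same instant then compare equal, so the rest of the code does not need to know which protocol version a message came from.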

petersilva commented 4 years ago

great!

petersilva commented 4 years ago

what would be the protocol at this point, should I close the issue?

ghost commented 4 years ago

I still need to finish up and merge in my prototype, but I'll go ahead and close the issue once I've done so.

Would you mind opening a new issue once v03 is live (if ever)?

petersilva commented 4 years ago

There will certainly be an announcement on the datamart mailing list, and a period of parallel access (both versions available for a month or two) so you will certainly hear about it. The idea of the heads up is to minimize the length of the parallel period.

petersilva commented 1 year ago

update... why the heck didn't we release this three years ago? I spent a few years working with colleagues at the World Meteorological Organization, hoping to be able to merge v03 format with what they hoped to produce for pub/sub. It kept sounding like "yes, but..." and tweaks being needed here and there, but in the end, last spring they rejected it wholesale, preferring something more web service oriented, which has some conflicts with file transfer that is the focus of sarracenia.

After the split, the format has continued to evolve over the past nine months in the following way: for the high performance mirroring use case, we need to transport things other than files: file removal events, directory creation, symbolic links, renames... those were formerly encoded in a conceptual overload of the checksum field, but in versions of v03 since fall 2022, are now represented using a "fileOp" field. the format is shown here:

https://metpx.github.io/sarracenia/Reference/sr_post.7.html
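A rough sketch of how a consumer might tell ordinary file posts from the other events, assuming the message has already been decoded into a dict and that "fileOp", when present, is an object keyed by operation name. The key names in the usage note below ("remove", "directory") are assumptions for illustration; the page linked above is the authority on the actual names and values. `file_ops` is a hypothetical helper.

```python
def file_ops(msg):
    """Return the operation names carried in a message's fileOp field.

    An empty tuple means the message has no fileOp field and is
    treated as an ordinary file post.
    """
    file_op = msg.get("fileOp")
    if not file_op:
        return ()
    # Sorted so callers get a stable order when several keys are present.
    return tuple(sorted(file_op))
```

For example, `file_ops({"relPath": "a/b.txt"})` gives an empty tuple, while a message whose fileOp carried both a "remove" and a "directory" key would give `("directory", "remove")`.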

There is also a likely removal of optional fields coming: from_cluster and to_clusters will likely be elided, as they have not proven useful in deployments so far.

The format will be the default for sr3... a version which has been gradually working towards a stable release for the past year, looking close.