toneillbroad opened this issue 3 years ago
Discussed including a hash in output messages, either in a wrapper object or in the Kafka message headers field.
For the clinvar VPT MVP we will create a snapshot stream with the data up to that point, not a live Kubernetes pod that keeps producing new datasets as they come in. This means we can defer this issue until closer to when that live functionality is needed downstream.
Integral to the new genegraph architecture, the transforming genegraph will identify new, updated, and deleted records in a JSON wrapper on the public output data streams.
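A hypothetical shape for that wrapper, just to make the idea concrete (field names and the `type` vocabulary are illustrative assumptions, not a settled schema):

```json
{
  "type": "update",
  "digest": "sha256:<hex digest of content>",
  "content": {"id": "example-record"}
}
```

Here `type` would be one of `create`, `update`, or `delete`, and `content` carries the JSON-LD payload itself.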
The digester will create a digest (hash) of the output message content.
If the output is a JSON-LD message, the digester must produce a digest that ignores the keys of blank nodes so as not to produce superfluous records on the output stream.
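A minimal sketch of such a digest, assuming blank node labels appear as `_:`-prefixed strings in the parsed JSON. It relabels blank nodes in a deterministic traversal order before hashing, so two messages differing only in blank node labels hash identically. (Full RDF dataset canonicalization is considerably more involved; this is only an illustration of the intent.)

```python
import hashlib
import json


def canonicalize(value, mapping):
    """Recursively replace blank node labels (e.g. "_:b0") with stable
    placeholders assigned in first-seen order over a sorted-key traversal."""
    if isinstance(value, dict):
        return {k: canonicalize(value[k], mapping) for k in sorted(value)}
    if isinstance(value, list):
        return [canonicalize(v, mapping) for v in value]
    if isinstance(value, str) and value.startswith("_:"):
        if value not in mapping:
            mapping[value] = f"_:canon{len(mapping)}"
        return mapping[value]
    return value


def digest(message: str) -> str:
    """SHA-256 digest of a JSON-LD message, insensitive to blank node
    labels and to key order."""
    canon = canonicalize(json.loads(message), {})
    return hashlib.sha256(json.dumps(canon, sort_keys=True).encode()).hexdigest()
```

With this, re-serializing the same graph under different blank node labels produces the same digest, so no superfluous record is emitted.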
The digester will work in concert with a persistent key/value database store to determine whether a record is new or updated. (RocksDB with k/v pairs has been discussed previously.)
Deleted records will have to be part of the message on the originating stream.
Deleted records will need to be removed from the digester data store.
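The store's behavior can be sketched as follows, using an in-memory dict as a stand-in for the persistent k/v database (e.g. RocksDB); class and method names are hypothetical:

```python
class DigestStore:
    """In-memory stand-in for the persistent k/v store: maps a
    record id to the last digest seen for that record."""

    def __init__(self):
        self._db = {}

    def classify(self, record_id, record_digest):
        """Record the digest and return "new", "updated", or None
        when the record is unchanged."""
        previous = self._db.get(record_id)
        self._db[record_id] = record_digest
        if previous is None:
            return "new"
        if previous != record_digest:
            return "updated"
        return None  # unchanged: emit nothing downstream

    def delete(self, record_id):
        """Remove a deleted record's digest; True if it was present."""
        return self._db.pop(record_id, None) is not None
```

An unchanged digest yields no output message, which is what keeps superfluous records off the public stream.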
Should be coded as an interceptor
Add to transforming genegraph interceptor chain
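Genegraph's interceptors are Clojure/Pedestal-style functions over an event map; as a language-neutral sketch (names hypothetical), the digester would take the event, attach a digest, and pass the event along the chain:

```python
import hashlib


def digest_interceptor(event):
    """Hypothetical interceptor: compute a digest of the event's output
    value and attach it, leaving the rest of the event untouched."""
    event["digest"] = hashlib.sha256(event["value"].encode()).hexdigest()
    return event


def run_chain(event, interceptors):
    """Minimal stand-in for an interceptor chain: thread the event
    through each interceptor in order."""
    for interceptor in interceptors:
        event = interceptor(event)
    return event
```

Slotting the digester in this way keeps it composable with the other steps already on the transforming genegraph chain.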