clingen-data-model / genegraph

Presents an RDF triplestore of gene information using GraphQL APIs

Create a digest (hash) of a message payload #283

Open toneillbroad opened 3 years ago

toneillbroad commented 3 years ago

Integral to the new genegraph architecture, the transforming genegraph will identify new, updated, and deleted records in a JSON wrapper on the public output data streams.

The digester will create a digest (hash) of the output message content.

If the output is a JSON-LD message, the digester must produce a digest that ignores blank node identifiers, so that semantically identical messages do not produce superfluous records on the output stream.
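
A minimal sketch of such a digester, written in Java for illustration (genegraph itself is Clojure): it normalizes auto-generated blank node labels (assumed here to look like `_:b0`, `_:b1`, ...) before hashing. This is a deliberate simplification of full RDF dataset canonicalization; all class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class MessageDigester {

    // Replace auto-generated blank node labels (e.g. "_:b0", "_:b12") with a
    // fixed placeholder so that two otherwise-identical JSON-LD messages hash
    // to the same value. A production implementation would use a real RDF
    // dataset canonicalization algorithm rather than this regex shortcut.
    static String stripBlankNodeLabels(String jsonLd) {
        return jsonLd.replaceAll("_:b\\d+", "_:b");
    }

    // SHA-256 digest of the normalized message content, hex-encoded.
    static String digest(String message) throws Exception {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] hash = sha256.digest(
                stripBlankNodeLabels(message).getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    }
}
```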

The digester will work in conjunction with a persistent key/value store to determine whether a record is new or updated (RocksDB with key/value pairs has been discussed in the past); see the sketch after the deletion notes below.

Deleted records will have to be part of the message on the originating stream.

Deleted records will need to be removed from the digester data store.
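
A sketch of pairing the digester with a RocksDB-backed store to classify records and remove deleted ones, assuming the official RocksDB Java bindings; `DigestStore` and the `Status` enum are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class DigestStore {

    public enum Status { NEW, UPDATED, UNCHANGED }

    private final RocksDB db;

    public DigestStore(String path) throws RocksDBException {
        RocksDB.loadLibrary();
        this.db = RocksDB.open(path);
    }

    // Compare the digest of the current message against the stored digest
    // for the same record key to classify the record, updating the store.
    public Status classify(String recordKey, String digest) throws RocksDBException {
        byte[] key = recordKey.getBytes(StandardCharsets.UTF_8);
        byte[] previous = db.get(key);
        if (previous == null) {
            db.put(key, digest.getBytes(StandardCharsets.UTF_8));
            return Status.NEW;
        }
        if (new String(previous, StandardCharsets.UTF_8).equals(digest)) {
            return Status.UNCHANGED;
        }
        db.put(key, digest.getBytes(StandardCharsets.UTF_8));
        return Status.UPDATED;
    }

    // Deleted records are removed from the digester's store as well.
    public void remove(String recordKey) throws RocksDBException {
        db.delete(recordKey.getBytes(StandardCharsets.UTF_8));
    }
}
```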

Should be coded as an interceptor

Add to transforming genegraph interceptor chain
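
Genegraph's interceptor chain is Clojure (Pedestal-style), so the real shape will differ, but conceptually the interceptor's enter step would compute the digest, classify the record against the store, and annotate the event before it continues down the chain. A rough Java sketch reusing the hypothetical classes above, with all types hypothetical:

```java
// Hypothetical event carrier; the real chain passes a Clojure event map.
class OutputEvent {
    String recordKey;
    String payload;
    String digest;
    DigestStore.Status status;
}

class DigestInterceptor {
    private final DigestStore store;

    DigestInterceptor(DigestStore store) {
        this.store = store;
    }

    // "Enter" step of the interceptor: compute the digest, classify the
    // record, and annotate the event before passing it along the chain.
    OutputEvent enter(OutputEvent event) throws Exception {
        event.digest = MessageDigester.digest(event.payload);
        event.status = store.classify(event.recordKey, event.digest);
        return event;
    }
}
```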

theferrit32 commented 2 years ago

Discussed including the hash in output messages, either in the wrapper object or in the Kafka message headers field.
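
If the digest goes into the Kafka message headers rather than the wrapper object, the producer-side change is small. A sketch using the Kafka Java client; the topic, key, and header name are placeholders:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DigestPublisher {

    // Attach the digest as a Kafka record header so consumers can read it
    // without parsing the JSON payload.
    static ProducerRecord<String, String> withDigestHeader(
            String topic, String key, String payload, String digest) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>(topic, key, payload);
        record.headers().add("digest", digest.getBytes(StandardCharsets.UTF_8));
        return record;
    }
}
```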

theferrit32 commented 2 years ago

For the ClinVar VPT MVP we will create a snapshot stream with the data up to that point, rather than a live Kubernetes pod that continues to produce new datasets as they come in. This means we can defer this issue until closer to when that live functionality is needed downstream.

theferrit32 commented 2 years ago

https://www.rfc-editor.org/rfc/rfc8785#name-open-source-implementations
https://github.com/erdtman/java-json-canonicalization
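
These point to the JSON Canonicalization Scheme (RFC 8785). A minimal sketch of canonicalizing the JSON before hashing, assuming the `JsonCanonicalizer` class from erdtman/java-json-canonicalization (check the library's current API for exact names):

```java
import java.security.MessageDigest;
import java.util.HexFormat;
import org.erdtman.jcs.JsonCanonicalizer;

public class CanonicalDigest {

    // Canonicalize the JSON per RFC 8785 (JCS) before hashing, so that key
    // ordering and whitespace differences do not change the digest.
    static String digest(String json) throws Exception {
        byte[] canonical = new JsonCanonicalizer(json).getEncodedUTF8();
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(canonical);
        return HexFormat.of().formatHex(hash);
    }
}
```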