DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents

open source and strategic software development to speed things up #105

Closed eighthave closed 4 years ago

eighthave commented 4 years ago

What is clear is that this protocol and software need to be developed, tested, audited, and deployed as fast as possible. With strategic software development practices, this can be sped up a lot by allowing any skilled party to work on any aspect of the project in parallel, with minimal coordination. This is a proven model in many free software projects, like Debian, Linux, etc. This project will also require a number of specific skills that no one person or even team of people is likely to have, like epidemiology and deploying Bluetooth apps at scale.

Specifically this means:

Many devs with the needed expertise are looking for projects to contribute to. The steps outlined there will provide a path to get more people working towards the same goal.

dirkx commented 4 years ago

@eighthave - I've started to write down what people did in the various test and demo versions on github and slack - and captured it at https://docs.google.com/document/d/1__-lGTCt7nXcpC0JnmHYHVJpLSAQHNQEcknkyOvKYYU/edit

snakehand commented 4 years ago

We could maybe also specify some intermediate architectural components. I am in particular thinking of Redis (key-value store) as intermediate storage for the EphIDs that should go into the cuckoo filter. By using an out-of-the-box, high-performance, industry-standard component as a vital hub, the scalability of the overall system can be better assessed. Multiple front-end servers can then push into a central or distributed key-value store, and the compression into the cuckoo-filtered file can be done as a batch job in the background that generates a static file the front ends serve as it becomes ready. This also makes it easier to do overall system integration of disparate components, and components can be swapped out over time with less disruption.
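A rough sketch of that pipeline, with a plain `HashMap` standing in for Redis and a sorted, deduplicated list standing in for the cuckoo-filtered file (all type and function names here are illustrative, not from the whitepaper):

```rust
use std::collections::HashMap;

// EphID as broadcast over BLE (16 bytes in the DP-3T design).
type EphId = [u8; 16];

// Stand-in for the Redis key-value store the front ends push into.
struct IntermediateStore {
    by_day: HashMap<u32, Vec<EphId>>, // day number -> reported EphIDs
}

impl IntermediateStore {
    fn new() -> Self {
        IntermediateStore { by_day: HashMap::new() }
    }

    // Front-end servers push reported EphIDs here (think Redis LPUSH).
    fn push(&mut self, day: u32, eph: EphId) {
        self.by_day.entry(day).or_insert_with(Vec::new).push(eph);
    }

    // Background batch job: compact one day's EphIDs into the static
    // artifact the front ends serve (stand-in for the cuckoo filter).
    fn compact(&self, day: u32) -> Vec<EphId> {
        let mut v = self.by_day.get(&day).cloned().unwrap_or_default();
        v.sort_unstable();
        v.dedup();
        v
    }
}
```

The point of the split is that the push path and the batch path only share the store, so either side can be swapped out (Redis for a flat file, the sorted list for a real cuckoo or xor filter) without touching the other.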

dirkx commented 4 years ago

@snakehand I've been running some performance tests (baseline here: https://github.com/dirkx/DP-3T-Documents/tree/editable-version/impl/design-2-openssl-C; will try to add the last version) - and I am not sure that is even needed.

E.g. a simple flat file (one per day) or a dead normal database can very easily cope with this - and the creation of the cuckoo filter is so fast that you can pretty much redo it every time a new infection report comes in (even if you have to re-create it with all the other known infected seeds).

It is in the sub-second range for 500k-1M entries on old hardware.
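For a sense of what such a rebuild involves, here is a minimal cuckoo filter sketch (partial-key cuckoo hashing with 4-slot buckets and 8-bit fingerprints; all parameters are illustrative, the bucket count must be a power of two for the XOR alternate-index trick, and this is not the code linked above):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Minimal cuckoo filter sketch. Fingerprint 0 marks an empty slot.
struct CuckooFilter {
    buckets: Vec<[u8; 4]>,
}

impl CuckooFilter {
    fn new(n_buckets: usize) -> Self {
        // Power-of-two size keeps alt_index an involution.
        assert!(n_buckets.is_power_of_two());
        CuckooFilter { buckets: vec![[0u8; 4]; n_buckets] }
    }

    fn hash<T: Hash + ?Sized>(v: &T) -> u64 {
        let mut h = DefaultHasher::new();
        v.hash(&mut h);
        h.finish()
    }

    fn fingerprint(item: &[u8]) -> u8 {
        ((Self::hash(item) >> 32) as u8).max(1) // reserve 0 for "empty"
    }

    // Partial-key cuckoo hashing: i2 = i1 XOR hash(fingerprint).
    fn alt_index(&self, i: usize, fp: u8) -> usize {
        (i ^ Self::hash(&fp) as usize) & (self.buckets.len() - 1)
    }

    fn try_place(&mut self, i: usize, fp: u8) -> bool {
        for slot in self.buckets[i].iter_mut() {
            if *slot == 0 {
                *slot = fp;
                return true;
            }
        }
        false
    }

    fn insert(&mut self, item: &[u8]) -> bool {
        let mut fp = Self::fingerprint(item);
        let i1 = (Self::hash(item) as usize) & (self.buckets.len() - 1);
        let i2 = self.alt_index(i1, fp);
        if self.try_place(i1, fp) || self.try_place(i2, fp) {
            return true;
        }
        // Relocate existing fingerprints, cuckoo style.
        let mut i = i1;
        for _ in 0..128 {
            let victim = (fp as usize) & 3; // cheap slot choice
            std::mem::swap(&mut self.buckets[i][victim], &mut fp);
            i = self.alt_index(i, fp);
            if self.try_place(i, fp) {
                return true;
            }
        }
        false // table too full; a real build would grow and rebuild
    }

    fn contains(&self, item: &[u8]) -> bool {
        let fp = Self::fingerprint(item);
        let i1 = (Self::hash(item) as usize) & (self.buckets.len() - 1);
        let i2 = self.alt_index(i1, fp);
        self.buckets[i1].contains(&fp) || self.buckets[i2].contains(&fp)
    }
}
```

Since inserts are just two hashes plus an occasional eviction chain, rebuilding the whole filter from all known seeds on every new report is plausible at the sizes mentioned above.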

But agreed - it WOULD be very useful, I think, to create a reference architecture of sorts that shows how you can decompose the system. Because if you need to implement some of the 'handwaving' in the paper, you quickly get things like a caching layer, some URI scheme for day names, and perhaps an RFC 3161 timestamp signature appended to the published file.
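For instance, the day-name URI scheme could be as simple as an ISO-8601 date in the path, so the per-day files sort lexically and a caching layer can treat each day's file as immutable (the base URL, path, and extension here are hypothetical, not from the paper):

```rust
// Hypothetical layout: <base>/exposed/<YYYY-MM-DD>.cuckoo
// Nothing here is specified by the whitepaper; it is one obvious choice.
fn day_file_uri(base: &str, year: u32, month: u32, day: u32) -> String {
    format!("{}/exposed/{:04}-{:02}-{:02}.cuckoo", base, year, month, day)
}
```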

Do you want to make a first stab at it? Or what format is easy for you if I send you what I have?

snakehand commented 4 years ago

I was thinking of implementing or using one of the available cuckoo filters in Rust (though xor filters seem to be more in vogue: https://arxiv.org/pdf/1912.08258.pdf). The point of using Redis is to standardise some of the plumbing in order to ease integration. Any scheme/formatting suggestions you might have will be interesting. The serialisation of the cuckoo filter should also be standardised. If I make a Rust library for this it would handle both server and client needs, as well as the (de)serialisation, but the exchange format needs to be compact, well documented, and implementation independent.
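As a strawman for such an exchange format, a fixed-width little-endian header followed by the raw fingerprint array would be compact and trivially implementation independent (the field layout below is entirely hypothetical, not anything agreed in this thread):

```rust
// Hypothetical layout (12-byte header, little-endian):
//   magic "DP3T" | version u16 | fp_bits u8 | slots_per_bucket u8 |
//   bucket_count u32 | raw fingerprint bytes
fn serialize_filter(bucket_count: u32, fingerprints: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + fingerprints.len());
    out.extend_from_slice(b"DP3T");
    out.extend_from_slice(&1u16.to_le_bytes()); // format version
    out.push(8); // fingerprint width in bits
    out.push(4); // slots per bucket
    out.extend_from_slice(&bucket_count.to_le_bytes());
    out.extend_from_slice(fingerprints);
    out
}

// Returns (bucket_count, fingerprint bytes), or None on a bad header.
fn deserialize_filter(buf: &[u8]) -> Option<(u32, &[u8])> {
    if buf.len() < 12 || &buf[0..4] != b"DP3T" {
        return None;
    }
    let bucket_count = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
    Some((bucket_count, &buf[12..]))
}
```

Fixing endianness and field widths up front is what makes a C, Rust, and Swift implementation interoperate without sharing code.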

dirkx commented 4 years ago

I am not overly concerned - for this set of data (100k to millions infected per sensible area, 15-30 days, etc.) it is a wash. And having the non-reversibility/back-calculation properties well understood is more important than the last few percent of performance. It seems fast enough on an iPhone 4 with a few million entries, and the download is smaller than the homepage of CNN.

So I am trying to get a fully working version of the serialisation going (I put a strawman in the https://github.com/dirkx/DP-3T-Documents/blob/implementation-profile-start/implementation-profiles/profile.md file above). Will commit the code (and update the doc if I have to) once that is done later today or tomorrow.

Would be lovely if you had a rust one and we could get the two to interoperate.

That would go a long way towards proving that the spec is written well enough and all that.

kennypaterson commented 4 years ago

@dirkx I think it's important to emphasise that the whitepaper we have released is just that - a whitepaper, not an interoperability spec. Understanding this may help you decide how much effort you should invest in code development right now. We hope to have a first implementation from the DP-3T project available soon. We will then very much welcome your comments on it.

dirkx commented 4 years ago

That is totally understood - and totally fair game!

However - the rest of the world has a use for this - and often it is very possible to do these things fully in parallel.

So that once the first implementations come out, they are both interoperable, and at least one has been built against a description rather than shared heritage (as per the IETF rules for good standards).

kugelfish42 commented 4 years ago

I guess this might be somehow relevant: https://www.apple.com/covid19/contacttracing/ ?

eighthave commented 4 years ago

And Google too: https://blog.google/inside-google/company-announcements/apple-and-google-partner-covid-19-contact-tracing-technology

BlueTrace has open sourced a reference version of their TraceTogether app and server-side: https://github.com/opentrace-community

The Austrian Red Cross' Stopp Corona app is being analysed by epicenter.works: https://epicenter.works/document/2465