NOAA-PMEL / iot-data-landing

IoT Data Landing Project
The Unlicense

Use git-crypt to encrypt key files, add script to bootstrap devices #11

Closed kwilcox closed 2 years ago

kwilcox commented 2 years ago

Before you can run git-crypt unlock on the repository, I need to add each collaborator's GPG key to it. Please create a GPG key, add it to your local keyring, and associate your email address with it!

Creating a GPG key: https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key

Associating your email with the key: https://docs.github.com/en/authentication/managing-commit-signature-verification/associating-an-email-with-your-gpg-key

derekcoffman commented 2 years ago

@kwilcox I've never used git-crypt before. I have a GPG key associated with my email and added to my GitHub settings. Is there anything else I need to do?

kwilcox commented 2 years ago

@derekcoffman Did you get git-crypt set up and working?

derekcoffman commented 2 years ago

@kwilcox I installed it and am able to un/lock the repo so I think I'm good to go. Next step is to actually create some "things". Currently, the mock sensor sends all the data over a single topic so the thing would be the DAQ id. I can modify the sender to send each sensor (make-model-sn) on its own topic if that works better for testing. What is your preference?

kwilcox commented 2 years ago

A "thing" sending its initial data to its own topics will be "the way".

The permissions are open-ended. That link is tough to read because AWS requires the nested JSON string... but the "thing" has permission to publish to: topic/${iot:Connection.Thing.ThingName}/*
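As a rough local illustration of what that permission implies, the sketch below checks whether a topic sits under a thing's own name. It only approximates the rule with a prefix test and is not how AWS evaluates policies; the function name and the example thing name "acme-x100-0042" are hypothetical.

```python
def publish_allowed(thing_name: str, topic: str) -> bool:
    """Approximate the rule: a thing may publish only under '<thing name>/'.

    This is a local prefix check for illustration, not AWS policy evaluation.
    """
    return topic.startswith(f"{thing_name}/")


if __name__ == "__main__":
    # Hypothetical thing name in make-model-sn form.
    print(publish_allowed("acme-x100-0042", "acme-x100-0042/pub/raw/"))
    print(publish_allowed("acme-x100-0042", "other-thing/pub/raw/"))
```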

We could standardize on MQTT topic names for each "thing" regardless of whether it's on AWS or not:

${iot:Connection.Thing.ThingName}/pub/raw/
${iot:Connection.Thing.ThingName}/pub/qc/
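
A minimal sketch of building those two topic names from a thing name, so the same convention can be reused off AWS. The helper name and the example thing name are hypothetical; only the "<thing name>/pub/raw/" and "<thing name>/pub/qc/" layout comes from the proposal above.

```python
STAGES = ("raw", "qc")  # the two stages named in the proposed convention


def thing_topic(thing_name: str, stage: str) -> str:
    """Return '<thing name>/pub/<stage>/' for one of the known stages."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    return f"{thing_name}/pub/{stage}/"


if __name__ == "__main__":
    # Hypothetical thing name in make-model-sn form.
    for stage in STAGES:
        print(thing_topic("acme-x100-0042", stage))
```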

derekcoffman commented 2 years ago

Sounds good and makes total sense. It makes it all very flexible. And I definitely think that standardizing on some topic names would be very useful.

As you said in the meeting, this is pretty much vendor-agnostic, which is great. There will always be vendor-specific details for connecting, etc., but the structure is easily moved between platforms. Also, the standardized topic naming conventions can be used in whatever messaging system we might use on the backend.

As for sending mock data, I think I would like to add a second mock data source to simulate another platform/data format type. So we'd end up with at least two "things", with the option of adding extra copies of each for testing. Each "thing" would be sending data for multiple sensors, which could be handled (parsed, qc'd, saved) on the back end.

Sound reasonable?

kwilcox commented 2 years ago

Sounds reasonable! I was also expecting each "thing" to be named after its "make-model-sn".

derekcoffman commented 2 years ago

My thought process so far...

DataSource -> thing where the Data Source can be expressed with some combination of the following:

And using the standardized topic naming conventions it would give each group a defined framework but also allow for individual data pipelines once the data get to the lab.

kwilcox commented 2 years ago

I poked around the different IoT offerings and AWS seems to have the most restrictive naming convention... it can't be over 128 characters. Let's try not to cram too much info into that name, just something that is universally unique.

We can have a system that catalogs all of the "things" and assigns metadata like group/project/owner (and other affiliations)/etc. It could just be a JSON file for each "thing name" for now. Thoughts?
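
One way the JSON-file-per-thing catalog could look, as a sketch only. The 128-character limit is from the comment above; the allowed character set in the pattern is an assumption about AWS thing names, and the function names, metadata keys, and example values are all hypothetical.

```python
import json
import re
from pathlib import Path

# 128-character limit from the discussion; the character set is an assumption.
THING_NAME_RE = re.compile(r"[a-zA-Z0-9:_-]{1,128}")


def validate_thing_name(name: str) -> str:
    """Raise ValueError if the name is too long or uses unexpected characters."""
    if not THING_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid thing name: {name!r}")
    return name


def write_catalog_entry(catalog_dir: Path, thing_name: str, metadata: dict) -> Path:
    """Write '<catalog_dir>/<thing name>.json' holding the thing's metadata."""
    validate_thing_name(thing_name)
    catalog_dir.mkdir(parents=True, exist_ok=True)
    path = catalog_dir / f"{thing_name}.json"
    entry = {"thing_name": thing_name, **metadata}
    path.write_text(json.dumps(entry, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    # Hypothetical metadata keys and values.
    meta = {"group": "example-group", "project": "example-project", "owner": "someone"}
    print(write_catalog_entry(Path("catalog"), "acme-x100-0042", meta))
```

Keeping the metadata in the catalog rather than in the name leaves the thing name short and stable if a sensor changes group or project.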

I like the idea of versioning in the actual topic name but not sure of what the trade-offs might be down the road. Are you suggesting something like this?

${iot:Connection.Thing.ThingName}/pub/raw/v0001
${iot:Connection.Thing.ThingName}/pub/raw/v0002
...
${iot:Connection.Thing.ThingName}/pub/raw/v000N
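
For illustration, a small sketch of formatting and parsing the zero-padded version segment shown above. The function names and the example thing name are hypothetical, and the four-digit padding is taken from the v0001 pattern.

```python
import re

VERSION_SEGMENT_RE = re.compile(r"v(\d{4})")


def versioned_topic(thing_name: str, stage: str, version: int) -> str:
    """Return '<thing name>/pub/<stage>/vNNNN' with a four-digit version."""
    if not 0 <= version <= 9999:
        raise ValueError("version must fit in four digits")
    return f"{thing_name}/pub/{stage}/v{version:04d}"


def topic_version(topic: str) -> int:
    """Extract the integer version from the last segment of a versioned topic."""
    match = VERSION_SEGMENT_RE.fullmatch(topic.rsplit("/", 1)[-1])
    if match is None:
        raise ValueError(f"no version segment in {topic!r}")
    return int(match.group(1))


if __name__ == "__main__":
    topic = versioned_topic("acme-x100-0042", "raw", 2)
    print(topic, topic_version(topic))
```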

derekcoffman commented 2 years ago

I had been thinking of the versioning in terms of data formats, which can be kept in the CloudEvents "type" (or custom) metadata. But if it makes sense to put it in the topics, I would be good with that.
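
A sketch of what carrying the format version in the CloudEvents "type" attribute might look like, built as a plain dict rather than with any SDK. The "example.sensor.raw" type prefix, the "/things/..." source value, the function name, and the payload are all hypothetical; specversion, id, source, and type are the attributes CloudEvents 1.0 requires.

```python
import json
import uuid
from datetime import datetime, timezone


def make_event(thing_name: str, format_version: int, payload: dict) -> dict:
    """Wrap a payload in a CloudEvents-style envelope with a versioned type."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": f"/things/{thing_name}",  # hypothetical source layout
        "type": f"example.sensor.raw.v{format_version}",  # hypothetical type name
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": payload,
    }


if __name__ == "__main__":
    event = make_event("acme-x100-0042", 2, {"temperature": 21.5})
    print(json.dumps(event, indent=2))
```

With the version in the envelope, the topic names stay fixed and a consumer picks a parser by inspecting "type".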

And agreed on cramming too much into the name. Just enough to make it unique makes total sense.

derekcoffman commented 2 years ago

Here are a few possible data sources to consider

Anyway, just some thoughts on that. And there's no reason it couldn't be a combination of DAQ-type data sources as well as individual instruments/sensors, if that's how some groups would rather manage the data.