dwyl / technology-stack

🚀 Detailed description + diagram of the Open Source Technology Stack we use for dwyl projects.

Extending "PETE" with Reusable Components So We Can Build People-Centric Standards-Compliant Data-Driven Apps Easier/Faster/Reliably #67

Open nelsonic opened 6 years ago

nelsonic commented 6 years ago

Context 🤔

We have been using the "PETE" Stack for the past 18 Months and it has been a really good experience for end-users (people using the apps we've built), developers and product owners/clients! We love Phoenix and Elixir and feel it's been a good choice of framework and programming language.

My/our one "regret" is not pushing for more modularity earlier on in our journey with Phoenix, which has meant that we have "locked up" most of the useful functionality in the Apps we have built and thus have to "re-implement" lots of "boilerplate" each time we start a new project. 😞

Example 💊

A good example of the PETE stack is Healthlocker: https://github.com/healthlocker/healthlocker The end-users like using it, and it is much better than anything the NHS had before (for this use-case). 🥇 Everyone who worked on the project can be proud of the work that has been done. 🎉

However if you look at the mix.exs file, you will see that while we used several open source modules created by other people, we did not create any reusable components/packages/modules ourselves ... That's "OK" because it was not the (stated) "goal" of the project to "create reusable code", the goal of the project was to create an App that helps people "self-manage their [mental] health" and "effectively communicate with their clinicians".

However it was an aim of the project to be Open Source so that anyone could contribute! And while we "checked the box" of Open Source by making the project public on GitHub, we have not had any contributions from anyone outside of the "core" team. This raises a separate question around what constitutes "success" in Open Source: https://github.com/dwyl/technology-stack/issues/66 ...

Our hypothesis is that if we had created Healthlocker as a collection of reusable components which then get assembled into an App, we could have aided the "core" goal of the project because by making reusable components, each time one of those components is reused and improved, the original project receives all the improvements "for free".

By creating Healthlocker as one "monolithic" repository (AKA "monorepo") we were able to move fast(er) initially and deliver the App to users/stakeholders quicker ... 🐎 but now when we need to implement Auth, Permissions or Chat in a new project 🆕 we have to do it all "from scratch" ... ⏳

Analogy: Smart Phones 📱

Think about the Smart/Mobile Phone in your hand/pocket, it's made up of millions of components. The software stack that goes into making the smartphone possible is billions of lines of code. Each time one of the elements in the software stack (Kernel, APIs, UI Libraries, Build Tools) improves, everyone in the ecosystem (developers, users, companies) benefits from the improvement. If the stack of software for the smartphone was a "monorepo" it would have failed long ago; because it would have become unmaintainable and no individual person would be able to understand it!

Note: For this analogy, focus on the software and ignore the fact that for the most part the individual hardware components (screen, battery, processor, camera etc.) are not upgradeable. 🙄

Lessons Learned 📚

We have all learned a lot from our work over the last few years. A few lessons

Proposal 💡

Our mission for the rest of 2018 is to create the reusable stack we will be deploying in 2019:

dwyl-technology-stack-2019

Diagram: https://drive.google.com/file/d/1gfvgK1Mn7EuamHJroS8G7FUqauRgA9EH Note: the colors of the chevrons are not significant. They are merely there to help differentiate between the elements in the stack. If anyone wants to make this look better, please go for it!

If this feels like "a lot", don't be discouraged or overwhelmed, we already have a massive head start! A lot of the work on this has already been started and in some cases shipped! Break it down into "chunks" and start from the bottom: the Append-only Log is the core of all apps. see: https://github.com/dwyl/phoenix-ecto-append-only-log-example (if you haven't already)

@Danwhy is already making great progress with Append-only Log: https://github.com/dwyl/alog Both @Cleop and @RobStallion have made great contributions to https://github.com/dwyl/autoform and we are already actively "dogfooding" elements in our client projects!

We'll build a "minimalist" analytics system "from scratch" on top of alog: https://github.com/dwyl/atm We have done some work towards this in: https://github.com/dwyl/hits

Auth will need to be assembled from the ground up based on alog, fields and autoform. So for now, ignore the work that had been previously done.

Once we have the "base" layer of alog, Fields, ATM, Autoform, Auth and Admin, we can build the more "feature rich" tools: Contact and Feedback.

Hopefully this will clarify the "strategic" direction we are taking when building our most recent projects in a modular way.

Name? 💭

For now, we don't have a name for the collection of modules/packages.

Open to suggestions as to what we should call this collection of modules.

I'd love to call this the "Obvious Stack" because most of these things are obviously useful to most apps we have (already) built and will give us a massive head start for anything we build in future.

Considering "repurposing" https://github.com/dwyl/abase as it was the "spiritual grandfather" of this. "aBase" was built and used for a couple of Node.js Apps in 2016/17 but not developed further, as we transitioned to Phoenix in early 2017. It's not the best name (mea culpa), but at least it conveys the meaning of what we are building.

Feedback / Thoughts / Questions / Discussion ? 💬

As always your feedback is very much invited/requested/appreciated! If you have ideas, thoughts, questions, concerns, please share!!

Thanks! ✨

Related to: https://github.com/dwyl/learn-elm/issues/121 (Creating Elm Packages) ❤️ and https://github.com/dwyl/hq/issues/497 (Learning Goals). 🎯 Required by: https://github.com/dwyl/feedback/issues/96 (Feedback Widget) and https://github.com/dwyl/product-roadmap/issues/7 (Metrics)

iteles commented 6 years ago

> Our hypothesis is that if we had created Healthlocker as a collection of reusable components which then get assembled into an App

This. This is a huge lesson from the work we have done over the past 2 years. Our small modules (which we barely use ourselves) attract a fair amount of attention and contributions - the highly modular approach makes a lot of sense.

Having now done this on our latest two client projects, this way of working makes my heart sing ❤️ 🎶 It's better for us, better for our clients and better for the open source community.

Thanks @nelsonic for the proposal diagram too, having it so clearly laid out is extremely useful!

Cleop commented 6 years ago

Thanks for writing this up @nelsonic, I'm enjoying contributing to its progress and looking forward to seeing it take shape!

RobStallion commented 6 years ago

@nelsonic This is awesome. I agree wholeheartedly. ❤️ ❤️

> Our hypothesis is that if we had created Healthlocker as a collection of reusable components which then get assembled into an App, we could have aided the "core" goal of the project because by making reusable components, each time one of those components is reused and improved, the original project receives all the improvements "for free".

This point especially rings true for me. Although I did a lot of work with healthlocker, when I have needed to come back to it to make improvements/fix bugs/add a new feature, I find it time consuming and honestly a little daunting. The great thing about the modular approach and components is that when we need to upgrade a component, everywhere that component is used is also updated.

Having to manually find and update everywhere in an application that you have done something specific can be time consuming and error prone.

I really enjoy the modular approach. There are plenty of reasons for this, the top few being that it saves time on a project if we can reuse something other developers have already built (and we can use that time on the projects rather than reinventing the wheel) and code is in smaller, more manageable chunks (and often more robust as many people have thought long and hard about the best way to implement that one thing).

My favourite reason though is that I feel like the code is much more accessible. What I mean by this is that when it comes to big monolithic projects, they could have (and often do have) plenty of great functions and modules that have been built into them but it is hard to know what they will have and where to look for it. With the modular approach I feel it is much easier to find exactly what it is that you are looking for, apply it to your use case and (maybe most importantly) make suggestions for how the code could be improved/enhanced.

SimonLab commented 6 years ago

Agree 👍 if we are able to split applications into multiple reusable components, the creation process will be much faster and more enjoyable. For example, rewriting the authentication logic for each new application is time consuming and tiring!

However, I'm wondering whether every feature can be implemented with reusable components, or do we need to limit the kinds of projects that can be built with modules?

There is a risk in trying to adapt the reusable code to match the specific features of one project. What would be an acceptable solution? Accept any PR into the reusable modules? This might pollute the API and make the modules harder to reuse. Fork the modules to make them specific enough for the application? This would break the idea of a reusable component. I had this issue on a previous project, where small PRs were added to some npm packages just to be able to add/update some features on the project. To avoid this, we either need to make sure from the start that all the features can be implemented easily with the components, or we need to direct and think of new features based on what the components can do, instead of letting the product owner decide/impose what they have in mind.

It's difficult for me to have a precise idea of how everything will work together, but I like this macro vision of how to build our next projects. A simple example which demonstrates how to use the "PETE extended" stack would help us see what the limitations are, but also confirm our motivation to go in this direction.

nelsonic commented 6 years ago

@SimonLab good question(s)! Provided we have a definition of all types of data in fields we can build anything! (I don't think we need to impose a "limit" yet. Unless you can think of an example ...)

We can discuss specific field types in issues in fields, for example: postcode. What should the validation for the postcode field be? Should we have a RegEx that matches UK postcodes or USA? Should fields receive the browser's Language String e.g: EN/US in order to determine which RegEx to use? Or should we be able to pass a custom validation to fields to make it more flexible? We can discuss this on a case-by-case basis as the need arises; we don't need to "waterfall" the whole thing up-front.
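To make the postcode question concrete, here is a minimal sketch of the two options side by side: a locale-keyed RegEx lookup, with an optional custom validator that takes precedence. It is written in Python purely for illustration (not the Elixir `fields` package); `validate_postcode`, `POSTCODE_PATTERNS` and the locale keys are hypothetical names, and the patterns are deliberately simplified rather than complete postcode rules.

```python
import re
from typing import Callable, Optional

# Simplified, illustrative patterns only; real postcode rules have more edge cases.
POSTCODE_PATTERNS = {
    "en-GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    "en-US": re.compile(r"\d{5}(-\d{4})?"),
}


def validate_postcode(
    value: str,
    locale: str = "en-GB",
    custom: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if `value` looks like a valid postcode.

    A caller-supplied `custom` validator takes precedence over the
    locale-based pattern lookup.
    """
    candidate = value.strip().upper()
    if custom is not None:
        return custom(candidate)
    pattern = POSTCODE_PATTERNS.get(locale)
    if pattern is None:
        raise ValueError(f"no postcode pattern for locale {locale!r}")
    return pattern.fullmatch(candidate) is not None
```

Usage: `validate_postcode("SW1A 1AA")` uses the default UK pattern, `validate_postcode("90210", locale="en-US")` selects the US one, and `validate_postcode("2000", custom=lambda v: len(v) == 4 and v.isdigit())` bypasses both. Supporting the custom hook from day one keeps the package flexible without having to decide every locale up-front.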

The only risk I see is in defaulting to the "old habits" of building all the functionality into the "Client App" instead of being disciplined and putting the functionality in a reusable package.

We can only build a simple example once the alog and fields are in a useable state. The "simple" example is going to be "ATM" which will be a useable Analytics package we will use to make "Hits" see: https://github.com/dwyl/product-roadmap/issues/7 a "real world" App with real users. From there we can iterate "up the stack".

Yes, we've done things a little bit "in reverse" with our latest client project because fields is not (yet) built so we are not using it (yet), but that will change v. soon.

Our first/next step is to make the docs for https://github.com/dwyl/alog super beginner friendly. We have a few good examples of this in @dwyl e.g: https://github.com/dwyl/hapi-auth-jwt2 But we should not limit our "inspiration" for "good docs" to @dwyl ... we should find the best docs for any Open Source project and match/beat it!

Functionality is essential, but docs and examples are the difference between "that's nice" and "this is amazing!"

Don't get "hung up" on the details of how the more advanced features like Admin and Contact will work or look. Focus on the "core" components at the "base" of the stack for now.

If it's "difficult" to have an idea, simply go look at

We are re-imagining the Web Application Stack from the ground-up to be distributed, fault-tolerant, real-time, offline-first, metrics-driven, GDPR-compliant and fun to use for developers and end-users. (if that sounds like a lot of "buzzwords", just focus on "Hits" as the example for now...) 😉

Danwhy commented 6 years ago

Thanks for this write up @nelsonic, it's really useful to see the full plan of what we're working towards. I've really been enjoying working on the various parts of this so far, and am looking forward to those we haven't got to yet.

The only question I have is around how all of these modules actually fit together. Based on the above, it looks like each element in the stack depends on those below it, but my understanding was that each of these modules would be completely independent of each other; so if somebody wanted to use only the analytics module, or only the auth module, they wouldn't be required to use the append only log if that wasn't what they needed.

You also mentioned repurposing abase, so would that be the "link" between all of these modules? (So in our client apps we'd essentially just require abase, which would give us the full functionality of the stack, in a framework that's essentially "Phoenix+", but the modules would also be available individually for other people that didn't want our whole stack)

I just want to be sure I know exactly how independent these modules are going to be, to make sure I'm implementing them correctly as we go on.

nelsonic commented 6 years ago

@Danwhy thanks for the insightful feedback and clarifying questions. 🥇

Great news that you've been enjoying diving into building the alog and autoform pieces. 🎉

Answers

Each "core" piece should be independently tested and documented. I consider alog and fields to be the most decoupled "core" pieces and if someone wants to use alog and/or fields independently they are welcome/encouraged to!

The further up the stack, the more "coupled" the pieces become to the "core". But that does not mean we should build everything into a monolith called "starter-kit" that contains everything... because it would rapidly become "bloated"! Instead what we want to do is build each piece as a "lightweight" MVP of the desired functionality so that we can have people in the community contributing to different elements of the stack without necessarily needing to understand the whole thing. (Although our mission is to make the Docs "so good a complete beginner can understand"!)

We are not going to build auth to be "general purpose" for any Phoenix project; it will have a "hard" dependency on autoform which in turn will be dependent on fields. And since our chosen "data model" for all future projects is an "append-only log" (for accountability, analytics and distributed systems), what we are currently calling "alog" (see: https://github.com/dwyl/alog/issues/2 for naming discussion) will also be a "hard" (not optional) dependency.

Note: Arguably, Auth should be a completely separate "Umbrella App" that is deployed independently of any "Client App" logic and just returns a "Session Token" to represent the session for a given user when they are logged in. But deploying multiple apps is needless complexity at this point, and we can always refactor into an Umbrella App later on as the need arises. My only reasoning for doing it sooner would be to have complete separation between Application Data and Personally Identifiable Info such that a "compromise" of the Anonymised Application Data will not give the attacker access to people's PII ... 🤔

In our "Client" apps there will be no such thing as a locally defined "User Schema". The concept of a person will be pre-defined in auth and if the Client App needs to add a custom field to the person schema it will be done using a person_custom_fields "lookup table". e.g:

| inserted_at | attribute_id (PK) | person_id (FK) | field_type | data |
|---|---|---|---|---|
| 1541609554 | e096d10048146852 | e2ad11e8871e | favourite_color | rainbow |
| 1541609876 | ab4362a379819971 | 31e711e89bd8 | favourite_food | ice cream |
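A rough sketch of how such a lookup table could be created and queried, using Python's built-in `sqlite3` purely for illustration (the real implementation would be an Ecto schema, and the data would be stored encrypted). The column names follow the example rows above; the `custom_fields_for` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person_custom_fields (
        inserted_at  INTEGER NOT NULL,
        attribute_id TEXT PRIMARY KEY,
        person_id    TEXT NOT NULL,  -- reference to the person defined in auth
        field_type   TEXT NOT NULL,
        data         TEXT NOT NULL   -- plaintext here only for readability
    )
    """
)
conn.executemany(
    "INSERT INTO person_custom_fields VALUES (?, ?, ?, ?, ?)",
    [
        (1541609554, "e096d10048146852", "e2ad11e8871e", "favourite_color", "rainbow"),
        (1541609876, "ab4362a379819971", "31e711e89bd8", "favourite_food", "ice cream"),
    ],
)


def custom_fields_for(person_id: str) -> dict:
    """Return {field_type: data} for one person.

    Rows are read oldest-first, so if a field was recorded more than once
    the most recent value wins.
    """
    cursor = conn.execute(
        "SELECT field_type, data FROM person_custom_fields "
        "WHERE person_id = ? ORDER BY inserted_at",
        (person_id,),
    )
    return dict(cursor.fetchall())
```

Because every custom field is just a row, adding a new kind of field to an App needs no schema migration: only a new `field_type` value.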

This will make the framework incredibly flexible and essentially allows people to "drag-and-drop" from a list of "available fields" into a "custom fields" table, thus the "Product Owner" can construct their "App" based on reusable components (think lego bricks!) without having to write a line of code. "Custom Autoforms" will do the "heavy lifting" to render the forms, validate and store the data and admin will help the Product Owner view the data as it is submitted. Yes, this will effectively be a way to create a TypeForm/SurveyMonkey/WuFoo with just a few clicks! 😮

Note: in this example, we are displaying the data in plaintext but it would still be stored encrypted. Also, we will eventually have a field definition in fields for the most obscure fields, but a Fields.color should be an encrypted_INT of RGBA and the autoform for inputting the data should be a "Colour Picker" widget ... see how we can rapidly go down a "rabbit hole" of thinking about how to implement "all the things" ...? We need to be disciplined/focussed to only build the fields we need as we need them. 🙄

It's going to be much easier to explain the reasoning for each element in the "stack" once the need arises. But here's a basic example: ("abase-ic", see what we did there...? 😉)

Example

Imagine you are building a Social Network for Pet Lovers and you need to store specific data for each member of the platform including their pet's name, species, breed and date of birth.

The "base" person schema will be the "standard" one defined in auth:

| inserted_at | email_hash (PK) | email_encrypted | name_encrypted | password_hash | verified |
|---|---|---|---|---|---|
| 1541609554 | 48146852e096d100 | e2ad11e8871e | b4f54f713284 | 4362a31e711e | 1541609554 |
| 1541609876 | 379819971ab4362a | 31e711e89bd8 | 9566834fc199 | 1ab4362a3456 | null |

All personal data is encrypted "pre-DB-insertion", so this illustrates data "at rest". Hashed/encrypted data will also be considerably longer; I've truncated it here for brevity. If anyone else is following along scratching their head as to Why or How this is done, see: https://github.com/dwyl/phoenix-ecto-encryption-example Yes, encrypting all personally-identifiable data is tedious and gives us developers "more work", but the whole point of fields is that the encryption/decryption happens automatically/transparently in the load (Ecto Custom Type callback) function and the data is plaintext within the App and UI. This is the responsible way of building Apps, and "in the future" all Apps will be built this way by default; security & privacy first!
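The "transparent" part can be sketched as a field type whose `dump` runs just before insertion and whose `load` runs when a row is read back, so the rest of the App only ever sees plaintext. This is a Python illustration of the idea, not the Ecto Custom Type itself: `EncryptedText` and `email_hash` are hypothetical names, and the SHA-256 keystream cipher is a stand-in chosen to keep the example dependency-free. It is unauthenticated and NOT suitable for production; a vetted encryption library would be used in practice.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from SHA-256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


class EncryptedText:
    """Hypothetical field type: ciphertext at rest, plaintext inside the App."""

    NONCE_SIZE = 16

    def __init__(self, key: bytes) -> None:
        self.key = key

    def dump(self, plaintext: str) -> bytes:
        """Called before insertion: returns a random nonce followed by ciphertext."""
        data = plaintext.encode("utf-8")
        nonce = os.urandom(self.NONCE_SIZE)
        stream = _keystream(self.key, nonce, len(data))
        return nonce + bytes(a ^ b for a, b in zip(data, stream))

    def load(self, stored: bytes) -> str:
        """Called when reading a row: returns the original plaintext."""
        nonce, cipher = stored[: self.NONCE_SIZE], stored[self.NONCE_SIZE :]
        stream = _keystream(self.key, nonce, len(cipher))
        return bytes(a ^ b for a, b in zip(cipher, stream)).decode("utf-8")


def email_hash(email: str, secret: bytes) -> str:
    """Deterministic keyed hash, so a person can be looked up by email
    without the plaintext address ever being stored."""
    normalised = email.strip().lower().encode("utf-8")
    return hmac.new(secret, normalised, hashlib.sha256).hexdigest()
```

The two halves serve different purposes: the encrypted column uses a fresh nonce each time, so the same name never produces the same stored bytes, while the email hash is deliberately deterministic so it can act as a lookup key.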

The pets table would be:

| inserted_at | person_id | pet_id | name_encrypted | species | breed | date_of_birth_encrypted |
|---|---|---|---|---|---|---|
| 1541609554 | 48146852e096d100 | 8871ee2ad11e | 13284b4f54f7 | 4362a31e711e | 123 | 76834fc199 |
| 1541609876 | 379819971ab4362a | 89bd831e711e | 4fc199956683 | 1ab4362a3456 | 345 | 4362a3456 |

These fields might be rendered on the person's "profile page" as:

Ignore the other/advanced features of a "Social Network" for now and just focus on the fact that the person has this related content in the pets table. The person can have as many pets as they like; one-to-many. But we are not extending the person record, we are creating a separate table with the custom fields.

A naive way of implementing this "social network for petlovers" would be to add custom fields to the person ... e.g:

That would be a non-relational/denormalised way and would mean that the "outlier" on the network who literally owns a zoo cannot list all their pets ...
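The one-to-many relationship can be sketched with two tables and a foreign key. This is a simplified Python/`sqlite3` illustration rather than the actual Ecto schemas: the `people` table is cut down to two columns, species is stored in plaintext for readability, and the pet IDs and the `pets_of` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(
    """
    CREATE TABLE people (
        email_hash  TEXT PRIMARY KEY,
        inserted_at INTEGER NOT NULL
    );
    CREATE TABLE pets (
        pet_id      TEXT PRIMARY KEY,
        person_id   TEXT NOT NULL REFERENCES people(email_hash),
        inserted_at INTEGER NOT NULL,
        species     TEXT NOT NULL
    );
    """
)

# One person, as many pets as they like: each pet is simply another row.
conn.execute("INSERT INTO people VALUES (?, ?)", ("48146852e096d100", 1541609554))
zoo = [("pet-1", "lion"), ("pet-2", "zebra"), ("pet-3", "penguin")]
conn.executemany(
    "INSERT INTO pets (pet_id, person_id, inserted_at, species) VALUES (?, ?, ?, ?)",
    [(pet_id, "48146852e096d100", 1541609554, species) for pet_id, species in zoo],
)


def pets_of(person_id: str) -> list:
    """Return the species of every pet belonging to one person."""
    cursor = conn.execute(
        "SELECT species FROM pets WHERE person_id = ? ORDER BY pet_id",
        (person_id,),
    )
    return [row[0] for row in cursor]
```

The zoo owner's fourth, fortieth or four-hundredth animal needs no schema change, which is exactly what the "extra columns on the person" approach cannot offer.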

As you can tell from the data we are capturing for the pet, we can easily construct the pets table using fields:

Note: we could further "normalize" the pets data by creating an animals lookup_table which would store the species, breed, sub-type variant data ... see: https://en.wikipedia.org/wiki/Phylogenetic_nomenclature thus the reference in the pets table would be to a PK in the animals lookup_table. But, again for illustrative purposes we are keeping this "simple".

Repurposing "Abase"?

Yes, having a "Phoenix+" stack (name TBD) which would include "users" (people), emails_sent, auth, admin, analytics, etc. is the medium-term goal for this quest.

Being able to reference one package in the "Client App" mix.exs dependencies would be slick.

We need help on defining the name for this Application Framework "Scaffolding" ... #HelpWanted Sadly, as usual in computer science / software, most of the "good" names are "taken" ...

The only reason I like abase is because it describes what it is/does: it's the "base" for building other apps, and alphabetically it would appear first (or near the top) of any "list" of frameworks. But I don't think we are "married" to the name at all! If anyone can think of a better name, please do!


tl;dr

Please note: this is not a "full plan", it's merely an initial diagram to help the team visualise/understand that there is a "vision" for where we are going and a "logical order" for building things. The "destination" is something akin to a "create-app" tool that includes everything for a person to get started with building their App. We are nowhere near "done" with alog for example, by the time it's at 1.0 it needs to be the de facto way of building distributed/offline-first apps in Phoenix and we need to show people Why using a "traditional" CRUD is a bad idea for apps that allow off-line record creation and want to have accountability built-in.
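For anyone unsure how an append-only log differs from "traditional" CRUD, here is a minimal in-memory sketch. It is a Python illustration of the general idea only, not the API of the alog package; `AppendOnlyLog`, `Row` and the method names are hypothetical. An update never overwrites anything: it appends a new row, and the "current" record is simply the latest row for that entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    seq: int          # position in the log, assigned on append
    entry_id: str     # stable identifier shared by all versions of a record
    data: dict
    deleted: bool = False


class AppendOnlyLog:
    """Rows are only ever appended; nothing is updated or removed in place."""

    def __init__(self) -> None:
        self._rows: List[Row] = []

    def _append(self, entry_id: str, data: dict, deleted: bool = False) -> Row:
        row = Row(len(self._rows) + 1, entry_id, dict(data), deleted)
        self._rows.append(row)
        return row

    def _latest(self, entry_id: str) -> Optional[Row]:
        for row in reversed(self._rows):
            if row.entry_id == entry_id:
                return row
        return None

    def insert(self, entry_id: str, data: dict) -> Row:
        return self._append(entry_id, data)

    def update(self, entry_id: str, changes: dict) -> Row:
        latest = self._latest(entry_id)
        if latest is None or latest.deleted:
            raise KeyError(entry_id)
        return self._append(entry_id, {**latest.data, **changes})

    def delete(self, entry_id: str) -> Row:
        latest = self._latest(entry_id)
        if latest is None or latest.deleted:
            raise KeyError(entry_id)
        # A "delete" is just another row that marks the entry as removed.
        return self._append(entry_id, latest.data, deleted=True)

    def get(self, entry_id: str) -> Optional[dict]:
        latest = self._latest(entry_id)
        if latest is None or latest.deleted:
            return None
        return dict(latest.data)

    def history(self, entry_id: str) -> List[Row]:
        return [row for row in self._rows if row.entry_id == entry_id]
```

The accountability comes for free: `history("t1")` returns every version the record has ever had, including the state just before it was "deleted".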

Imagine you're building a Todo List with Time Tracking and you want people to be able to add items to their list while they are on the London Underground (without network access). Having an auto-incrementing integer as the Primary Key for the "Task" Schema will mean that the person cannot create records offline and then sync them back up to the DB when they get back online. This means we're going to need to create something similar to PouchDB for use in our Progressive Web Apps so that people can store data on the device, search for history and create new records on their device. (It means we'll need to simulate the Ecto Query Syntax on the Client!) And then sync should be automatic/transparent once network access is re-gained. This is an obvious "gap" in the diagram (above) but it's 100% deliberate because if we were to map out the whole diagram it would look more like a "mind map" and would likely "overwhelm" people. One of the most important things in "application architecture" is to avoid over-complicating things up-front.
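A small sketch of why the Primary Key choice matters for offline use. With an auto-incrementing integer only the server can hand out the next ID, so two disconnected devices would either have to wait or would both pick the same number. If each device generates a random UUID instead, records can be created offline and merged later without coordination. This is a Python illustration under those assumptions; `OfflineClient` and `sync` are hypothetical names, not part of any dwyl package.

```python
import uuid
from typing import Dict, List


class OfflineClient:
    """A device that creates records locally while it has no network access."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending: List[dict] = []

    def create_task(self, title: str) -> dict:
        # The ID is generated on the device, so no round-trip to the DB is needed.
        record = {"id": str(uuid.uuid4()), "title": title, "created_by": self.name}
        self.pending.append(record)
        return record


def sync(server: Dict[str, dict], client: OfflineClient) -> int:
    """Upload a client's pending records and return how many were new.

    Re-sending a record the server already has is a no-op, so a sync that is
    interrupted and retried cannot create duplicates. (A real client would also
    drop records from `pending` once the server acknowledges them.)
    """
    added = 0
    for record in client.pending:
        if record["id"] not in server:
            server[record["id"]] = record
            added += 1
    return added
```

Two phones on the Underground can each call `create_task` as often as they like; once both are back online, syncing them in any order leaves the server with every task exactly once.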

We are working towards a future of "flow-based programming" where we have a bunch of "lego bricks" that can be combined visually into building anything. It will allow complete beginners to drag-and-drop fields to create questionnaires to capture data in minutes.

By the time we are "done" we will be able to visualise data flowing through the system in real-time. Product owners will be able to create A/B & multivariate tests and track the data/user-feedback simply by re-arranging the fields in a "custom" autoform layout. Running 50:50 or 10:90 traffic through the "experiment" version will be 3 clicks.

My personal goal is to use this stack for capturing "farming data" in https://github.com/dwyl/home and writing ML algorithms to "understand" how plants grow and how to automate nutrient dispensing for optimal growth/health.

We just have to resist the temptation to over-think things too early and avoid spending too much time dreaming about "the future" ... we must focus on building small Example/Demo apps with the stack and evolving the stack as needed. More on that v. soon! 🔜

Real App Examples for Learning the Stack?

My personal learning objective (see: https://github.com/dwyl/hq/issues/497) in the next 6 months is to create one fully functional end-to-end demo/example app using the "New Stack" each month.

Culminating in a "curriculum" of "7 Apps in 7 Weeks": (from my personal notebook/journal...) seven-apps-in-seven-weeks

that anyone can use to learn how to build using the "New Stack" from anywhere! Then we "just" need to create the learning content for:

And you can begin to see where this is going ... 😉 We will each be building one or more "example apps" as complete learning resources in the coming weeks/months. And in the course of building these examples, we will evolve the "stack". Then, armed with all the examples we will be able to build our own App(s) and any "Client Apps" ... see: https://github.com/dwyl/start-here#what 😮

carl-sagan-nature-quotes-if-you-wish-to-make-an-apple-pie-from

nelsonic commented 1 year ago

The next element in the stack is Logging. 🪵 So that's what I'm working on now. 🧑‍💻 That way I can re-use it in all the other projects ASAP. ⏳