slifty opened 2 months ago
Thinking through types of source has helped shed a bit of light on the potential implementation details, so I'm going to carry that thought process here.
First, some concrete example sources / personas (not worrying about types yet):
It seems clear to me that we want the following sources to be specific:
What is less clear to me is how to handle the various types of direct entry: when a user is manually editing or uploading data, do they become the source? Should each user get a row as an individual, or should there be one generic "direct entry" source?
(I have to pause, but will return)
I think I have a more concise list of scenarios that will help us nail down the use case (and once we understand the use case, we can home in on the implementation / design).
Here are the scenarios; I put ??? in places where I'm going to want clarity from @kfogel / @jim-mcgowan / @jmergy.
I think a quick voice chat might be helpful, if any of the above are available for a call!
Note: @slifty and I are on a call about this right now, as per above.
We just had a great conversation about this, and here's where things landed (hopefully future Dan will completely understand what I'm typing right now):
MacArthur Fluxx and MacArthur Agent respectively.
Organization and Source just to think about whether there is any normalization redundancy there. Could go either way.
(stay tuned)
Regarding the question of normalization, I am leaning towards "source" being a polymorphic mapping entity. It would have "source_type" and "source_id" fields; depending on the type, the source would point to either a funder, organization, or data_provider entity.
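A minimal TypeScript sketch of what resolving a polymorphic source could look like. Everything here is hypothetical: the `PolymorphicSource` shape, the in-memory `entityTables` standing in for the funder, organization, and data_provider tables, and the `resolveSource` helper. The point it shows is that the table is chosen at runtime from `sourceType`, so nothing at the database level guarantees that `sourceId` exists in that table.

```typescript
// Hypothetical tag naming which table sourceId points into.
type SourceType = "funder" | "organization" | "data_provider";

// Polymorphic mapping row: a type tag plus a generic id (no foreign key).
interface PolymorphicSource {
  id: number;
  sourceType: SourceType;
  sourceId: number;
}

// Minimal stand-in for the rows of each referenced table.
interface EntityRow {
  id: number;
  name: string;
}

// Hypothetical in-memory "tables", one per source type, keyed by primary key.
const entityTables: Record<SourceType, Record<number, EntityRow>> = {
  funder: { 1: { id: 1, name: "Example Funder" } },
  organization: { 2: { id: 2, name: "Example Organization" } },
  data_provider: { 3: { id: 3, name: "Example Data Provider" } },
};

// Picks the table from sourceType at runtime, then looks up sourceId.
// Nothing guarantees the row exists, so the result may be undefined.
function resolveSource(source: PolymorphicSource): EntityRow | undefined {
  return entityTables[source.sourceType][source.sourceId];
}
```

For example, `resolveSource({ id: 10, sourceType: "organization", sourceId: 2 })` finds the organization row, while the same `sourceId` with `sourceType: "funder"` finds nothing.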
Pros:
Cons:
source entity is somewhat more complex (depending on the source type it would have either an organization, funder, or dataProvider attribute).
The alternative would be to have a sources entity with no foreign key relationship; just a type, a name, etc.
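For contrast, a hypothetical sketch of that alternative: a standalone source record with no foreign key, carrying only its own label fields. The names (`StandaloneSource`, `describeSource`) are illustrative assumptions. Anything richer about the funder or organization behind the source is simply not reachable from this record.

```typescript
// Hypothetical kinds; in this variant the kind is just a label.
type SourceKind = "funder" | "organization" | "data_provider";

// A self-contained source record: no foreign key, only its own descriptive fields.
interface StandaloneSource {
  id: number;
  kind: SourceKind;
  name: string;
}

// Everything known about the source lives on the record itself;
// there is no related funder or organization row to join to.
function describeSource(source: StandaloneSource): string {
  return `${source.name} (${source.kind})`;
}
```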
Ultimately I think the tradeoff is worth it, since it gives us access to richer data related to a given source.
Talked to @jasonaowen about this and we landed on...
NOT that!
Basically, the big downside of the polymorphic approach is that data integrity can fall apart over time: since source_id isn't a DB-enforced foreign key, a referenced record can be deleted without the deletion cascading, leaving a source that points at nothing.
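A small hypothetical simulation of that failure mode, using in-memory arrays in place of the funder and source tables (all names are illustrative). Deleting a funder touches only the funder "table", so the source row is left referencing an id that no longer exists. A real foreign key would instead either block the delete or cascade it, depending on its ON DELETE setting.

```typescript
// Hypothetical in-memory rows standing in for the funder and source tables.
interface FunderRow {
  id: number;
  name: string;
}

interface FunderSourceRow {
  id: number;
  sourceType: "funder";
  sourceId: number; // plain integer, not an enforced foreign key
}

let funderRows: FunderRow[] = [{ id: 7, name: "Example Funder" }];
const funderSourceRows: FunderSourceRow[] = [{ id: 1, sourceType: "funder", sourceId: 7 }];

// Deleting a funder touches only the funder rows; nothing cascades to sources.
function deleteFunder(funderId: number): void {
  funderRows = funderRows.filter((row) => row.id !== funderId);
}

// Lists ids of source rows whose referenced funder no longer exists.
function findDanglingSources(): number[] {
  return funderSourceRows
    .filter((source) => !funderRows.some((funder) => funder.id === source.sourceId))
    .map((source) => source.id);
}

deleteFunder(7);
const orphanedSourceIds = findDanglingSources();
```

After the delete, source row 1 is orphaned, and only application code (not the database) could have noticed.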
So, we're gonna just have a source table with one column per potential source entity (funder, organization, data provider), with a table rule that only one of those columns can be non-null.
Jason pointed out that, at that point, we don't really need the type at the DB level any longer, since you can infer the type from which field is not null.
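A hypothetical TypeScript sketch of both ideas: validating the one-non-null-column rule and deriving the type from whichever column is set. It assumes "only one column can be non-null" means exactly one (a source must reference something); the row shape and function names are illustrative. In PostgreSQL, the same rule could be written as a CHECK constraint using `num_nonnulls(funder_id, organization_id, data_provider_id) = 1`, with those column names likewise assumed.

```typescript
// Hypothetical row shape: one nullable column per kind of source entity.
interface SourceRow {
  id: number;
  funderId: number | null;
  organizationId: number | null;
  dataProviderId: number | null;
}

type DerivedSourceType = "funder" | "organization" | "data_provider";

// Mirrors the table rule, assuming it means exactly one non-null entity column.
function isValidSourceRow(row: SourceRow): boolean {
  const populated = [row.funderId, row.organizationId, row.dataProviderId].filter(
    (value) => value !== null,
  );
  return populated.length === 1;
}

// No stored type column is needed: the type follows from which column is set.
function deriveSourceType(row: SourceRow): DerivedSourceType {
  if (!isValidSourceRow(row)) {
    throw new Error(`source ${row.id} must reference exactly one entity`);
  }
  if (row.funderId !== null) return "funder";
  if (row.organizationId !== null) return "organization";
  return "data_provider";
}
```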
There may still be a TypeScript benefit in having the type value map to a discriminated union (which would make it clear at the TS level that only one value can be populated).
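A minimal sketch of such a discriminated union, with hypothetical field names. The `type` tag tells the compiler which single id field is present, so a value like `{ type: "funder", organizationId: 3 }` would be rejected at compile time, and a `switch` on `type` narrows to exactly one field per branch.

```typescript
// Hypothetical discriminated union: the `type` tag determines which id field exists.
type Source =
  | { id: number; type: "funder"; funderId: number }
  | { id: number; type: "organization"; organizationId: number }
  | { id: number; type: "data_provider"; dataProviderId: number };

// Narrowing on `type` gives access to exactly one id field per branch.
function referencedId(source: Source): number {
  switch (source.type) {
    case "funder":
      return source.funderId;
    case "organization":
      return source.organizationId;
    case "data_provider":
      return source.dataProviderId;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a variant is added.
      const unreachable: never = source;
      throw new Error(`unknown source ${JSON.stringify(unreachable)}`);
    }
  }
}
```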
Almost almost almost done thinking about this (measure twice merge once amirite)
Right now we have Organization as an entity in our system. The only kind of organization that exists today is one that submits an application.
There are other types of organization (in a literal, real-world sense) that we're imagining, though:
Before I go and create those new entity types, I wanted to take a step back and reflect on whether or not these are actually distinct entities from Organization.
For now, I think that Organizations was created with a very specific use case in mind (being an entity that is stored in the PDC and decorated by PDC data). We may some day want to associate a funder with an organization profile, but that can be done via a relationship.
Bottom line: having three entities representing three distinct functions in the system is appropriate.
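A hypothetical sketch of keeping the three entities separate while allowing a future funder-to-organization association through its own relationship record. The fields and the `FunderOrganizationLink` / `organizationForFunder` names are illustrative assumptions, not existing code.

```typescript
// Hypothetical, minimal entity shapes; each would live in its own table.
interface Organization {
  id: number;
  name: string;
}

interface Funder {
  id: number;
  name: string;
}

interface DataProvider {
  id: number;
  name: string;
}

// A possible future link between a funder and an organization profile,
// modeled as a separate relationship instead of merging the entities.
interface FunderOrganizationLink {
  funderId: number;
  organizationId: number;
}

// Follows the link (if any) from a funder to its organization profile.
function organizationForFunder(
  funder: Funder,
  links: FunderOrganizationLink[],
  organizations: Organization[],
): Organization | undefined {
  const link = links.find((candidate) => candidate.funderId === funder.id);
  if (link === undefined) {
    return undefined;
  }
  return organizations.find((organization) => organization.id === link.organizationId);
}
```

A funder without a link simply has no organization profile, which keeps the association optional.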
Over in #1083 we spent some time talking about provenance for this system.
Ultimately, we are going to want to track "who uploaded this data" and "where did it come from".
There were some decisions left to the implementer around specific naming for these fields, as well as whether we want to create a provenance entity which combines them (as opposed to simply having two separate fields on the appropriate data table).
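A hypothetical sketch of the two options side by side. Since naming was explicitly left open, every name here (`sourceId`, `uploadedBy`, `provenanceId`, and the interfaces) is an illustrative assumption. Both shapes answer the same two questions; option B needs an extra lookup (a join, in SQL terms) to get there.

```typescript
// The two facts to track (names are hypothetical; naming was left open).
interface ProvenanceFacts {
  sourceId: number; // where the data came from
  uploadedBy: string; // who uploaded it
}

// Option A: two separate fields directly on the data table.
interface RecordWithProvenanceFields {
  id: number;
  sourceId: number;
  uploadedBy: string;
}

// Option B: a provenance entity combining both, referenced by id.
interface ProvenanceEntity extends ProvenanceFacts {
  id: number;
}

interface RecordWithProvenanceEntity {
  id: number;
  provenanceId: number;
}

// Option A: the facts are read straight off the record.
function factsFromFields(record: RecordWithProvenanceFields): ProvenanceFacts {
  return { sourceId: record.sourceId, uploadedBy: record.uploadedBy };
}

// Option B: the facts require looking up the referenced provenance row.
function factsFromEntity(
  record: RecordWithProvenanceEntity,
  provenanceById: Record<number, ProvenanceEntity | undefined>,
): ProvenanceFacts | undefined {
  const entry = provenanceById[record.provenanceId];
  if (entry === undefined) {
    return undefined;
  }
  return { sourceId: entry.sourceId, uploadedBy: entry.uploadedBy };
}
```

One plausible argument for option B is that many rows from a single upload could share one provenance row; option A avoids the extra lookup. Which matters more likely depends on how uploads end up being batched.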