PhilanthropyDataCommons / service

A project for collecting and serving public information associated with grant applications
GNU Affero General Public License v3.0

Think through 3rd party data provider integration implementation #1088

Open slifty opened 1 month ago

slifty commented 1 month ago

Our current implementation of third-party data provider scraping was intentionally built as a demo, and it does not fit into the PDC architecture in a way that would last.

The time has come to implement this feature properly, in a way that interacts with organizations and base fields!

jmergy commented 1 month ago

I would think this could also be a type of proposal associated with the organization. The data from other sources would have a structure not unlike a grants system. I could see classifying proposals as a way to denote the intended use of the proposal data contained in the group of fields from a specific source. We could have type=funding for the kinds of proposals we have in there now, and type=organizational if the data covers organization attributes from a data source. 990 data, for example, could come in as proposals but have a type of organizational.
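A rough sketch of the classification idea, just to make it concrete. Every name here (`ProposalType`, `TypedProposal`, `proposalsByType`, the `source` strings) is made up for illustration and is not taken from the PDC codebase; `type` is the attribute being suggested above.

```typescript
// Hypothetical names throughout; `type` is the attribute suggested in this comment.
type ProposalType = "funding" | "organizational";

interface TypedProposal {
  id: number;
  organizationId: number;
  type: ProposalType;
  source: string; // where the grouped field values came from
}

// Select the proposals that carry a given intended use.
function proposalsByType(
  proposals: TypedProposal[],
  type: ProposalType,
): TypedProposal[] {
  return proposals.filter((proposal) => proposal.type === type);
}

const typedProposals: TypedProposal[] = [
  { id: 1, organizationId: 42, type: "funding", source: "funder application" },
  { id: 2, organizationId: 42, type: "organizational", source: "990" },
];

// Only the 990 import is returned here.
const organizationalData = proposalsByType(typedProposals, "organizational");
```

Under this approach, anything that reads proposals would pick the type it cares about rather than assuming every proposal is a funding request.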

slifty commented 1 month ago

Interesting idea!

I am a bit wary of using proposals for this, since a lot of what comes with proposals does not apply here. Candid data is organizational: a Candid snapshot is not a proposal, and it has no application form, proposal outcome, funder, or opportunity. We could use that existing infrastructure, but to me that would be an artificial coupling that would risk limiting our flexibility in the long run. Even in the immediate term, we would have to add a bunch of unintuitive logic to conditionally ignore certain kinds of proposals in API responses, since we wouldn't want to render org-data imports alongside actual proposals.

All this is to say, I think that unfortunately this does warrant a set of more specific entities -- though I think it can still be done elegantly!

FWIW the initial ERD did have a very high-level placeholder model for all this (external sources / imports). Our initial implementation of this feature was for demo purposes, so we didn't lean on or flesh out any of those designs:

[Image: ERD]

The basic idea is that, in addition to "proposal field values", we would have "external field values" (that name is no longer quite right IMHO) which would relate directly to organizations. I think something like this model can still work, though we'll want to create a higher level entity that represents a given snapshot of organizational data (e.g. a set of related external data).

I should note that these snapshots would explicitly NOT contain proposal data -- only organizational fields.
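To make the snapshot idea a bit more concrete, here is a minimal sketch under some loud assumptions: the entity names (`OrganizationSnapshot`, `SnapshotFieldValue`), the `scope` attribute on base fields, and the `buildSnapshot` helper are all hypothetical and only illustrate the shape, not the eventual PDC schema.

```typescript
// All names are hypothetical; this is not the PDC schema.
type FieldScope = "organization" | "proposal";

interface BaseField {
  id: number;
  shortCode: string;
  scope: FieldScope; // assumed attribute, used here to reject proposal data
}

interface SnapshotFieldValue {
  baseFieldId: number;
  value: string;
}

// One snapshot groups the values imported from one source at one time.
interface OrganizationSnapshot {
  organizationId: number;
  source: string;
  capturedAt: Date;
  fieldValues: SnapshotFieldValue[];
}

function buildSnapshot(
  organizationId: number,
  source: string,
  rawValues: Record<string, string>,
  baseFields: BaseField[],
): OrganizationSnapshot {
  const fieldValues: SnapshotFieldValue[] = [];
  for (const [shortCode, value] of Object.entries(rawValues)) {
    const baseField = baseFields.find((field) => field.shortCode === shortCode);
    if (baseField === undefined) {
      continue; // unmapped source keys are skipped in this sketch
    }
    if (baseField.scope !== "organization") {
      // Snapshots explicitly hold organizational fields only.
      throw new Error(`Base field "${shortCode}" is not organizational`);
    }
    fieldValues.push({ baseFieldId: baseField.id, value });
  }
  return { organizationId, source, capturedAt: new Date(), fieldValues };
}

const baseFields: BaseField[] = [
  { id: 1, shortCode: "organization_name", scope: "organization" },
  { id: 2, shortCode: "proposal_budget", scope: "proposal" },
];

const snapshot = buildSnapshot(
  42,
  "candid",
  { organization_name: "Example Org", unmapped_key: "ignored" },
  baseFields,
);
```

The point of the separate entity is that a snapshot relates directly to an organization and a source, with no funder, opportunity, or application form involved.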