PhilanthropyDataCommons / service

A project for collecting and serving public information associated with grant applications
GNU Affero General Public License v3.0

Update bulk upload processing to create organizations and link them to proposals #902

Closed: slifty closed this issue 5 months ago

slifty commented 5 months ago

We have the concept of an organization, but don't currently create organizations for proposals when we invoke the bulk uploader.

slifty commented 5 months ago

To implement this I will need to know which base field to use to represent (1) the org name and (2) the org EIN. Those values will then drive the lookup (or creation) of organizations when adding each proposal. Note that these won't be REQUIRED fields, since a proposal does not REQUIRE an organization.

We had something like this back when there was the concept of an applicant, but I want to spend a moment revisiting the question rather than just duplicating that implementation.

Some options:

  1. Use hard-coded short codes -- it's on the PDC instance to ensure short codes with those values exist in order to take advantage of organization support in the bulk upload flow.
  2. Add new isOrganizationEmployerIdentificationNumber and isOrganizationName attributes to base_fields -- this would make the functionality of the base field explicit, and would also allow any number of base fields to serve this purpose.
  3. Make the short codes for this purpose configurable (either database-driven or environment-variable-driven).

Some other considerations:


Scratch all that; I have a better idea.

slifty commented 5 months ago

I'd like to propose the following:

  1. We add the field_scope attribute to base_fields which lets us indicate that a field relates to an organization (this is already something we want to plan for)
  2. We create two new field types: name and ein (specific name TBD)

Our processing script would then check whether any fields are organization-scoped fields of type name or ein and use those values.

(And then we'd just use the first one if there was more than one in a bulk upload)
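A minimal TypeScript sketch of the proposed lookup, assuming a hypothetical `BaseField` shape with a `fieldScope` attribute and `name` / `ein` data types (the thread leaves the exact names TBD, and the short codes in the example are made up). It picks the first organization-scoped field of each type, so any duplicate columns in a single upload are ignored.

```typescript
// Hypothetical shapes: `fieldScope` and the `name` / `ein` data types are the
// proposal from this thread, not an existing PDC schema.
type FieldScope = "proposal" | "organization";
type FieldDataType = "string" | "number" | "name" | "ein";

interface BaseField {
  shortCode: string;
  fieldScope: FieldScope;
  dataType: FieldDataType;
}

// Returns the first organization-scoped field of the requested type, or
// undefined if the bulk upload has no such column.
function findFirstOrgField(
  fields: readonly BaseField[],
  dataType: "name" | "ein",
): BaseField | undefined {
  return fields.find(
    (field) => field.fieldScope === "organization" && field.dataType === dataType,
  );
}

// Example (made-up short codes): two organization name columns; only the
// first one is used.
const uploadFields: BaseField[] = [
  { shortCode: "proposal_title", fieldScope: "proposal", dataType: "string" },
  { shortCode: "org_legal_name", fieldScope: "organization", dataType: "name" },
  { shortCode: "org_dba_name", fieldScope: "organization", dataType: "name" },
  { shortCode: "org_ein", fieldScope: "organization", dataType: "ein" },
];

const nameField = findFirstOrgField(uploadFields, "name");
const einField = findFirstOrgField(uploadFields, "ein");
console.log(nameField?.shortCode, einField?.shortCode); // org_legal_name org_ein
```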

jmergy commented 5 months ago

@slifty @reefdog One thing to note as well - might not be directly relevant here, but something I wanted to raise too. Schema.org has come up from time to time in discussions. It might be helpful to embrace some connections there, where relevant. There is a lot of data out there that folks are connecting to this metadata to bridge data "standards", and perhaps something here could be instructive for the PDC.

For example:

Organization: https://schema.org/Organization
Grant: https://schema.org/Grant
Funding Scheme: https://schema.org/FundingScheme

I could see this as a development ask later down the road: overlay base field aliases or something similar to be schema.org-compatible at some point. A vendor in the pilot also asked for base field aliases so they could provide an initial mapping, at least for organization fields and other known / common fields.

Doesn't need to be now, but just thinking we will want this later and it could be instructive on questions / decisions now.
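A rough sketch of what a base field alias table pointing at schema.org might look like. The PDC short codes here are hypothetical; `legalName`, `taxID`, and `url` are real properties of https://schema.org/Organization. The function maps one row of uploaded values onto a JSON-LD-style Organization object and drops anything without an alias.

```typescript
// Hypothetical alias table: the PDC short codes on the left are illustrative;
// the right-hand side names real schema.org types and properties.
interface SchemaOrgAlias {
  type: "Organization" | "Grant";
  property: string;
}

const schemaOrgAliases = new Map<string, SchemaOrgAlias>([
  ["org_legal_name", { type: "Organization", property: "legalName" }],
  ["org_ein", { type: "Organization", property: "taxID" }],
  ["org_website", { type: "Organization", property: "url" }],
]);

// Builds a JSON-LD-style Organization from a row keyed by short code,
// keeping only values whose short code has an Organization alias.
function toSchemaOrgOrganization(
  row: Record<string, string>,
): Record<string, string> {
  const result: Record<string, string> = {
    "@context": "https://schema.org",
    "@type": "Organization",
  };
  for (const [shortCode, value] of Object.entries(row)) {
    const alias = schemaOrgAliases.get(shortCode);
    if (alias?.type === "Organization") {
      result[alias.property] = value;
    }
  }
  return result;
}

// Made-up example values.
const example = toSchemaOrgOrganization({
  org_legal_name: "Example Foundation",
  org_ein: "12-3456789",
  proposal_title: "Community Garden",
});
console.log(JSON.stringify(example));
```

Keeping the aliases in a separate table, rather than renaming base fields, would let a vendor's initial mapping and schema.org compatibility coexist with the PDC's own short codes.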

slifty commented 5 months ago

@jmergy oh that's a great point! I think that concept would be most directly relevant right now for the definition of our base fields (ideally each attribute in the organizations schema, for instance, would appear as a base field!)

On a different note I wanted to mention that my plan for this issue is NOT to update organizations based on bulk upload -- for instance, if an organization name associated with an EIN in a bulk upload row is different from the one in the database, I would not plan to update the organizations table (though the field value WOULD be stored!).
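One way to express the "look up or create, but never update" rule, sketched in TypeScript against a hypothetical in-memory store keyed by EIN (the `Organization` and `OrganizationStore` names are illustrative, not the PDC schema). A database-backed version would presumably perform the same lookup against the organizations table.

```typescript
// Hypothetical in-memory stand-in for the organizations table, keyed by EIN.
interface Organization {
  id: number;
  employerIdentificationNumber: string;
  name: string;
}

class OrganizationStore {
  private readonly byEin = new Map<string, Organization>();
  private nextId = 1;

  // Look up an organization by EIN; create it only if none exists.
  // An existing record is returned unchanged, even when the uploaded name
  // differs, matching the "no updates from bulk upload" plan.
  findOrCreate(ein: string, name: string): Organization {
    const existing = this.byEin.get(ein);
    if (existing !== undefined) {
      return existing;
    }
    const created: Organization = {
      id: this.nextId++,
      employerIdentificationNumber: ein,
      name,
    };
    this.byEin.set(ein, created);
    return created;
  }
}

// Made-up example: the second row uses a different spelling of the name.
const store = new OrganizationStore();
const first = store.findOrCreate("12-3456789", "Example Foundation");
const second = store.findOrCreate("12-3456789", "Example Fdn.");
console.log(second.id === first.id, second.name); // true Example Foundation
```

The differing name from the second row is not lost; per the comment above, it would still be stored as a proposal field value.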

slifty commented 5 months ago

I opened up #922 to defer the question of whether there is a more correct (but more complicated) way to indicate which base fields should be used for EIN / org name.

In the spirit of velocity, the initial implementation will just use hard-coded "magic" short codes.
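A sketch of the hard-coded approach, assuming two hypothetical short codes (`organization_name` and `organization_tax_id`); the thread does not say which values were actually chosen. Because a proposal does not require an organization, both values are optional, and blank cells are treated as missing.

```typescript
// Hypothetical short codes: the actual "magic" values chosen for the initial
// implementation are not named in this thread.
const ORGANIZATION_NAME_SHORT_CODE = "organization_name";
const ORGANIZATION_EIN_SHORT_CODE = "organization_tax_id";

interface OrganizationFields {
  name: string | undefined;
  ein: string | undefined;
}

// Pulls organization values out of one bulk-upload row, keyed by short code.
// Both fields are optional: a proposal does not require an organization, so a
// missing or blank column simply yields undefined.
function extractOrganizationFields(
  row: ReadonlyMap<string, string>,
): OrganizationFields {
  const clean = (value: string | undefined): string | undefined => {
    const trimmed = value?.trim();
    return trimmed === undefined || trimmed === "" ? undefined : trimmed;
  };
  return {
    name: clean(row.get(ORGANIZATION_NAME_SHORT_CODE)),
    ein: clean(row.get(ORGANIZATION_EIN_SHORT_CODE)),
  };
}

// Made-up example row with a name but no EIN column.
const row = new Map<string, string>([
  ["proposal_title", "Community Garden"],
  ["organization_name", " Example Foundation "],
]);
console.log(extractOrganizationFields(row));
```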