PhilanthropyDataCommons / service

A project for collecting and serving public information associated with grant applications
GNU Affero General Public License v3.0
8 stars 2 forks source link

Internationalize base fields #1123

Closed hminsky2002 closed 3 weeks ago

hminsky2002 commented 1 month ago

This PR contains the refactor of the base field type to accommodate the new base field localization, which will be used to store the translations of the pdc's base fields. Rather than having one static label for the base field, base fields now support multiple separate translations for their labels, and optionally descriptions, represented as a many-to-one relationship between the base field and base field localizations table. This change required a good amount of refactoring across the codebase, but thankfully the work @slifty has done to document our existing concepts has paid off ten fold.

Initial basefield localization creation is handled on creation of a basefield, and subsequent localizations are created by the nested 'POST baseField/{baseFieldId}/localization' route. Updating a basefield localization similarly is handled by a nested 'PUT baseField/{baseFieldId}/{language}` route, as localizations use a natural unique id on base_field_id and language.

Base field languages are validated as valid IETF language tags via the language-tags package. Invalid language tags are rejected on creation

Because base fields have been refactored, the seed file has been removed from the repository, as it was no longer functional.

Since labels are no longer individual fields on our basefields, our intake for application forms has been modified slightly. Instead of setting application form field values to the basefield label in our 'processBulkUpload.ts' task, we instead check if an existing English label exists on the referenced base field. If it does, we use that as our application form field label, otherwise we default to the basefield's shortcode

Closes #1110

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 96.55172% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.82%. Comparing base (46f26cd) to head (5d04dc2). Report is 16 commits behind head on main.

Files Patch % Lines
...teOrUpdate/createOrUpdateBaseFieldLocalizations.ts 90.00% 1 Missing :warning:
src/handlers/baseFieldsHandlers.ts 95.65% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1123 +/- ## ========================================== + Coverage 88.57% 88.82% +0.24% ========================================== Files 129 134 +5 Lines 1725 1780 +55 Branches 213 227 +14 ========================================== + Hits 1528 1581 +53 + Misses 197 184 -13 - Partials 0 15 +15 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

hminsky2002 commented 1 month ago

@jasonaowen Requesting your review on my commits here in prep for our conversation tomorrow! This is not a completed PR by any means, but I do think it's pretty close to finality. The last major thing I think I need to do is implement type checking for the localizations. A few things that are probably key to look at:

  1. How I'm handling updating base fields -> I'm allowing for creation of localizations via the post basefields route, but the put route for updating basefields only handles the shortcode/datatype fields, not the localizations. That's handled via the put basefieldslocalizations route. Totally open to changing this, just seemed like it made the most sense to me, since you wouldn't necesarrily want to touch the basefield fields when changing localizations.

  2. Dropping label from application form field -> I did this because it seemed like it was informed solely by the basefield label field, which is now defunct. As we're gravitating away from having the basefield store a canonical label, rather then the translations, i thought that it would be ok if applicationformfield just held on to it's foreign key relationship to the basefield.

  3. The specific logic behind basefield creation as it pertains to typescript -> I didn't really know the best way to approach this because I wanted to be able to create localizations on base field creation, but I needed to work around base_field_id being a required field for localizations. You'll see in the BaseField.ts and BaseFieldLocalization.ts I have multiple writable fields for each type, which I use seperately for creating basefields vs returning them. Now when you get a basefield returned via the get basefields route, you get the deeply populated localizations list, but when you update the basefield you just get back the fields you updated. It's a little mind racking and I am very open to a deep refactor there, would love your input on all of it!

hminsky2002 commented 1 month ago

@slifty @jasonaowen Ok putting a pin in things for today -- I believe all addressed issues have been resolved EXCEPT for the issue of refactoring the baseFieldLocalization routes AND language testing. ApplicationFormFields have been re-granted their label fields by a generous me, and the base_field_to_json function has been changed while the selectAll query has been reverted, and the tests have been refactored to accommodate this

@slifty if you wouldn't mind giving an extra once-over to the openapi spec changes I made that would be very helpful! I think everything's how it should be but I was kind of just following the form that other data had already done so an extra look over would be great.

ALSO -- After these migrations, the seed file is no longer functional. I was a little confused as to how to refactor that because if we're going to insert base_fields and localizations, I don't know how we match created base_field_ids with localizations. I suppose we could alternate insertions (base field, localization, base field, localization). Regardless, it will be a bunch of line changes so i wanted to wait for your inputs before going head first into that.

jasonaowen commented 1 month ago

@slifty @jasonaowen Ok putting a pin in things for today -- I believe all addressed issues have been resolved EXCEPT for the issue of refactoring the baseFieldLocalization routes AND language testing. ApplicationFormFields have been re-granted their label fields by a generous me, and the base_field_to_json function has been changed while the selectAll query has been reverted, and the tests have been refactored to accommodate this

:joy: Thanks, @hminsky2002! Sorry for arguing with @slifty in your PR; I know that can be frustrating. Maybe the three of us should chat next week?

ALSO -- After these migrations, the seed file is no longer functional. I was a little confused as to how to refactor that because if we're going to insert base_fields and localizations, I don't know how we match created base_field_ids with localizations. I suppose we could alternate insertions (base field, localization, base field, localization). Regardless, it will be a bunch of line changes so i wanted to wait for your inputs before going head first into that.

The solution I would like is to add a command to the [data-scripts]() to import base fields from another instance, as described in https://github.com/PhilanthropyDataCommons/service/issues/756. We've been talking about it for ages, and especially every time we have to touch the seed data, and I think maybe now is the time to bite the bullet and implement it. With that approach, we could simply delete the seed data, and say that it lives on in the production API instance.

hminsky2002 commented 1 month ago

@slifty @jasonaowen Ok putting a pin in things for today -- I believe all addressed issues have been resolved EXCEPT for the issue of refactoring the baseFieldLocalization routes AND language testing. ApplicationFormFields have been re-granted their label fields by a generous me, and the base_field_to_json function has been changed while the selectAll query has been reverted, and the tests have been refactored to accommodate this

😂 Thanks, @hminsky2002! Sorry for arguing with @slifty in your PR; I know that can be frustrating. Maybe the three of us should chat next week?

ALSO -- After these migrations, the seed file is no longer functional. I was a little confused as to how to refactor that because if we're going to insert base_fields and localizations, I don't know how we match created base_field_ids with localizations. I suppose we could alternate insertions (base field, localization, base field, localization). Regardless, it will be a bunch of line changes so i wanted to wait for your inputs before going head first into that.

The solution I would like is to add a command to the data-scripts to import base fields from another instance, as described in #756. We've been talking about it for ages, and especially every time we have to touch the seed data, and I think maybe now is the time to bite the bullet and implement it. With that approach, we could simply delete the seed data, and say that it lives on in the production API instance.

Noooo sweat. a productive discussion next week sounds splendid.

Importing from another instance sounds very cool. I'd imagine that would be a separate PR, so then in the meantime in this PR should we remove the seed file from the repository (since it will only serve to throw errors)?

slifty commented 1 month ago

+1 to finally implementing sync, but I do think that until we implement #756 we shouldn't break seed data... And also think we also shouldn't blow up the scope of this PR to do that implementation :-/

hminsky2002 commented 1 month ago

@jasonaowen @slifty leaving this off for today with language tag checking handled -- used the keyword approach suggested by @slifty. Excited to discuss this issue more tomorrow

For mine own sake, i'll write out the two things still unhandeled in this PR:

  1. A potential refactor of the baseFieldLocalization Routes and other routes that follow this pattern
  2. Refactoring the seed file, as it is currently broken with these changes.
hminsky2002 commented 1 month ago

@slifty @jasonaowen Ok I believe this is finally ready for final review! I have nested the baseFieldLocalization routes inside of the baseField routes, following the convention we proposed. I did end up creating ANOTHER baseField type and validator to accomodate the PUT route (one that only expects a label and optionally a description, as baseFieldId and Language are provided via the URL parameters) but I wonder if there's a slicker way to do this. See types/BaseFieldLocalization.ts. Furthermore, the I couldn't figure out the OpenAPI syntax for ensuring that ONLY label and description are provided as example data for the PUT route without changing the schema reference. I am sure there's a better solution there but could not find it on my own :(

Those two issues aside however, I think this is ready to go! Would love your final inputs on this, excited to globalize this bad boy

hminsky2002 commented 3 weeks ago

@slifty @jasonaowen ba-bam, this badboi is ready for final review. All tests passing with coverage I believe. Would love to see your thoughts on this and get it merged by tomorrow!

hminsky2002 commented 3 weeks ago

I tried so hard to find something else to request, I'm going to miss our time together working on this PR 😢

Just think, now we can support PDC PR discussions in every known language to man. And not just can, will! From now on all my commit messages will be doubled in Esperanto

Nur pensu, nun ni povas subteni PDC-PR-diskutojn en ĉiu konata lingvo al homo. Kaj ne nur povas, volas! Ekde nun ĉiuj miaj commit-mesaĝoj estos duobligitaj en Esperanto