PhilanthropyDataCommons / service

A project for collecting and serving public information associated with grant applications
GNU Affero General Public License v3.0
8 stars 2 forks source link

Implement Data Validation #867

Closed hminsky2002 closed 7 months ago

hminsky2002 commented 7 months ago

This PR introduces basic data validation for uploaded proposal fields. It modifies the base_fields and proposal_field_values tables in the database and related types in the service layer to accommodate

  1. A set of predefined acceptable types for a field
  2. A boolean field "is_valid" to indicate if the proposal field's value's type matches the related base field

Validation is performed by a helper function, fieldValueIsValid, which uses regexes, though a more developed validation system will likely be necessary as we account for more complicated types.

Testing:

  1. Start pdc service
  2. Run migrations. a. NOTE: All base field types will be set to string by default, which are valid by default in the validation system. In order to test individual types(such as number, phone number, email, etc), new base fields with the corresponding type will need to be created and inserted into the database.
  3. Start pdc frontend, navigate to the add data page
  4. Upload a proposal with chosen fields valid or invalid
  5. Verify that the expected validity is reflected in the "is_valid" column in the proposal_field_values table in the database

Resolves #760

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.50%. Comparing base (a001236) to head (8047724).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #867 +/- ## ========================================== + Coverage 87.23% 87.50% +0.26% ========================================== Files 103 104 +1 Lines 1394 1424 +30 Branches 167 175 +8 ========================================== + Hits 1216 1246 +30 Misses 163 163 Partials 15 15 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

slifty commented 7 months ago

I'll give a review of the actual code soon but in the shorter term some meta changes requested:

  1. Each commit should be self-contained and result in a "correct" state -- so the refactored tests should be added as the changes are made (they are part of the change, and actually serve as contextual documentation of the change because they communicate what you changed and how those changes change expectations of API behavior!)

  2. It looks like there isn't much test coverage for the validation functions; could you add more so that there is 100% coverage?

hminsky2002 commented 7 months ago

Thanks for bearing with all the iteration!

This looks really great, well done!

Woooo!

bickelj commented 6 months ago

@hminsky2002 @slifty I think the same principles for database schema migrations apply to database seed scripts. In other words, I don't think we should modify existing seed scripts but rather append new scripts that assume the previous ones ran. For example, in production we have run several more seed scripts that did not make it into the main branch for one reason or another, see #305. It looks like the ball might be in my court to respond to Jason's objections, though. Of course, another option is to start over with base fields seeds, and that might be preferred. In that case, we should delete all the data in the production database and create a new seed script and run it in prod.

slifty commented 6 months ago

@bickelj I am comfortable with that approach but also wanted to note... I think we should ditch the seed scripts entirely, and instead lean into #756