adobe / commerce-events

DSCS-863: add minLength to all required schema fields, where applicable #138

Open sharkySharks opened 9 months ago

sharkySharks commented 9 months ago

Description

While we assert in the product schema that the sku field is not null, it should also not be an empty string. We recently ran into customer issues where an empty sku field left events unprocessed. This schema needs to be updated in the Snowplow console, but I will wait for feedback here before making any schema changes. I am also interested in the process for updating this schema within Snowplow.
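
For illustration, here is a minimal sketch of the distinction, written as a TypeScript object literal for readability rather than the exact Product schema:

```typescript
// Current behavior: "type": "string" rejects null, but the empty string ""
// still validates.
const skuToday = { type: "string" };

// Proposed: additionally reject the empty string.
const skuProposed = { type: "string", minLength: 1 };
```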

Related Issue

Please see internal Adobe Jira issue DSCS-863.

Motivation and Context

sku is a required field, which means it should also not be empty.

How Has This Been Tested?

Downstream processing pipelines for recommendations were breaking because the sku field was empty on some events for some customers. In those pipelines, queries that assumed sku would not be empty on product events have been updated to explicitly filter out empty strings as well as null values, so that the pipelines do not error on missing sku information. As a second line of defense, schema validation should also reject empty sku values on product events, so that events that do not meet our expectations fail fast and we can give customers accurate information about which product events are missing required sku data.

Screenshots (if appropriate):

Types of changes

Checklist

aniham commented 9 months ago

@sharkySharks I like the idea and glad we are doing this.

The file you're changing here is for documentation only. The correct steps to actually do what you're trying to do are below:

  1. Go into the Snowplow console and create a new version of the Product schema in QA. 2.1.0 is correct - this is a breaking change that will result in many more events ending up in the Bad Bucket.
  2. Test locally against Snowplow QA by changing the schema URL to point to the new version (see the sketch after this list): https://github.com/adobe/commerce-events/blob/main/packages/storefront-events-collector/src/schemas.ts#L9
  3. Publish Snowplow changes to Prod
  4. Submit PR here that changes docs AND schema file above ^^ to point to new URL
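
For step 2, the local change is roughly of this shape; the constant name and the iglu path below are placeholders, not the actual contents of schemas.ts:

```typescript
// schemas.ts (sketch): point the product context at the new QA schema version.
// The constant name and the iglu vendor/name here are illustrative; use the
// real values from the file linked in step 2, and whatever version gets
// created in step 1.
export const PRODUCT_SCHEMA =
  "iglu:com.adobe.example/product/jsonschema/2-1-0";
```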

Let me know if any of this is unclear!

sharkySharks commented 9 months ago

@aniham thanks for the info on making the update. I just checked all the webpack configs and all the URLs are coming back as 404. Do I need to be allowlisted before I can access them?

aniham commented 9 months ago

@sharkySharks yes, for QA you have to be on the VPN; sorry, I forgot to mention that earlier.

sharkySharks commented 9 months ago

For context on the version number, Snowplow did not allow the minor version to be set to 2-1-0. This is the message that I received, so I decided to go with the patch version bump:

[Screenshot: Snowplow message about the allowed version number]

sharkySharks commented 9 months ago

Update on testing: I was able to test this out in the QA environment locally. I sent productPageView and addToCart events from localhost and saw them end up in the good bucket in QA Kibana; nothing from those events landed in the bad bucket. I am not sure if other events need to be tested as well.

I think we should also consider minLength for the required name field.
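
Something like this, if we go that route (shown as a TypeScript object for readability; the fragment is illustrative, not the exact schema):

```typescript
// Both required string fields would then reject null as well as "".
const productProperties = {
  sku:  { type: "string", minLength: 1 },
  name: { type: "string", minLength: 1 }, // proposed addition
};
```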

Also, as far as I can tell I do not have production promotion permissions in the Snowplow console. But the latest code here is what I tested against.

aniham commented 9 months ago

@sharkySharks this sounds right. I think you tested the right things, though you should probably do a search through the repo to make sure no other events are using the product context (or if they are, test those as well).

Next steps will be:

After that is done, you should be able to publish your change here to prod!

aniham commented 9 months ago

@sharkySharks note that once this change goes into prod, we're going to see an uptick of events going to the bad bucket - that is the correct behavior and what we want, of course - but it is worth keeping an eye on and doing some reporting on what changed.

krithikachandran commented 9 months ago

> For context on the version number, Snowplow did not allow the minor version to be set to 2-1-0. This is the message that I received, so I decided to go with the patch version bump:
>
> [Screenshot: Snowplow message about the allowed version number]

I think it should be 3-0-0 since it's a breaking change. That way, we can roll it back easily if something goes wrong.

sharkySharks commented 9 months ago

@krithikachandran I am good with using the major version upgrade, particularly if we think it is going to be a breaking change even if it is a bug fix. I will redeploy it with that version and test.

@aniham I don't have permissions to push the schema to prod when it is ready; I don't see that button. If you could check my permissions, I can do that after I test the major version upgrade.

aniham commented 9 months ago

@sharkySharks you're now an admin in Snowplow - you should be able to merge!

sharkySharks commented 9 months ago

Ran both positive and negative tests against the new schema 3-0-0.

Here is what happens in the negative case:

[Screenshot: result of the negative test case]
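
For reference, this kind of positive/negative check can also be reproduced locally with any JSON Schema validator. A minimal sketch, assuming Ajv and an illustrative stand-in for the real 3-0-0 product schema:

```typescript
import Ajv from "ajv";

// Illustrative stand-in for the product context schema; the real 3-0-0 schema
// lives in the Snowplow registry.
const productSchema = {
  type: "object",
  properties: { sku: { type: "string", minLength: 1 } },
  required: ["sku"],
};

const ajv = new Ajv();
const validate = ajv.compile(productSchema);

console.log(validate({ sku: "24-MB01" })); // true  -> good bucket
console.log(validate({ sku: "" }));        // false -> bad bucket
console.log(validate.errors);              // reports the minLength violation
```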

I migrated the schema to production in Snowplow.

[Screenshot: schema migrated to production in the Snowplow console]

If all looks good to you here, I'll wait for a final sign-off on the latest schema version update on this PR and then go through QA testing for the app tomorrow.

sharkySharks commented 9 months ago

Hey folks, I've updated this PR. I have not tested it yet because I want to get feedback on the potential changes first; once there is agreement on these changes, I will make them in Snowplow for QA and run through all the testing paths.

I went ahead and added minLength to the required string and array fields that do not also include null in their accepted types. I have not done the work to verify which fields should still be required or not; I just blanketed all required string and array fields with minLength across the schemas in this repo.
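
To make the selection rule concrete, here is a hedged sketch of the pattern (the field names are illustrative, not taken from a specific schema in this repo):

```typescript
// Illustrative pattern only.
const exampleProperties = {
  // Required string fields whose type does not include null: add minLength.
  sku:  { type: "string", minLength: 1 },
  name: { type: "string", minLength: 1 },
  // Required fields that explicitly allow null are left alone.
  categories: { type: ["string", "null"] },
  // For required non-nullable array fields, minItems is the JSON Schema
  // keyword that enforces non-emptiness (minLength only constrains strings).
  items: { type: "array", minItems: 1 },
};
const required = ["sku", "name", "categories", "items"];
```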

This is coming out of numerous outages this year where required fields were empty and broke various parts of ML pipeline processing.