Open sharkySharks opened 9 months ago
@sharkySharks I like the idea and glad we are doing this.
The file you're changing here is for documentation only. The correct steps to actually do what you're trying to do are below:
Product
schema in QA. 2.1.0
is correct - this is a breaking change that will result in many more events ending up in the Bad Bucket.Let me know if any of this is unclear!
@aniham thanks for the info on making the update. I just checked all the webpack configs and all urls are coming back as 404. Do i need to be allow listed before I can access?
@sharkySharks yes for QA you have to be on VPN, sorry forgot to mention earlier
For context on the version number, snowplow did not allow for the minor version to be set to 2-1-0, this is the message that I received and decided to go with the patch bump version:
Update on testing, I was able to test this out in the qa environment locally, sent productPageView and addToCart events from localhost, saw them end up in the good bucket in qa kibana. Nothing in the bad bucket from those events. Not sure if other events need to be tested as well.
I think we should also consider minLength for the required name field as well.
Also, I do not have production promotion permissions in the snowplow console as far as I can tell. But latest code here is what I tested against.
@sharkySharks this sounds right. I think you tested the right things though you should probably do a search through the repo to make sure no other events are using the product context (or if they are, to test those as well).
Next steps will be:
After that is done, you should be able to publish your change here to prod!
@sharkySharks note that once this change goes into prod, we're going to see a uptick of events going to the bad bucket - that is the correct behavior and what we want ofc - but worth keeping an eye out and doing some reporting on what changed.
For context on the version number, snowplow did not allow for the minor version to be set to 2-1-0, this is the message that I received and decided to go with the patch bump version:
I think it should be 3-0-0 since it's a breaking change. That way, we can roll it back easily if something goes wrong
@krithikachandran I am good with using the major version upgrade, particularly if we think it is going to be a breaking change even if it is a bug fix. I will redeploy it out with that version and test.
@aniham I don't have permissions to push the schema to prod when it is ready. I don't see that button, if you could check my permissions I can do that after I test the major version upgrade.
@sharkySharks you're now admin
in Snowplow - should be able to merge!
Ran both positive and negative tests against the new schema 3-0-0.
Here is what happens in the negative case:
migrated the schema to production in snowplow.
If all looks good to you here I'll wait for a final sign off on the latest schema version update on this pr and then go through QA testing tomorrow for the app.
Hey folks, I've updated this pr, I have not tested this yet as I want to get feedback on the potential changes first and then move forward with making the changes in snowplow for qa and run through all the testing paths once there is agreement on these changes.
I went ahead and added minLength
to required fields for string
and array
fields that did not also include null
in their accepted types. I have not done the work to verify what fields should still be required or not, I just blanketed all required string and array fields with minLength
for schemas in this repo.
This is coming out of numerous outages this year where required fields were empty and broke various parts of ML pipeline processing.
Description
While we assert in the product schema that the
sku
field is not null, it should also not be an empty string. Recently ran into some customer issues where an empty sku field left events unprocessed. This schema needs to be updated in the snowplow console, but I will wait for feedback here before making any schema changes. Also interested in the process for updating this schema within snowplow.Related Issue
Please see internal adobe jira issue DSCS-863.
Motivation and Context
sku
is a required field, which means it should also not be empty.How Has This Been Tested?
Downstream processing pipelines for recommendations were breaking because the sku field was empty for some events for some customers. In those pipelines queries that assumed the sku would not be empty on product events have been updated to explicitly remove empty strings as well as the null values so that the pipelines do not error with missing sku information. Twofold, schema validation should also handle empty product event sku information so that events that do not meet our expectations fail fast and return accurate assumptions to our customers about which product information may be missing required sku data.
Screenshots (if appropriate):
Types of changes
Checklist