aws-samples / aws-glue-samples

AWS Glue code samples
MIT No Attribution
1.43k stars 818 forks

Custom transforms - adding a sample for a simple custom visual transform #138

Closed rmattsampson closed 1 year ago

rmattsampson commented 1 year ago

Issue #, if available:

Description of changes: Adding a basic sample custom visual transform (CVT) for Glue Studio. This sample fills empty strings in a user-defined column with a user-defined value. General guidelines and creation docs for Custom Visual Transforms are available here: https://docs.aws.amazon.com/glue/latest/ug/custom-visual-transform-create-gs.html

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

rmattsampson commented 1 year ago

Received some feedback offline; I will apply it to this PR and push a new iteration.

rmattsampson commented 1 year ago

I have re-tested this code by using the CVT process to add the CVT and running it on a sample data set.

I have simplified the code quite a bit

The CVT now handles nulls and empty strings the same way. There is now a dependency on pandas (for the null case).

rmattsampson commented 1 year ago

I have removed the pandas dependency (which was only needed for fillna) and instead use the PySpark fillna method, which works just as well. Tested using Jupyter notebooks and, of course, as a CVT directly in Glue Studio.

moomindani commented 1 year ago

Thank you for your contribution! Merged this PR.