Closed MrGossett closed 2 years ago
I ran a brute-force search, updating my example script above with each of the transforms listed in the PySpark Transforms section of the Glue docs. Here are my results:
| Transform | Result |
|---|---|
| ApplyMapping | supported ✅ |
| DropFields | supported ✅ |
| DropNullFields | supported ✅ |
| ErrorsAsDynamicFrame | unsupported ❌ |
| Filter | unsupported ❌ |
| FlatMap | unsupported ❌ |
| Join | supported ✅ |
| Map | unsupported ❌ |
| MapToCollection | unsupported ❌ |
| Relationalize | supported ✅ |
| RenameField | supported ✅ |
| ResolveChoice | supported ✅ |
| SelectFields | supported ✅ |
| SelectFromCollection | unsupported ❌ |
| Spigot | supported ✅ |
| SplitFields | supported ✅ |
| SplitRows | supported ✅ |
| Unbox | supported ✅ |
| UnnestFrame | unsupported ❌ |
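For anyone who wants to repeat this kind of search, here is a rough sketch of how it could be automated. It is not the script I actually ran. `build_dag` and `probe_transforms` are hypothetical helper names, the database and table names are placeholders, and giving the transform node only a `transformation_ctx` argument is an assumption (real transforms usually need more `Args`). The only service call it relies on is `create_script` with `DagNodes`, `DagEdges`, and `Language`, which is how boto3's Glue client exposes this action.

```python
from typing import Dict, Iterable, List, Optional


def _args(**kwargs: str) -> List[Dict[str, str]]:
    """Turn keyword arguments into the Name/Value list that DagNodes expect."""
    return [{"Name": name, "Value": value} for name, value in kwargs.items()]


def build_dag(node_type: str) -> Dict[str, object]:
    """Build a source -> transform -> sink DAG whose transform has `node_type`.

    All names and argument values here are placeholders (assumptions).
    """
    return {
        "DagNodes": [
            {"Id": "DataSource0", "NodeType": "DataSource",
             "Args": _args(database="mydatabase_source",
                           table_name="mytable_source",
                           transformation_ctx="DataSource0")},
            {"Id": "Transform0", "NodeType": node_type,
             "Args": _args(transformation_ctx="Transform0")},
            {"Id": "DataSink0", "NodeType": "DataSink",
             "Args": _args(database="mydatabase_sink",
                           table_name="mytable_sink",
                           transformation_ctx="DataSink0")},
        ],
        "DagEdges": [
            {"Source": "DataSource0", "Target": "Transform0"},
            {"Source": "Transform0", "Target": "DataSink0"},
        ],
        "Language": "PYTHON",
    }


def probe_transforms(client, node_types: Iterable[str]) -> Dict[str, Optional[str]]:
    """Call create_script once per node type.

    Returns a dict mapping each node type to None when the call succeeded,
    or to the error text when it raised. Any exception counts as a failure,
    so credential or network errors look the same as an unsupported NodeType.
    """
    results: Dict[str, Optional[str]] = {}
    for node_type in node_types:
        try:
            client.create_script(**build_dag(node_type))
            results[node_type] = None
        except Exception as exc:  # deliberately broad; inspect the text
            results[node_type] = str(exc)
    return results
```

With boto3 installed and credentials configured, `probe_transforms(boto3.client("glue"), ["ResolveChoice", "Map"])` should exercise the two cases from the original report. Read the error text before treating a failure as "unsupported", since the sketch does not distinguish error causes.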
@MrGossett, I've filed an internal ticket to pass this request on to the Glue doc writing team. They own the content that ends up in this particular CLI description. I'll ask them to flesh out the meaning of the NodeType element. Thanks for the feedback!
(V156194273)
@bisdavid any idea if an update to the Glue docs is planned?
Hi @MrGossett, I confirmed that the Glue team is aware of the issue, but no ETA as to when it will be changed.
I ran into the same issue. This works for me:
{
  "DagNodes": [
    {
      "Id": "DataSource0",
      "NodeType": "DataSource",
      "Args": [
        { "Name": "database", "Value": "mydatabase_source" },
        { "Name": "table_name", "Value": "mytable_source" },
        { "Name": "transformation_ctx", "Value": "DataSource0" }
      ]
    },
    {
      "Id": "Transform1",
      "NodeType": "CustomCode",
      "Args": [
        { "Name": "code", "Value": "pass" },
        { "Name": "className", "Value": "MyTransform" },
        { "Name": "dynamicFrameConstruction", "Value": "DynamicFrameCollection{\"DataSource0\":DataSource0}" },
        { "Name": "classification", "Value": "Transform" },
        { "Name": "dfc", "Value": "Transform1" },
        { "Name": "transformation_ctx", "Value": "Transform1" }
      ]
    },
    {
      "Id": "Transform0",
      "NodeType": "SelectFromCollection",
      "Args": [
        { "Name": "key", "Value": "list(Transform1.keys())[0]" },
        { "Name": "transformation_ctx", "Value": "Transform0" }
      ]
    },
    {
      "Id": "DataSink0",
      "NodeType": "DataSink",
      "Args": [
        { "Name": "database", "Value": "mydatabase_sink" },
        { "Name": "table_name", "Value": "mytable_sink" },
        { "Name": "transformation_ctx", "Value": "DataSink0" }
      ]
    }
  ],
  "DagEdges": [
    { "Source": "DataSource0", "Target": "Transform1" },
    { "Source": "Transform1", "Target": "Transform0" },
    { "Source": "Transform0", "Target": "DataSink0" }
  ],
  "Language": "PYTHON"
}
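Since a DAG like the one above is easy to break by mistyping an `Id`, a quick local check before calling the service may save a round trip. The sketch below is only a sanity check of my own (`check_dag` is a hypothetical helper); it does not reproduce whatever validation Glue itself performs, and it says nothing about which `NodeType` values are accepted.

```python
from typing import Dict, List


def check_dag(dag: Dict[str, list]) -> List[str]:
    """Return human-readable problems found in a create-script style DAG.

    Checks only two things: node Ids are unique, and every edge endpoint
    refers to a node that exists.
    """
    problems: List[str] = []
    ids = [node["Id"] for node in dag["DagNodes"]]

    # Report each duplicated Id once, in a stable order.
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate node Id: {dup}")

    known = set(ids)
    for edge in dag["DagEdges"]:
        for end in ("Source", "Target"):
            if edge[end] not in known:
                problems.append(f"edge {end} refers to unknown node: {edge[end]}")
    return problems
```

Loading the file with `json.load` and passing the result to `check_dag` returns an empty list when the node Ids and edges are consistent.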
Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
`aws glue create-script` will generate a script to use for a Glue Job given a description of DAG nodes and the edges between them. Nodes can be sources, sinks, or transforms.

API Docs for the `CodeGenNode` structure show that its `NodeType` attribute is required, and that it's a UTF-8 string. The description says "The type of node that this is." `aws glue create-script help` reinforces this:

However, I can't find anywhere in the docs or in CLI help the list of supported values for `NodeType`.

Here is a JSON file describing the input to `aws glue create-script`:

Generating a script using that JSON input is successful:

However, if I change the transformation from `ResolveChoice` to `Map`, I get an error.

Here is the updated `input.json`:

Notice the only thing that has changed is the definition of the `transform` node.

The `create-script` action now returns an error:

Apparently `Map` is not supported, but `ResolveChoice` is supported.

It would be very helpful if there was documentation somewhere listing which transforms are supported by the `aws glue create-script` action.