Open universalmind303 opened 1 month ago
on further experimentation, it looks like this works,
df.select(col('parsed_urls.*'))
but I think having it also work on .struct.get('*') would be more intuitive, or even a .struct.explode() would also be nice.
The main reason one of these would be nice: if you have a function that returns a struct, you need multiple .select calls to flatten it.
For example:
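from urllib.parse import urlparse

import daft
from daft import col

# UDF that parses each URL string into a struct of (scheme, host, path).
@daft.udf(return_dtype=daft.DataType.struct({
    'scheme': daft.DataType.string(),
    'host': daft.DataType.string(),
    'path': daft.DataType.string(),
}))
def parse_url(url: daft.Series):
    parsed_urls = []
    for u in url.to_pylist():
        parsed = urlparse(u)
        parsed_urls.append({
            'scheme': parsed.scheme,
            'host': parsed.netloc,
            'path': parsed.path,
        })
    return daft.Series.from_pylist(parsed_urls)

# Current workaround: a second .select is needed to flatten the struct.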
df.select(parse_url(col('urls'))).select(col('urls.*'))
when instead it'd be nice to chain it and do everything in a single .select, e.g. with either of:
df.select(parse_url(col('urls')).struct.get('*'))
df.select(parse_url(col('urls')).struct.unnest())
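To make the requested behavior concrete, here is a minimal sketch in plain Python, with no Daft involved. It only illustrates the semantics being asked for: every field of the struct becomes a top-level column in one step. `unnest_struct` and `parse_url_row` are hypothetical helpers written for this illustration, not part of any Daft API, and rows are modeled as plain dicts.

```python
from urllib.parse import urlparse


def parse_url_row(u: str) -> dict:
    """Plain-Python stand-in for the UDF above: one URL -> one struct (dict)."""
    parsed = urlparse(u)
    return {"scheme": parsed.scheme, "host": parsed.netloc, "path": parsed.path}


def unnest_struct(rows: list, column: str) -> list:
    """Hypothetical helper: replace `column` (a dict per row) with one
    top-level column per struct field, keeping all other columns."""
    out = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != column}
        flat.update(row[column])  # promote struct fields to the top level
        out.append(flat)
    return out


rows = [{"id": 1, "urls": parse_url_row("https://example.com/a/b")}]
flat_rows = unnest_struct(rows, "urls")
print(flat_rows)
```

The point of the sketch is that the caller never names the individual fields, which is what a wildcard or unnest-style method on the struct expression would provide.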
Is your feature request related to a problem? Please describe.
I want to flatten all columns in a struct into the top level, but it seems like I need to manually select all keys to do that.
Describe the solution you'd like
I first tried to do this
but wildcarding does not appear to be supported there.
I also tried .explode, but that seems to only work on list/fsl (fixed-size list) columns.
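A possible reason explode does not fit here, sketched in plain Python rather than Daft: exploding a list changes the number of rows, while flattening a struct changes the number of columns. `explode_list` and `flatten_struct` below are hypothetical helpers for illustration only, with rows modeled as dicts.

```python
def explode_list(rows: list, column: str) -> list:
    """List semantics: emit one output row per element of the list."""
    out = []
    for row in rows:
        for item in row[column]:
            out.append({**row, column: item})
    return out


def flatten_struct(rows: list, column: str) -> list:
    """Struct semantics: same row count, one new column per struct field."""
    return [
        {**{k: v for k, v in row.items() if k != column}, **row[column]}
        for row in rows
    ]


list_rows = [{"id": 1, "tags": ["a", "b"]}]
struct_rows = [{"id": 1, "url": {"scheme": "https", "host": "example.com"}}]

exploded = explode_list(list_rows, "tags")        # 1 row -> 2 rows
flattened = flatten_struct(struct_rows, "url")    # 1 row -> 1 row, 3 columns
print(exploded)
print(flattened)
```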