databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0
500 stars, 226 forks

Initial pass at supporting the 'paths' data source option with multiple file paths #629

Closed: srowen closed this 1 year ago

srowen commented 1 year ago

(Pull request for testing - TBD)

zsxwing commented 1 year ago

Is it possible to write a test using spark.read.format("xml").load(paths: _*) to verify the change?

srowen commented 1 year ago

@zsxwing heh, yeah, it doesn't work. Looking more closely, it seems like "paths" is a DSv2 thing, and only "path" is set for legacy V1 sources (like this one). I can try a hack, but really this seems tied up with V2 support, I guess, and I don't know how to do that.

zsxwing commented 1 year ago

After checking the code, I also realized that load(paths) in DSv1 only works for FileFormat, which is the internal interface for built-in file formats 😢

Thanks for quickly working on this! I think we have to give up on this, as moving to DSv2 is unrealistic.
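
For readers who land here needing several XML inputs in one DataFrame: since only the single "path" option reaches a V1 source, one possible workaround is to load each path separately and union the results. The sketch below is not part of this PR and is not a spark-xml API. The object name `MultiPathXml`, the helper `readAll`, and its parameters are hypothetical, and it assumes Spark 3.1+ (for `unionByName` with `allowMissingColumns`) and that the inferred column types agree across files.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not part of spark-xml: emulates load(paths: _*)
// for a V1 source by issuing one single-path load per input.
object MultiPathXml {
  def readAll(spark: SparkSession, paths: Seq[String], rowTag: String): DataFrame = {
    require(paths.nonEmpty, "at least one path is required")
    paths
      // Each call passes exactly one path, which is the case V1 sources receive as "path".
      .map(p => spark.read.format("xml").option("rowTag", rowTag).load(p))
      // Schemas are inferred per file, so align columns by name and
      // fill columns missing from one side with nulls (Spark 3.1+).
      .reduce((left, right) => left.unionByName(right, allowMissingColumns = true))
  }
}
```

Usage would look like `MultiPathXml.readAll(spark, Seq("a.xml", "b.xml"), rowTag = "row")`. Because each file gets its own schema inference pass, supplying an explicit schema on every read may be safer when the files are heterogeneous.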

srowen commented 1 year ago

OK, canning this for now. Thanks for checking!