databricks / spark-xml

XML data source for Spark SQL and DataFrames
Apache License 2.0

Search recursively with xml #634

Closed DanialP closed 1 year ago

DanialP commented 1 year ago

I am not 100% sure whether this issue belongs here; if not, please close it.

When reading other file formats (parquet, csv, ...) there is the possibility to recursively search subdirectories via the options "recursiveFileLookup" and "pathGlobFilter". Unfortunately, this does not seem to be the case with xml.

I would like to make a call that looks like this:

df = (
    spark.read
    .format("xml")
    .option("wholetext", "True")
    .option("rowTag", "Library")
    .option("recursiveFileLookup","true")
    .option("pathGlobFilter", "*.xml")
    .load("path/to/file")
)

srowen commented 1 year ago

I don't think you can, if it doesn't work. I think those options are tied to DSv2, which this library doesn't support, and adding that support is hard. See the other issues about that.

DanialP commented 1 year ago

Thank you for the response. In that case, I guess there won't be a solution to that any time soon.
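
A possible workaround, not confirmed anywhere in this thread: walk the directory tree on the driver and hand the reader an explicit list of files. The sketch below assumes the files live on a filesystem the driver can list with os.walk, and that the xml source accepts a comma-separated string of paths in load(); both are assumptions to verify against your setup. The names find_files and read_xml_recursively are hypothetical helpers, not part of spark-xml.

```python
import fnmatch
import os


def find_files(root, pattern="*.xml"):
    """Walk root recursively and return the sorted paths whose file name matches pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Mimics what "pathGlobFilter" would do: match on the file name only.
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def read_xml_recursively(spark, root, row_tag, pattern="*.xml"):
    """Hypothetical helper: load every matching file under root with the xml source.

    Assumption: load() accepts a comma-separated string of paths.
    """
    paths = find_files(root, pattern)
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern!r} under {root!r}")
    return (
        spark.read
        .format("xml")
        .option("rowTag", row_tag)
        .load(",".join(paths))
    )
```

With an existing SparkSession this would be called as read_xml_recursively(spark, "path/to/file", "Library"). For data on HDFS or cloud storage, os.walk would not apply and the listing step would need a filesystem-specific replacement.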