Generated files do not have .xml extension #664

dolfinus commented 10 months ago


I've created a simple dataframe:

Then I saved it as XML:

df.repartition(1).write \
  .format("xml") \
  .mode("overwrite") \
  .option("compression", None) \
  .option("rowTag", "item") \
  .save("2.xml")

This is the content of the 2.xml folder:

> ls -la 2.xml
drwxr-xr-x  2 maxim maxim   84 окт  9 09:18 ./
drwxr-xr-x 19 maxim maxim 4096 окт  9 09:18 ../
-rw-r--r--  1 maxim maxim  156 окт  9 09:18 part-00000
-rw-r--r--  1 maxim maxim   12 окт  9 09:18 .part-00000.crc
-rw-r--r--  1 maxim maxim    0 окт  9 09:18 _SUCCESS
-rw-r--r--  1 maxim maxim    8 окт  9 09:18 ._SUCCESS.crc

File 2.xml/part-00000 has the following content:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

But it does not have an .xml extension. Is that expected behavior?

srowen commented 10 months ago

It's expected. I don't know of a way to control this, and won't change it at this point (the library is now in Spark anyway).

dolfinus commented 10 months ago

I see, rdd.saveAsTextFile creates a directory of files without extensions. I think this is worth mentioning in the README.
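
For anyone who needs the extension anyway, one possible workaround is to rename the part files after the write finishes. The following is a minimal sketch, not something the library provides: the helper name add_xml_extension is hypothetical, and it assumes the output directory is on the local filesystem (for HDFS or object storage the rename would have to go through that storage's own API instead).

```python
from pathlib import Path
from typing import List


def add_xml_extension(output_dir: str) -> List[str]:
    """Rename extension-less part files in a local output directory to *.xml.

    Hypothetical helper for illustration; assumes a local filesystem path.
    Returns the new file names, sorted.
    """
    renamed = []
    # "part-*" does not match hidden sidecars such as ".part-00000.crc",
    # so checksum files and _SUCCESS markers are left untouched.
    for part in sorted(Path(output_dir).glob("part-*")):
        # Skip directories and files that already carry an extension.
        if not part.is_file() or part.suffix:
            continue
        target = part.with_name(part.name + ".xml")
        part.rename(target)
        renamed.append(target.name)
    return renamed
```

With the layout shown above, calling add_xml_extension("2.xml") after the write would turn part-00000 into part-00000.xml. The .part-00000.crc sidecar keeps its old name, so it no longer corresponds to the renamed file.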