CODAIT / stocator

Stocator is a high-performance connector to object storage for Apache Spark; it achieves its performance by leveraging object storage semantics.
Apache License 2.0

Stocator is unable to list objects in ceph storage #252

Closed: Surya-Penumatcha closed this issue 4 years ago

Surya-Penumatcha commented 4 years ago

I tested two cases: (1) connecting to a COS instance on IBM Cloud and (2) connecting to a Ceph storage setup.

  1. In case (1), when Stocator sends the GET request (/suryas-playground-cos-iae-test/?prefix=surya01%2F&max-keys=5000&encoding-type=url) to COS to list objects (line 890 of COSAPIClient.java), the returned result contains keys in this format:
    <Key>surya01/part-00001-attempt_20200520153504_0006_m_000001_0</Key> # Notice the "/" after the name surya01
  2. In case (2), when Stocator sends the GET request (/ceph-hadoop-bucket/?prefix=mrudula2021%2F&max-keys=5000&encoding-type=url) to Ceph to list objects, the returned result contains keys in this format:
    <Key>mrudula2021%2Fpart-00000-attempt_20200421091301_0001_m_000000_0</Key> # Notice the "%2F" after the name mrudula2021. The key is returned URL-encoded, which I think is causing the issue.

    In case (2), a follow-up GET request encodes mrudula2021%2F (%2F is the encoding of /) a second time, producing mrudula2021%252F (% becomes %25).
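The double encoding described above can be reproduced with a small standalone sketch. This is not Stocator's code and not the change made in the PR; the class `KeyEncodingDemo` and the helpers `encodeForQuery` and `decodeListedKey` are hypothetical. It assumes the listing was requested with `encoding-type=url`, so each returned `<Key>` should be decoded exactly once before it is reused, and it uses `java.net.URLEncoder`/`URLDecoder` (Java 10+) as a stand-in for whatever encoder the client actually uses.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class KeyEncodingDemo {

    // Hypothetical helper: percent-encode a value for a query parameter,
    // e.g. the "prefix" of a follow-up list request.
    static String encodeForQuery(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decode a <Key> value once, assuming the listing was
    // requested with encoding-type=url. Caveat: URLDecoder also maps '+' to a
    // space, so this assumes the server percent-encodes a literal '+' as %2B.
    static String decodeListedKey(String rawKey) {
        return URLDecoder.decode(rawKey, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Key formats copied from the two listings in this issue.
        String cosKey = "surya01/part-00001-attempt_20200520153504_0006_m_000001_0";
        String cephKey = "mrudula2021%2Fpart-00000-attempt_20200421091301_0001_m_000000_0";

        // Re-encoding the raw Ceph key turns "%2F" into "%252F" ('%' -> "%25").
        System.out.println(encodeForQuery(cephKey));

        // Decoding once first gives the Ceph key the same shape as the COS key;
        // the COS key has no escapes, so decoding leaves it unchanged.
        System.out.println(decodeListedKey(cephKey));
        System.out.println(decodeListedKey(cosKey));
    }
}
```

Decoding once at the point where the listing is parsed keeps both backends consistent: a key that was never escaped passes through unchanged, and an escaped key is normalized before any later request encodes it again.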

Surya-Penumatcha commented 4 years ago

I created a PR https://github.com/CODAIT/stocator/pull/250 to resolve this issue.