apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

DATA_WRITE ERROR: No FileSystem for scheme "null" #2940

Closed egasimov closed 2 months ago

egasimov commented 2 months ago

Hello Drill community. Recently we encountered the following issue (DATA_WRITE ERROR: No FileSystem for scheme "null") when writing a query result to S3 through Apache Drill.

We have tried two approaches:

  1. Writing the data through the s3 storage plugin configured from the Apache Drill web UI.
  2. Writing to a local directory on the VM where the bucket is mounted via s3fs.

Either way, we get the exception above: No FileSystem for scheme "null"

P.S. We are using FlashBlade//S3 as hardware. Our workaround for now is to write the data to the VM's local /tmp directory and later move it to the directory mounted through s3fs.

Executed queries:

CREATE TABLE dfs.root.`/datas3/joined-data/join_data.parquet` 
AS 
(
    SELECT *
    FROM dfs.root.`/datas3/customers/*` d
);

CREATE TABLE s3.root.`/datas3/joined-data/join_data.parquet` 
AS 
(
    SELECT *
    FROM dfs.root.`/datas3/customers/*` d
);

Error details:

-------------------------------------------------------------------------------------------------------
org.apache.drill.common.exceptions.UserRemoteException: DATA_WRITE ERROR: No FileSystem for scheme "null"

Failure when writing the batch
Fragment: 0:0

[Error Id: bbecb92a-9fad-4774-b3c6-9d940150883a on 8a288c59e4a7:31010]
------------------------------------------------------------------------

Drill version: Apache Drill 1.21.2

Additional context: configured storage plugins (s3.json, dfs.json)

Test data: You may download the data as a Parquet file from here

[
    {
    "customer_id": 1000001,
    "customer_name": "John Doe",
    "purchased_items": [
        {
            "item_id": 2000001,
            "item_class": "A",
            "product_id": 777,
            "created_at": "2024-06-12T11:36:37.751Z"
        },
        {
            "item_id": 2000002,
            "item_class": "B",
            "product_id": 888,
            "created_at": "2024-06-12T08:46:37.751Z"
        },
        {
            "item_id": 2000003,
            "item_class": "C",
            "product_id": 999,
            "created_at": "2024-06-12T11:56:00.751Z"
        }
    ]
    },
    {
    "customer_id": 1000002,
    "customer_name": "Black Smith",
    "purchased_items": [
        {
            "item_id": 2000004,
            "item_class": "A",
            "product_id": 777,
            "created_at": "2024-06-12T11:36:33.751Z"
        },
        {
            "item_id": 2000006,
            "item_class": "C",
            "product_id": 999,
            "created_at": "2024-08-12T11:56:37.751Z"
        }
    ]
    },
    {
    "customer_id": 1000003,
    "customer_name": "Alice Doe",
    "purchased_items": [
        {
            "item_id": 2000010,
            "item_class": "A",
            "product_id": 777,
            "created_at": "2024-01-12T11:36:37.751Z"
        },
        {
            "item_id": 2000011,
            "item_class": "C",
            "product_id": 888,
            "created_at": "2024-04-12T11:56:37.751Z"
        }
    ]
    }
]

cgivre commented 2 months ago

Have you tried

CREATE TABLE s3.root.`/datas3/joined-data/join_data`
...

egasimov commented 2 months ago

@cgivre yes, we have checked it as well.

In the s3 storage plugin, after changing the location of the writable workspace from / to /any-subdirectory, it worked well.

  "workspaces": {
    "root": {
      "location": "/save",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
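
For anyone hitting the same error, here is a minimal sketch of a check based on the fix reported above: it flags workspaces in a storage plugin config that are writable and located at `/`. The helper name `find_root_writable_workspaces` and the two sample configs are hypothetical and not part of Drill. It only inspects the JSON, and it assumes the root location was the cause, as the reporter found.

```python
import json


def find_root_writable_workspaces(plugin_config):
    """Return names of workspaces that are writable and located at '/' (or empty).

    Hypothetical helper: it only reads the plugin JSON and does not contact Drill.
    """
    flagged = []
    for name, workspace in plugin_config.get("workspaces", {}).items():
        location = workspace.get("location") or ""
        # A location of "/" (or an empty one) is treated as the bucket root.
        if workspace.get("writable") and location.rstrip("/") == "":
            flagged.append(name)
    return flagged


# Sample configs for illustration: the original root location and the fixed one.
before = json.loads('{"workspaces": {"root": {"location": "/", "writable": true}}}')
after = json.loads('{"workspaces": {"root": {"location": "/save", "writable": true}}}')

print(find_root_writable_workspaces(before))  # ['root']
print(find_root_writable_workspaces(after))   # []
```

Running it against an exported plugin config (for example the s3.json attached above) would list any workspace worth pointing at a subdirectory before retrying the CTAS.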