apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0
1.93k stars 980 forks source link

dataPath expression for http-storage-plugin #2792

Open oleksandrborniak opened 1 year ago

oleksandrborniak commented 1 year ago

Hello Guys, Could you please help me to create the correct “dataPath” for Drill http-storage-plugin? I have a JSON response like:

{
    "1": {
        "id": 1,
        "employeeId": 1,
        "firstName": "John",
        "lastName": "Doe",
        "email": "john.doe@bamboohr.com",
        "status": "enabled",
        "lastLogin": "2011-03-19T10:16:00+00:00"
    },
    "2": {
        "id": 2,
        "firstName": "Jane",
        "lastName": "Doe",
        "email": "jane.doe@bamboohr.com",
        "status": "enabled",
        "lastLogin": "2011-08-29T11:17:43+00:00"
    }
}

https://documentation.bamboohr.com/reference/get-a-list-of-users-1 And I need that only field id, employeeId, firstName, lastName, email, status, lastLogin returns.

I have tried to use the following configuration, but it doesn’t return any data. "dataPath": "$.*.{id, employeeId, firstName, lastName, email, status, lastLogin}" Can I use some expressions to return only required fields?

oleksandrborniak commented 1 year ago

I mean, I want to use "dataPath" to transform

      "1": {
        "id": 1,
        "employeeId": 1,
        "firstName": "John",
        "lastName": "Doe",
        "email": "john.doe@bamboohr.com",
        "status": "enabled",
        "lastLogin": "2011-03-19T10:16:00+00:00"
    }

to the

        {
        "id": 1,
        "employeeId": 1,
        "firstName": "John",
        "lastName": "Doe",
        "email": "john.doe@bamboohr.com",
        "status": "enabled",
        "lastLogin": "2011-03-19T10:16:00+00:00"
    }
oleksandrborniak commented 1 year ago

@vvysotskyi Volodymyr -I am not sure, but I might use Schema for HTTP Plugin to achieve it, or possibly you can suggest another workaround of how it can be resolved. I will be sincerely grateful for your help.

vvysotskyi commented 1 year ago

I'm not sure that Drill can handle it properly. If it would be a non-root column, it could be possible to try using the DICT type in the provided schema. I see that the API can return an XML response. Could you try using it?

oleksandrborniak commented 1 year ago

I tried to use XML version, but it doesn’t work as I expect. This is my http config:

{
  "type": "http",
  "connections": {
    "bamboohr": {
      "url": "https://api.bamboohr.com/api/gateway.php/company/v1/meta/users/",
      "requireTail": false,
      "method": "GET",
      "headers": {
        "Authorization": "Basic  API_KEY",
        "accept": "application/json"
      },
      "authType": "none",
      "inputType": "json",
      "xmlDataLevel": 2,
      "postParameterLocation": "QUERY_STRING",
      "verifySSLCert": true
    }
  },
  "timeout": 15,
  "retryDelay": 1000,
  "proxyType": "direct",
  "authMode": "SHARED_USER",
  "enabled": true
}

XML looks like:

<users>
    <user id="1" employeeId="1">
        <firstName>John</firstName>
        <lastName>Doe</lastName>
        <email>john.doe@bamboohr.com</email>
        <lastLogin>2011-03-19T10:16:00+00:00</lastLogin>
    </user>
    <user id="2">
        <firstName>Jane</firstName>
        <lastName>Doe</lastName>
        <email>jane.doe@bamboohr.com</email>
        <lastLogin>2011-08-29T11:17:43+00:00</lastLogin>
    </user>
</users>

But I am getting “attributes” column instead of “user id” or “id”. Do you have and idea of how I can fix it? Or might be we can apply some workaround?

Screenshot 2023-04-18 at 10 52 30

I can change “xmlDataLevel” from 2 to 1 in http config file, but it doesn’t look better.

Screenshot 2023-04-18 at 10 54 00
cgivre commented 1 year ago

Hi there, If you look at your data, you'll see .

The id and employeeId are attributes of the user tag, so it is expected behavior to see these fields in the attributes field. Best, -- C

On Apr 18, 2023, at 3:54 AM, oleksandrborniak @.***> wrote:

I tried to use XML version, but it doesn’t work as I expect. This is my http config:

{ "type": "http", "connections": { "bamboohr": { "url": "https://api.bamboohr.com/api/gateway.php/company/v1/meta/users/", "requireTail": false, "method": "GET", "headers": { "Authorization": "Basic API_KEY", "accept": "application/json" }, "authType": "none", "inputType": "json", "xmlDataLevel": 2, "postParameterLocation": "QUERY_STRING", "verifySSLCert": true } }, "timeout": 15, "retryDelay": 1000, "proxyType": "direct", "authMode": "SHARED_USER", "enabled": true } XML looks like:

John Doe ***@***.*** 2011-03-19T10:16:00+00:00 Jane Doe ***@***.*** 2011-08-29T11:17:43+00:00

But I am getting “attributes” column instead of “user id” or “id”. Do you have and idea of how I can fix it? Or might be we can apply some workaround?

https://user-images.githubusercontent.com/49722408/232709810-d70c2599-f49a-461c-a15f-091ae2236c5f.png I can change “xmlDataLevel” from 2 to 1 in http config file, but it doesn’t look better.

https://user-images.githubusercontent.com/49722408/232710027-c9520214-47e7-443e-8f60-b9e13028adcd.png — Reply to this email directly, view it on GitHub https://github.com/apache/drill/issues/2792#issuecomment-1512623385, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKB7PQJQXZD747B5XBH3ETXBZCERANCNFSM6AAAAAAW4A7VHE. You are receiving this because you are subscribed to this thread.