apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

Querying HTTP plugin data with the full name causes an exception #2851

Open shfshihuafeng opened 1 year ago

shfshihuafeng commented 1 year ago

Before submitting a bug report, please verify that you are using the most current version of Drill.

Describe the bug: I created an HTTP plugin named "http" and then ran the "use http" command to switch schema. When I query the HTTP data with "select * from http.sunrise", it reports an exception.

To Reproduce Steps to reproduce the behavior:

1. create plugin with name "http"
2. use http
3. select * from http.sunrise;

Expected behavior

apache drill> select * from http.sunrise;
+----------------------------------------------------------------------------------+--------+
|                                     results                                      | status |
+----------------------------------------------------------------------------------+--------+
| {"sunrise":"5:42:47 AM","sunset":"5:52:15 PM","solar_noon":"11:47:31 AM","day_length":"12:09:28","civil_twilight_begin":"5:21:47 AM","civil_twilight_end":"6:13:15 PM","nautical_twilight_begin":"4:56:00 AM","nautical_twilight_end":"6:39:02 PM","astronomical_twilight_begin":"4:30:08 AM","astronomical_twilight_end":"7:04:54 PM"} | OK     |
+----------------------------------------------------------------------------------+--------+
1 row selected (2.967 seconds)

Error detail, log output or screenshots Error: CONNECTION ERROR: API 'http' does not exist in HTTP storage plugin 'http'

Drill version 1.22.0

Additional context

http plugin config
{
  "type": "http",
  "connections": {
    "sunrise": {
      "url": "https://api.sunrise-sunset.org/json",
      "requireTail": false,
      "method": "GET",
      "params": [
        "lat",
        "lng",
        "date"
      ],
      "authType": "none",
      "inputType": "json",
      "xmlDataLevel": 1,
      "postParameterLocation": "QUERY_STRING",
      "verifySSLCert": true
    }
  },
  "timeout": 5,
  "retryDelay": 1000,
  "proxyType": "direct",
  "authMode": "SHARED_USER",
  "enabled": true
}

shfshihuafeng commented 1 year ago

Should we check whether the requested name equals a plugin name?

--- a/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpSchemaFactory.java
+++ b/contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpSchemaFactory.java
@@ -18,8 +18,11 @@
 package org.apache.drill.exec.store.http;

 import java.util.Collections;
+import java.util.Locale;
 import java.util.Map;
 import java.util.Map.Entry;
+import java.util.Set;
+import java.util.stream.Collectors;

 import org.apache.calcite.schema.SchemaPlus;
 import org.apache.calcite.schema.Table;
@@ -85,7 +88,7 @@ public class HttpSchemaFactory extends AbstractSchemaFactory {
       HttpAPIConnectionSchema subSchema = subSchemas.get(name);
       if (subSchema != null) {
         return subSchema;
-      } else if (tables.containsKey(name)) {
+      } else if (tables.containsKey(name) || isRegistryPluginName(name.toLowerCase(Locale.ROOT))) {
         return null;
       } else {
         throw UserException
@@ -95,6 +98,12 @@ public class HttpSchemaFactory extends AbstractSchemaFactory {
       }
     }

+    private boolean isRegistryPluginName(String name) {
+      Set<String> pluginNames = plugin.getRegistry().availablePlugins();
+      Set<String> pluginNamesToLower = pluginNames.stream().map(n -> n.toLowerCase(Locale.ROOT)).collect(Collectors.toSet());
+      return pluginNamesToLower.contains(name);
+    }
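
For illustration, here is a minimal, self-contained sketch of the lookup that the patch changes. It is not Drill's code: the class SubSchemaLookupSketch, its fields, and the use of plain strings and IllegalArgumentException are hypothetical stand-ins. The idea is that a name matching a registered plugin returns null (meaning "not mine, keep looking") instead of raising the "API ... does not exist" error.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the HTTP schema's sub-schema lookup; not Drill's real classes. */
final class SubSchemaLookupSketch {
  private final String pluginName;
  private final Map<String, String> subSchemas;   // connection name -> placeholder sub-schema
  private final Set<String> tables;               // connections exposed directly as tables
  private final Set<String> registeredPluginsLower;

  SubSchemaLookupSketch(String pluginName, Map<String, String> subSchemas,
                        Set<String> tables, Set<String> registeredPlugins) {
    this.pluginName = pluginName;
    this.subSchemas = subSchemas;
    this.tables = tables;
    // Normalize once with a fixed locale so the comparison is locale-independent.
    this.registeredPluginsLower = registeredPlugins.stream()
        .map(n -> n.toLowerCase(Locale.ROOT))
        .collect(Collectors.toSet());
  }

  /** Returns the sub-schema, or null for known tables and plugin names; throws for unknown names. */
  String getSubSchema(String name) {
    String subSchema = subSchemas.get(name);
    if (subSchema != null) {
      return subSchema;
    }
    if (tables.contains(name) || registeredPluginsLower.contains(name.toLowerCase(Locale.ROOT))) {
      return null;
    }
    throw new IllegalArgumentException(String.format(
        "API '%s' does not exist in HTTP storage plugin '%s'", name, pluginName));
  }
}
```

Under these assumptions, a lookup of "sys" or "http" inside the "http" plugin returns null rather than failing, which is the behavior the proposed check is aiming for.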

cgivre commented 1 year ago

@shfshihuafeng I don't think this is a bug. You are first running the query USE http which sets the root at http. Then you're running a SELECT ... FROM http.sunrise. So at that point, the plugin is looking for a path: http.http.sunrise which does not exist.

I'd bet that if you either skipped the USE query or ran a SELECT ... FROM sunrise it would work.
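
The path logic described above can be sketched roughly as follows. This is a simplified model, not Drill's or Calcite's actual resolver: the class SchemaPathSketch and the method candidatePaths are hypothetical, and the lookup order (default schema first, then root) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how a table reference combines with the default schema. */
final class SchemaPathSketch {
  /** Lists the fully qualified paths a simplified resolver would try, in order. */
  static List<String> candidatePaths(String defaultSchema, String tableRef) {
    List<String> candidates = new ArrayList<>();
    if (defaultSchema != null && !defaultSchema.isEmpty()) {
      // After USE, the reference is first interpreted relative to the default schema.
      candidates.add(defaultSchema + "." + tableRef);
    }
    // The reference is also a candidate as written, starting from the root.
    candidates.add(tableRef);
    return candidates;
  }
}
```

In this model, candidatePaths("http", "http.sunrise") starts with http.http.sunrise, the path that does not exist, while candidatePaths("http", "sunrise") starts with http.sunrise.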

Javelin2007 commented 1 year ago

@cgivre You are right, but after "use http" the default schema is http, and querying another data source's schema then reports an error (screenshot). I think this is not right.

shfshihuafeng commented 1 year ago

@cgivre (screenshot)

shfshihuafeng commented 1 year ago

@cgivre When I switch schema with "use my.test", querying data from other data sources by their full schema name works correctly (screenshot).

cgivre commented 12 months ago

@shfshihuafeng So are you saying the behavior with USE is inconsistent? I want to make sure I understand the issue.

shfshihuafeng commented 11 months ago

@cgivre There are two problems with this:

  1. The behavior with USE is inconsistent.
  2. After I enter the schema with 'use', data cannot be queried (screenshot).

niemipt commented 1 month ago

I think that I have exactly the same problem. It seems that after you change schema to http storage schema you cannot run any SQL.

➜ tmp cat bootstrap-storage-plugins.json
{
  "storage": {
    "http": {
      "type": "http",
      "cacheResults": false,
      "enabled": true,
      "timeout": 5,
      "connections": {
        "sunrise": {
          "url": "https://api.sunrise-sunset.org/json",
          "requireTail": false,
          "method": "GET",
          "dataPath": "results",
          "headers": null,
          "params": [
            "lat",
            "lng",
            "date"
          ],
          "authType": "none",
          "userName": null,
          "password": null,
          "postBody": null
        }
      }
    }
  }
}
➜ tmp docker run -it --rm --name drill -v "${HOME}"/tmp/bootstrap-storage-plugins.json:/opt/drill/conf/bootstrap-storage-plugins.json apache/drill
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Apache Drill 1.21.1
"Drill must go on."
apache drill> !schemas
+--------------------+---------------+
| TABLE_SCHEM        | TABLE_CATALOG |
+--------------------+---------------+
| cp.default         | DRILL         |
| dfs.default        | DRILL         |
| dfs.root           | DRILL         |
| dfs.tmp            | DRILL         |
| http               | DRILL         |
| information_schema | DRILL         |
| sys                | DRILL         |
+--------------------+---------------+
apache drill> SELECT version FROM sys.version;
+---------+
| version |
+---------+
| 1.21.1  |
+---------+
1 row selected (2.638 seconds)
apache drill> SELECT sunrise, sunset FROM http.sunrise WHERE lat = 36.7201600 AND lng = -4.4203400 AND date = 'today';
+------------+------------+
| sunrise    | sunset     |
+------------+------------+
| 6:32:42 AM | 5:30:55 PM |
+------------+------------+
1 row selected (2.506 seconds)
apache drill> use http;
+------+----------------------------------+
| ok   | summary                          |
+------+----------------------------------+
| true | Default schema changed to [http] |
+------+----------------------------------+
1 row selected (0.219 seconds)
apache drill (http)> SELECT sunrise, sunset FROM sunrise WHERE lat = 36.7201600 AND lng = -4.4203400 AND date = 'today';
Error: CONNECTION ERROR: API 'http' does not exist in HTTP storage plugin 'http'

[Error Id: 574dff2e-53ac-47c8-a4dc-f4368f5d60ea ] (state=,code=0)
apache drill (http)> SELECT sunrise, sunset FROM http.sunrise WHERE lat = 36.7201600 AND lng = -4.4203400 AND date = 'today';
Error: CONNECTION ERROR: API 'http' does not exist in HTTP storage plugin 'http'

[Error Id: ff623dec-3ecc-4eed-9caa-bad804694ea9 ] (state=,code=0)
apache drill (http)> SELECT version FROM sys.version;
Error: CONNECTION ERROR: API 'sys' does not exist in HTTP storage plugin 'http'

[Error Id: fd4309e5-8883-4219-b0b7-3beec7147b96 ] (state=,code=0)
apache drill (http)> use sys;
Error: CONNECTION ERROR: API 'sys' does not exist in HTTP storage plugin 'http'

[Error Id: 08a3b91a-a216-4088-8fc2-e0bc0d79bfc1 ] (state=,code=0)
apache drill (http)> !schemas
+--------------------+---------------+
| TABLE_SCHEM        | TABLE_CATALOG |
+--------------------+---------------+
| cp.default         | DRILL         |
| dfs.default        | DRILL         |
| dfs.root           | DRILL         |
| dfs.tmp            | DRILL         |
| http               | DRILL         |
| information_schema | DRILL         |
| sys                | DRILL         |
+--------------------+---------------+
apache drill (http)> !reconnect
Reconnecting to "jdbc:drill:zk=local"...
apache drill> SELECT version FROM sys.version;
+---------+
| version |
+---------+
| 1.21.1  |
+---------+
1 row selected (0.24 seconds)
apache drill> !quit
➜ tmp

The only exception is that if you give the HTTP storage plugin and the connection the same name (e.g. schema sunrise and connection sunrise), you can query data using both schema.connection and connection, but you still can't query any other schema using anotherschema.connection.

➜ tmp cat bootstrap-storage-plugins.json
{
  "storage": {
    "sunrise": {
      "type": "http",
      "cacheResults": false,
      "enabled": true,
      "timeout": 5,
      "connections": {
        "sunrise": {
          "url": "https://api.sunrise-sunset.org/json",
          "requireTail": false,
          "method": "GET",
          "dataPath": "results",
          "headers": null,
          "params": [
            "lat",
            "lng",
            "date"
          ],
          "authType": "none",
          "userName": null,
          "password": null,
          "postBody": null
        }
      }
    }
  }
}
➜ tmp docker run -it --rm --name drill -v "${HOME}"/tmp/bootstrap-storage-plugins.json:/opt/drill/conf/bootstrap-storage-plugins.json apache/drill
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Apache Drill 1.21.1
"Say hello to my little Drill."
apache drill> !schemas
+--------------------+---------------+
| TABLE_SCHEM        | TABLE_CATALOG |
+--------------------+---------------+
| cp.default         | DRILL         |
| dfs.default        | DRILL         |
| dfs.root           | DRILL         |
| dfs.tmp            | DRILL         |
| information_schema | DRILL         |
| sunrise            | DRILL         |
| sys                | DRILL         |
+--------------------+---------------+
apache drill> SELECT version FROM sys.version;
+---------+
| version |
+---------+
| 1.21.1  |
+---------+
1 row selected (2.235 seconds)
apache drill> SELECT sunrise, sunset FROM sunrise.sunrise WHERE lat = 36.7201600 AND lng = -4.4203400 AND date = 'today';
+------------+------------+
| sunrise    | sunset     |
+------------+------------+
| 6:32:42 AM | 5:30:55 PM |
+------------+------------+
1 row selected (2.545 seconds)
apache drill> use sunrise;
+------+-------------------------------------+
| ok   | summary                             |
+------+-------------------------------------+
| true | Default schema changed to [sunrise] |
+------+-------------------------------------+
1 row selected (0.188 seconds)
apache drill (sunrise)> SELECT sunrise, sunset FROM sunrise.sunrise WHERE lat = 36.7201600 AND lng = -4.4203400 AND date = 'today';
+------------+------------+
| sunrise    | sunset     |
+------------+------------+
| 6:32:42 AM | 5:30:55 PM |
+------------+------------+
1 row selected (1.146 seconds)
apache drill (sunrise)> SELECT sunrise, sunset FROM sunrise WHERE lat = 36.7201600 AND lng = -4.4203400 AND date = 'today';
+------------+------------+
| sunrise    | sunset     |
+------------+------------+
| 6:32:42 AM | 5:30:55 PM |
+------------+------------+
1 row selected (1.143 seconds)
apache drill (sunrise)> SELECT version FROM sys.version;
Error: CONNECTION ERROR: API 'sys' does not exist in HTTP storage plugin 'sunrise'

[Error Id: 8cb31758-252b-4949-a30e-100b469d2476 ] (state=,code=0)
apache drill (sunrise)> use sys;
Error: CONNECTION ERROR: API 'sys' does not exist in HTTP storage plugin 'sunrise'

[Error Id: 3d1e62b3-f395-4509-9001-5a47376e6063 ] (state=,code=0)
apache drill (sunrise)> !schemas
+--------------------+---------------+
| TABLE_SCHEM        | TABLE_CATALOG |
+--------------------+---------------+
| cp.default         | DRILL         |
| dfs.default        | DRILL         |
| dfs.root           | DRILL         |
| dfs.tmp            | DRILL         |
| information_schema | DRILL         |
| sunrise            | DRILL         |
| sys                | DRILL         |
+--------------------+---------------+
apache drill (sunrise)> !quit
➜ tmp