apache / drill

Apache Drill is a distributed MPP query layer for self-describing data
https://drill.apache.org/
Apache License 2.0

DRILL-8407: Add Support for SFTP File Systems #2770

Closed: cgivre closed this pull request 1 year ago

cgivre commented 1 year ago


Description

This PR enables Drill to query files stored in SFTP file systems.

Documentation

An SFTP file system behaves like any other file system in Drill.

Configuration

To query data from an SFTP file system, configure it as you would any other file system. In the connection field, provide the URL of your SFTP server, as shown below:

{
  "type": "file",
  "connection": "sftp://<your sftp server URL>",
  "workspaces": {
    "test": {
      "location": "<path to test data>",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
  ...
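For reference, a fuller plugin configuration might look like the sketch below (credentials are covered under Authentication). The host sftp.example.com, the explicit port 22 in the URL, the workspace path /data, and the csv format entry are hypothetical placeholders rather than values from this PR; including the port in the URL is assumed to be optional, since SFTP normally listens on port 22.

```json
{
  "type": "file",
  "connection": "sftp://sftp.example.com:22",
  "workspaces": {
    "test": {
      "location": "/data",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "extractHeader": true
    }
  },
  "enabled": true
}
```

If this plugin were saved under the name sftp, a hypothetical file /data/example.csv could then be queried as sftp.test.`example.csv`.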

Authentication

The SFTP plugin requires a username and password for authentication. The recommended way to supply them is through a credentialsProvider, as shown below. SFTP file systems can be used with the USER_TRANSLATION authentication mode, but not with USER_IMPERSONATION.

"credentialsProvider": {
  "credentialsProviderType": "PlainCredentialsProvider",
  "credentials": {
    "username": "<username>",
    "password": "<password>"
  },
  "userCredentials": {}
},
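With USER_TRANSLATION, each Drill user authenticates to the SFTP server with their own account instead of one shared login. The sketch below assumes that the plugin's authMode field selects this mode and that PlainCredentialsProvider stores per-user credentials in userCredentials, keyed by Drill user name; the user alice and the credential values are hypothetical.

```json
"authMode": "USER_TRANSLATION",
"credentialsProvider": {
  "credentialsProviderType": "PlainCredentialsProvider",
  "credentials": {},
  "userCredentials": {
    "alice": {
      "username": "<alice's SFTP username>",
      "password": "<alice's SFTP password>"
    }
  }
},
```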

If you need to pass additional configuration parameters for the SFTP connection, add them to the config section of the file system plugin configuration. Each parameter name must be prefixed with fs.sftp.
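As an illustration, the config section below caps the number of concurrent SFTP connections. The property name fs.sftp.connection.max comes from Hadoop's SFTPFileSystem; this sketch assumes the plugin passes config entries through to that class, and the value 5 is arbitrary.

```json
"config": {
  "fs.sftp.connection.max": "5"
},
```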

Testing

Tested manually.