cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

Create possibility to connect to Dremio Cloud Driver #8166

Open bruno-castro-precision opened 2 months ago

bruno-castro-precision commented 2 months ago

Dremio has evolved over the past few years, adopting project and catalog concepts as a lakehouse platform. This creates a need for a new connector in Cube that connects to Dremio Cloud using the new and improved APIs.


Dremio Cloud has separate documentation from Dremio Core, and concepts such as username and database are not used in it. Some ports also differ from those used by Dremio Core.
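To make the difference concrete, the Dremio Cloud connection settings could be modelled roughly as below. This is only a sketch: `DremioCloudSettings` and its field names are hypothetical and are not options of Cube's Dremio driver. The only grounded parts are that Dremio Cloud needs a project ID and a personal access token (PAT), sent as a bearer token, and no username or database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DremioCloudSettings:
    """Hypothetical settings object; names are illustrative only."""

    base_uri: str                # API base URL of the Dremio Cloud deployment
    project_id: str              # replaces the username/database of Dremio Core
    personal_access_token: str   # PAT used for authentication

    def __post_init__(self) -> None:
        # Fail early when a required value is missing.
        if not self.project_id:
            raise ValueError("project_id must be set for Dremio Cloud")
        if not self.personal_access_token:
            raise ValueError("personal_access_token must be set for Dremio Cloud")

    def auth_headers(self) -> Dict[str, str]:
        # The PAT is sent as a bearer token; no username is involved.
        return {
            "Authorization": f"Bearer {self.personal_access_token}",
            "Content-Type": "application/json",
        }
```

A driver could build such an object once from its environment and reuse `auth_headers()` for every API call.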

Describe the solution you'd like
The Dremio Driver and Dremio Query components need to change to support the new Dremio capabilities. Older versions of Dremio Core used a username to authenticate against the APIs. That field does not exist in Dremio Cloud; connections to the Dremio SQL API and the Dremio Jobs API are authenticated with a project ID and a personal access token (PAT).

Here is a Python snippet that uses the newest Jobs API:

import json

import requests
from pandas import DataFrame  # assumption: the original snippet does not show this import

def make_dremio_call(
        self,
        query: str,
        token: str
    ) -> DataFrame:
    '''
    Run a query on Dremio's warehouse and return the result as a DataFrame.

    Args:
    - query (str): The query to be processed by Dremio
    - token (str): The personal access token (PAT) provided by Dremio

    Returns:
    - df (DataFrame): a dataframe with the result of the query
    '''
    if self.dremio_project_id == '':
        raise ValueError('Please set dremio_project_id parameter on APICaller class.')
    # Trigger the job to run
    endpoint = f'v0/projects/{self.dremio_project_id}/sql'
    headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json'
    }
    payload = json.dumps({'sql': query})

    # The SQL API responds with the ID of the submitted job
    job_id = requests.post(
        url=self.dremio_uri + endpoint,
        headers=headers,
        data=payload,
    ).json()['id']

As you can see, the only keys used are project_id and personal_access_token.
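The snippet stops once `job_id` is known. A rough sketch of the remaining steps (polling the job, then paging through its results) might look like the following. Several things here are assumptions rather than confirmed details: the `job/{job_id}` and `job/{job_id}/results` paths, the `jobState`, `rows` and `rowCount` response fields, and the set of terminal states should all be checked against the Dremio Cloud API documentation. `get_json` is a hypothetical callable that performs an authenticated GET and returns the parsed JSON body.

```python
import time
from typing import Any, Callable, Dict, List

# Assumption: job states after which polling can stop.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

GetJson = Callable[[str], Dict[str, Any]]


def job_url(base_uri: str, project_id: str, job_id: str, suffix: str = "") -> str:
    """Build a Jobs API URL (the path layout is an assumption)."""
    return f"{base_uri.rstrip('/')}/v0/projects/{project_id}/job/{job_id}{suffix}"


def wait_for_job(get_json: GetJson, base_uri: str, project_id: str, job_id: str,
                 poll_seconds: float = 1.0, max_polls: int = 60) -> str:
    """Poll the job until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = get_json(job_url(base_uri, project_id, job_id))["jobState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def fetch_rows(get_json: GetJson, base_uri: str, project_id: str, job_id: str,
               page_size: int = 500) -> List[Dict[str, Any]]:
    """Collect all result rows by paging with offset and limit."""
    rows: List[Dict[str, Any]] = []
    while True:
        suffix = f"/results?offset={len(rows)}&limit={page_size}"
        page = get_json(job_url(base_uri, project_id, job_id, suffix))
        rows.extend(page["rows"])
        # Stop on an empty page or once every reported row has been read.
        if not page["rows"] or len(rows) >= page["rowCount"]:
            return rows
```

With `requests`, `get_json` could be as small as `lambda url: requests.get(url, headers=headers).json()`, reusing the `headers` dictionary from the snippet above. Passing it in as a parameter keeps the polling and paging logic testable without network access.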

Describe alternatives you've considered
The most straightforward fix is to update the driver and query components so that they follow the newest Dremio Cloud documentation.

Additional context
Older API Documentation (currently used in Cube Cloud)
Newest API Documentation for Dremio Cloud

github-actions[bot] commented 2 months ago

If you are interested in working on this issue, please go ahead and provide a PR for it. We'd be happy to review it and merge it. If this is the first time you are contributing a Pull Request to Cube, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube Slack.

igorlukanin commented 2 months ago

Hi @bruno-castro-precision 👋

Thanks for raising this! Would you like to contribute these changes yourself? In that case, these instructions might be helpful: https://github.com/cube-js/cube/blob/master/CONTRIBUTING.md#contributing-database-drivers