aicoe-aiops / ocp-ci-analysis

Developing AI tools for developers by leveraging the data made openly available by OpenShift and Kubernetes CI platforms.
https://old.operate-first.cloud/data-science/ai4ci/
GNU General Public License v3.0

Bucket accessible from Superset to host data. #123

Open martinpovolny opened 3 years ago

martinpovolny commented 3 years ago

As a Data Scientist, I want r/w access to a bucket that is connected to Superset, so that I can visualize the data in that bucket in a Superset dashboard.

Acceptance Criteria

martinpovolny commented 3 years ago

/assign @martinpovolny

martinpovolny commented 3 years ago

Status update

We already have a bucket for the project, created here: https://github.com/aicoe-aiops/ocp-ci-analysis/pull/111 and here: https://github.com/aicoe-aiops/ocp-ci-analysis/pull/115

Unfortunately, if we are to use it with workflows as well, it needs a stable name, so it is being renamed here: https://github.com/aicoe-aiops/ocp-ci-analysis/pull/131

Bucket access

This bucket is accessible using credentials stored in a configmap and a secret, both named the same as the bucket claim and located in the same project (this app's project). This works (tested).

Permissions are set for the DS group so workflows and people can use these to access the bucket.
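For reference, a minimal sketch of how a workflow or notebook might pick up those credentials. It assumes the configmap and secret are exposed to the pod as environment variables using the key names commonly seen with object bucket claims (BUCKET_HOST, BUCKET_PORT, BUCKET_NAME, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY). Those key names, the http/https choice, and the helper names s3_settings and make_client are assumptions for illustration, not something confirmed in this issue.

```python
import os
from typing import Mapping


def s3_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect S3 connection settings from environment-style variables.

    Assumed key names (not confirmed in this issue): BUCKET_HOST,
    BUCKET_PORT, BUCKET_NAME, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY.
    """
    host = env["BUCKET_HOST"]
    port = env.get("BUCKET_PORT", "")
    # Assumption: use HTTPS only when the port is 443, plain HTTP otherwise.
    scheme = "https" if port == "443" else "http"
    endpoint = f"{scheme}://{host}" + (f":{port}" if port else "")
    return {
        "endpoint_url": endpoint,
        "bucket": env["BUCKET_NAME"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def make_client(settings: dict):
    """Build a boto3 S3 client for the bucket endpoint (requires boto3)."""
    import boto3  # third-party; imported lazily so s3_settings works without it

    return boto3.client(
        "s3",
        endpoint_url=settings["endpoint_url"],
        aws_access_key_id=settings["aws_access_key_id"],
        aws_secret_access_key=settings["aws_secret_access_key"],
    )
```

With the variables set, something like make_client(s3_settings()).list_objects_v2(Bucket=s3_settings()["bucket"]) should be enough to check read access from inside the project.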

I have not tested whether we can access the bucket from Superset. There might be a different bucket that is pre-configured in Superset and Hue; we discussed this with @tumido: https://github.com/operate-first/support/issues/23#issuecomment-776030397

Documentation

Here's a documentation issue for bucket use: https://github.com/operate-first/support/issues/48

Here are the steps needed to create a bucket for a project: https://github.com/operate-first/support/issues/48#issuecomment-768932522

TODO: Do we have any docs for accessing the buckets from Superset and Hue? (They should be on the OperateFirst site.)

Docs on accessing Superset and Hue (passwords) are here: https://www.operate-first.cloud/users/support/

Other related information:

There is also an S3 interface provided by MOC, mentioned here: https://github.com/open-infrastructure-labs/ops-issues/issues/33

hemajv commented 3 years ago

In order to create dashboards in Superset, the workflow we have followed in the past is:

store data in Ceph bucket -> create table in Hue for this data -> use the table in Superset to create dashboards

So we would also need Hue to have access to the bucket, i.e. the S3 connection needs to be set up in Hue so that we can create tables for the data stored in the bucket. Currently, however, there are some issues that prevent us from creating tables in Hue; see: https://github.com/operate-first/support/issues/131
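To make the "create table in Hue" step concrete, here is a rough sketch of the kind of statement involved. It assumes Hue is backed by a Hive-compatible SQL engine that can read s3a:// locations and that the data is stored as Parquet; neither is confirmed in this thread. The helper name external_table_ddl and the example table, column, and bucket names are hypothetical.

```python
from typing import Sequence, Tuple


def external_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    bucket: str,
    prefix: str,
    fmt: str = "PARQUET",
) -> str:
    """Build a Hive-style CREATE EXTERNAL TABLE statement over bucket data.

    The table only points at the files under s3a://<bucket>/<prefix>/;
    the data itself stays in the bucket.
    """
    # One "`name` TYPE" entry per column, one per line.
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    # Normalize the prefix so the location has exactly one slash on each side.
    location = f"s3a://{bucket}/{prefix.strip('/')}/"
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table and bucket names, for illustration only.
    print(
        external_table_ddl(
            "test_runs",
            [("job", "STRING"), ("passed", "BOOLEAN")],
            "demo-bucket",
            "data/runs",
        )
    )
```

The generated statement could be pasted into the Hue editor once the S3 connection is working; the resulting table is then what Superset would query for dashboards.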