aicoe-aiops / ocp-ci-analysis

Developing AI tools for developers by leveraging the data made openly available by OpenShift and Kubernetes CI platforms.
https://old.operate-first.cloud/data-science/ai4ci/
GNU General Public License v3.0
33 stars 72 forks source link

Timestamp columns show as NULL on Trino #344

Closed aakankshaduggal closed 3 years ago

aakankshaduggal commented 3 years ago

In order to visualize the KPI metrics on Superset Dashboard, we are trying to link the metrics data from ceph using cloudbeaver, but however, when we do so the timestamp column appears to be null.

We have tried this so far : Approach 1 - converting it into milliseconds format as suggested here.

For example, in this dataset, we converted the datetime column to a datetime with milliseconds - image

When we try to run this query on cloudbeaver - image

We see nothing.

The table run was a success image

and this is how the final dataframe on cloudbeaver looks like where "timestamp-ms" shows as NULL datatype- image Approach 2 : The timestamp column wasn't datetime64, it was "object". We converted it into datetime64, still gives a null.

Any other approach/suggestions are welcome.

cc: @oindrillac @hemajv @chauhankaranraj @Shreyanand @isabelizimm @MichaelClifford

MichaelClifford commented 3 years ago

Is timestamp a supported data type? did you try just varchar in the table definition?

MichaelClifford commented 3 years ago

What kind of object is the "time_stamp" column in the stored dataframe?

oindrillac commented 3 years ago

Is timestamp a supported data type? did you try just varchar in the table definition?

it should be supported https://trino.io/docs/current/language/types.html#timestamp

oindrillac commented 3 years ago

What kind of object is the "time_stamp" column in the stored dataframe?

its currently datetime64

MichaelClifford commented 3 years ago

what if you make it a string, then set it as Timestamp on the schema?

oindrillac commented 3 years ago

what if you make it a string, then set it as Timestamp on the schema?

On saving the dataframe timestamp-ms column as "str" at opf-datacatalog/trino-db-str/number_of_flakes image

the create table runs succesfully image

however, I see nothing when I select rows from the table

image

MichaelClifford commented 3 years ago

One other thing to try. Define it's datatype as varchar for cloudbever schema, then tell superset its your timestamp field. I think that's how we handled it with the earlier implementation.

chauhankaranraj commented 3 years ago

It seems like this issue can be resolved by making sure the column names used in the CREATE TABLE statement are the same exact as those in the parquet files backing the table. Because by default, hive reads parquet columns by name and not ordinal position (see docs).

So in this particular case, changing time_stamp to timestamp in the sql query above should fix the NULLs.