Loading/Failure indicator in Lineage UI

DaimonPl commented 4 years ago

Bigger lineages may take several seconds to load

It would be good if UI would:

display loading indicator when UI is awaiting XHR
display error on actual error or timeout that lineage could not be loaded

It's for both whole graph and details

wajda commented 4 years ago

We'll implement this in the scope of the new UI that we plan to release in ver 0.7

Speaking about the performance, can you share some stats of your lineage data? E.g.

how many recorded executions in the database
number of operations in an average execution plan
number of columns in datasets
do you use overwrites or appends? (and how many appends between overwrites per source?)
how many data sources (URIs) are in the DB?
how long are you data pipelines (how many jobs are in the longest data pipeline, from the very first input to the very last output?)
are your graphs rather deep or wide?
which requests are the slowest? (lineage-overview or lineage-detailed?)

Understanding the use case and lineage patterns would help us to optimize the persistence, queries and find more effective scalability models.

Thanks a lot.

DaimonPl commented 4 years ago

@wajda regarding internal data metrics i could run some queries on arangodb if you have them - currently I have no idea about internals so it's hard for me to answer those questions

Currently I enabled spline for 1 project with daily data retention (that is, spline and arangodb are cleaned completely everyday)

Here's what main lineage overview graph looks like

For that graph 'lineage-overview' returns 28.7kB in 3.5 seconds - it's not super long but already noticable - especially without "loading" indicator. I'm on VPN from home though so this might add some delay too.

It's single project, some of input datasources may be similar size but they are not yet enabled with spline

Biggest lineage-detailed after clicking on graph returns 265kB in 0.7 seconds. 265kB is really a lot especially that only list of input/output URIs is displayed. Looks like endpoints could be somehow specialized to return less data for such cases

wajda commented 4 years ago

Wow that's awesome! :) That would be really helpful to get more precises statistics.

Detailed lineage Yes there is a room for JSON size optimization. But I think being gzipped it shouldn't grow that fast, so it's not that bad I would say. What can blow this JSON up however is crazy wide datasets (thousands of columns of complex types). That's what requires some extra thoughts IMO.

Lineage overview The biggest challenge with that one is the graph traversing, especially if there are many appends to the data sources. To display correct lineage we select all visible reads from every write perspective, and traverse each route recursively, and that is expensive. In future we'll add asynchronous pre-linking on background to move some work away from the user request-time. Another precaution that we have implemented for possible combinatorial explosion is a max graph depth threshold. Currently the depth == 10 is hardcoded on the UI (meaning 10 dependent job in line). In the UI v0.5.2 we added a button to increase this depth on demand if the threshold is reached. We also plan to implement more sensible graph navigation mechanism in the future versions.

wajda commented 4 years ago

I'll send you the queries, thanks! Just for curiosity, is it any sort of ML data pipeline? I see cyclic dependencies between jobs, so I wonder.

DaimonPl commented 4 years ago

It's not pure ML but it's a bit more complex data processing pipeline. There are no thousands of columns here for sure :)

There should be no cyclic dependencies. I think there are 3 problems which make graph reading difficult:

datasources are sometimes used by multiple jobs (also for both direct and transitive dependency, for example A -> B -> C and A -> C)
lack of arrows https://github.com/AbsaOSS/spline/issues/686
graph is not topologically sorted - some jobs from very beggining of pipeline are at the bottom of graph near final output sources https://github.com/AbsaOSS/spline/issues/718

DaimonPl commented 4 years ago

For graph view itself i've created https://github.com/AbsaOSS/spline/issues/718 to make it more clear

wajda commented 4 years ago

@DaimonPl can you run this AQL on your biggest DB and share the result?

RETURN {
    "operations"    :   LENGTH(operation),
    "dataSource"    :   LENGTH(dataSource),
    "exec-plans"    :   LENGTH(executionPlan),
    "exec-events"   :   LENGTH(progress),
    "appends"       :   LENGTH(operation[* FILTER CURRENT._type == "Write" AND CURRENT.append]),
    "unique-apps"   :   LENGTH(UNIQUE(executionPlan[*].extra.appName)),
    "top-io-per-ds" :   (
                            LET pairs = (
                                FOR ds IN dataSource
                                    FOR op IN 1 INBOUND ds writesTo, readsFrom
                                        COLLECT t = op._type == "Read" ? "reads"
                                                        : op.append ? "appends"
                                                        : "overwites",
                                                dsId = ds._key WITH COUNT INTO c
                                        SORT c DESC
                                        RETURN [t, c]
                            )
                            FOR p IN pairs
                                COLLECT t = p[0] into g
                                RETURN [t, g[* LIMIT 20].p[1]]
                        )
}

DaimonPl commented 4 years ago

Sure, but I'll be able to do it next Monday (holidays :) and no access to company network :) )

wajda commented 4 years ago

Sure. Happy holidays :)

And this one as well please:

RETURN {
    "top-longest-observed-writes-seqs" : (
        FOR p IN progress
            LET cnt = LENGTH(SPLINE::OBSERVED_WRITES_BY_READ(p))
            FILTER cnt > 0
            SORT cnt DESC
            LIMIT 20
            RETURN cnt
    )
}

DaimonPl commented 4 years ago

@wajda I'll gather stats tomorrow. Today we'll try to enable additional projects with spline, this should give better stats

DaimonPl commented 4 years ago

Ok, so here are stats for two bigger processing projects and several medium/smaller. It's still only part of entire data processing but spline is not yet enabled everywhere (also included projects have datasource dependencies which themselves may have quite big graphs but theya re not yet in spline)

here are results of queries (db still from spline 0.5.1)

[
  {
    "operations": 36435,
    "dataSource": 362,
    "exec-plans": 412,
    "exec-events": 412,
    "appends": 0,
    "unique-apps": 131,
    "top-io-per-ds": [
      [
        "overwites",
        [
          12,
          12,
          12,
          12,
          11,
          11,
          11,
          10,
          5,
          3,
          3,
          3,
          3,
          3,
          3,
          2,
          2,
          2,
          2,
          2
        ]
      ],
      [
        "reads",
        [
          1836,
          169,
          117,
          108,
          83,
          68,
          50,
          47,
          47,
          46,
          44,
          42,
          36,
          34,
          34,
          34,
          34,
          32,
          29,
          28
        ]
      ]
    ],
    "top-longest-observed-writes-seqs": [
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10,
      10
    ]
  }
]

AbsaOSS / spline-ui

Loading/Failure indicator in Lineage UI #114