apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Job graph fails to display in UI #627

Open hokiegeek2 opened 1 year ago

hokiegeek2 commented 1 year ago

Describe the bug: Clicking on the view graph icon fails to display the job graph.

To Reproduce

  1. click on view graph icon

Expected behavior: a DAG of the job's task sequence is displayed.

onthebridgetonowhere commented 1 year ago

I worked on this some time ago, so I'd be happy to have a look at it. @hokiegeek2 - does it fail for every job or only for specific jobs?

hokiegeek2 commented 1 year ago

@onthebridgetonowhere Cool! Basically, no job graphs display in the UI in my experience

onthebridgetonowhere commented 1 year ago

I had a look at it but I cannot reproduce the problem. Can you please provide more info about your configuration, version, and build commit? The first thing I'd try is updating to the latest commit on the main branch, if possible.

Here's what I've done that works for me:

hokiegeek2 commented 1 year ago

@onthebridgetonowhere cool, okay, I'll pull and build from that commit and see if I can reproduce your results

onthebridgetonowhere commented 1 year ago

@hokiegeek2 sounds good, let us know how it goes.

simicd commented 9 months ago

Hi all, while working on #957 I ran into the same issue. Here's how to reproduce it:

  1. In the API handler's get_job_svg_graph function, replace .map_err(|_| warp::reject()) with .map_err(|err| { info!("Issue: {:?}", err); warp::reject() }) so that the underlying error is logged: https://github.com/apache/arrow-ballista/blob/9903ab27f121f16717a70de7a30f643c4c45dd34/ballista/scheduler/src/api/handlers.rs#L343-L348
  2. Launch scheduler and executor
  3. Submit a query, e.g. the standalone-sql.rs example, but connecting to the scheduler with BallistaContext::remote:

    
    use ballista::prelude::{BallistaConfig, BallistaContext, Result};
    use ballista_examples::test_util;
    use datafusion::execution::options::ParquetReadOptions;
    
    #[tokio::main]
    async fn main() -> Result<()> {
        let config = BallistaConfig::builder()
            .set("ballista.shuffle.partitions", "1")
            .build()?;
    
        let ctx = BallistaContext::remote("localhost", 50050, &config).await?;
    
        let testdata = test_util::examples_test_data();
    
        // register parquet file with the execution context
        ctx.register_parquet(
            "test",
            &format!("{testdata}/alltypes_plain.parquet"),
            ParquetReadOptions::default(),
        )
        .await?;
    
        let df = ctx.sql("select count(1) from test").await?;
    
        df.show().await?;
        Ok(())
    }
  4. Wait until query is executed
  5. Look up the job ID, e.g. GET request to http://localhost:50050/api/jobs
  6. Call http://localhost:50050/api/job/{job_id}/dot_svg with the retrieved job_id (e.g. http://localhost:50050/api/job/bvppZ4r/dot_svg)
  7. Go to the terminal of the scheduler and monitor the logs - I got the following error:
    2024-01-25T22:28:24.800607Z  INFO tokio-runtime-worker ThreadId(19) ballista_scheduler::api::handlers: Issue: Error { kind: NotFound, message: "program not found" }
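
The change in step 1 boils down to recording the error before it is replaced by an opaque rejection. Here is a minimal std-only sketch of that pattern, not the scheduler's actual code: `Rejected` is a hypothetical stand-in for what `warp::reject()` returns, `reject_with_log` is an illustrative helper, and a `Vec<String>` takes the place of the `info!` macro. The failure is simulated with a hand-built `io::Error` of kind `NotFound`.

```rust
use std::fmt::Debug;
use std::io;

/// Hypothetical stand-in for the opaque rejection produced by `warp::reject()`.
#[derive(Debug, PartialEq)]
struct Rejected;

/// Records the underlying error, then replaces it with an opaque rejection.
/// The `log` vector is an assumption of this sketch; the handler would call `info!`.
fn reject_with_log<T, E: Debug>(
    result: Result<T, E>,
    log: &mut Vec<String>,
) -> Result<T, Rejected> {
    result.map_err(|err| {
        log.push(format!("Issue: {:?}", err));
        Rejected
    })
}

fn main() {
    let mut log = Vec::new();

    // Simulated failure of the same kind as the one reported in step 7.
    let failed: Result<String, io::Error> =
        Err(io::Error::new(io::ErrorKind::NotFound, "program not found"));

    let outcome = reject_with_log(failed, &mut log);
    println!("outcome: {:?}", outcome);
    for line in &log {
        println!("{line}");
    }
}
```

Without the logging closure, the caller only ever sees the rejection, which is why the missing program went unnoticed in the UI.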

Hope that helps to narrow it down!

simicd commented 8 months ago

Ok, I think I stumbled over the solution for this issue: the graphviz-rust library additionally requires the Graphviz CLI to be installed (source: the graphviz-rust documentation).

On Windows these were the steps:

  1. winget install graphviz
  2. Add C:\Program Files\Graphviz\bin to PATH
  3. Validate the CLI is installed with dot -V
  4. Restart ballista scheduler
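
The check in step 3 can also be done from Rust, which shows where the `NotFound` error comes from. This is a rough sketch using only the standard library: `cli_version` is a hypothetical helper, not part of Ballista or graphviz-rust. It runs `<program> -V` and collects both output streams, since the version banner may be written to stderr rather than stdout.

```rust
use std::io;
use std::process::Command;

/// Runs `<program> -V` and returns whatever version text it prints.
/// Both stdout and stderr are collected, so it does not matter which one is used.
fn cli_version(program: &str) -> io::Result<String> {
    let output = Command::new(program).arg("-V").output()?;
    let mut text = String::from_utf8_lossy(&output.stdout).into_owned();
    text.push_str(&String::from_utf8_lossy(&output.stderr));
    Ok(text.trim().to_string())
}

fn main() {
    match cli_version("dot") {
        Ok(version) => println!("Graphviz CLI found: {version}"),
        // Spawning a program that is not on PATH fails with ErrorKind::NotFound.
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            eprintln!("`dot` is not on PATH; install Graphviz and restart the scheduler");
        }
        Err(err) => eprintln!("could not run `dot`: {err}"),
    }
}
```

The scheduler only sees a changed PATH after it is restarted, which is why step 4 is needed.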