dataflint / spark

Performance Observability for Apache Spark
Apache License 2.0

Cannot access dataflint UI through Knox gateway URL using the #20

Open ef1236 opened 3 weeks ago

ef1236 commented 3 weeks ago

The bug is the same as described in [this bug report](https://github.com/dataflint/spark/issues/13).

When I try to access dataflint through a Knox gateway, it tries to access the API without the gateway prefix:

GET https:///api/v1/applications/<application_id/1/environment 404 (Not Found)

It should be a call to: https:///gateway//sparkhistory/history//1/environment/

It looks like version 0.2.5 should have fixed it, but I still encounter the same problem.

menishmueli commented 3 weeks ago

@DanielAronovich please take a look, isn't it the same path you simulated with a proxy 1:1?

DanielAronovich commented 3 weeks ago

@ef1236, you are correct, and it should have been solved.

I am on it.

Do the history server and the dataflint tab both work well, and it only breaks when pressing "TO HISTORY SERVER"?

Does the "To spark UI" button on top of it work well?

Thanks!

ef1236 commented 3 weeks ago

@DanielAronovich Hey, thanks for the quick reply.

We are using Knox with Cloudera. The history server dataflint tab only works when I don't go through the Knox gateway.

When I access the sparkhistory server directly using the hostname and port (Hostname:18489/history/application/dataflint), it queries the environment endpoint correctly: Hostname:18489/api/v1/applications/application/environment

But when I access the sparkhistory server through the Knox gateway, the page URL is correct but the page queries the environment endpoint with the wrong URL. When I open exapmle.example/gateway/cdp-proxy/spark3history/history/application_/dataflint, it queries exapmle.example/api/v1/applications/application/environment instead of exapmle.example/gateway/cdp-proxy/spark3history/api/v1/applications/application/environment, which would have worked (I queried it myself).

menishmueli commented 2 weeks ago

I wrote a fix and released version 0.2.6. I believe it will work this time. You can look at the fix here: https://github.com/dataflint/spark/commit/4786a1b09f63d986ee963c10b92434b4990e4866

@ef1236 let me know if version 0.2.6 fixed the problem so we can close this issue

ef1236 commented 2 weeks ago

Hey, 0.2.6 did not fix the problem, but I think I found the cause in the code. In spark-ui/src/utils/UrlUtils.ts, line 38, the pathToRemove regex should drop the ".*" at the start. With the ".*", the regex matches everything from the start of the pathname, so the gateway prefix is removed as well.

```ts
// old
// const pathToRemove = /.*\/history\/[^/]+\/dataflint\/?$/;
// new
const pathToRemove = /\/history\/[^/]+\/dataflint\/?$/;
```

Then it only captures the part starting at /history, not the whole pathname.

I created a pull request with the change.
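
To make the difference between the two patterns concrete, here is a minimal sketch. It assumes the pattern is applied to the page pathname with `String.prototype.replace`, which is a guess about how UrlUtils.ts uses it and may not match the real code. The helper name `stripDataflintPath` and the application id `application_1` are hypothetical, chosen only for illustration.

```typescript
// Both patterns are copied from the comment above.
const OLD_PATH_TO_REMOVE = /.*\/history\/[^/]+\/dataflint\/?$/;
const NEW_PATH_TO_REMOVE = /\/history\/[^/]+\/dataflint\/?$/;

// Hypothetical helper: strips the history/dataflint suffix from a pathname
// so that whatever remains can be used as the base path for API calls.
function stripDataflintPath(pathname: string, pathToRemove: RegExp): string {
  return pathname.replace(pathToRemove, "");
}

// Pathname as seen behind the Knox gateway (application id is made up).
const knoxPath =
  "/gateway/cdp-proxy/spark3history/history/application_1/dataflint/";
// Pathname when the history server is accessed directly.
const directPath = "/history/application_1/dataflint";

// The leading ".*" matches from the start of the string,
// so the gateway prefix is removed together with the suffix.
const oldKnoxBase = stripDataflintPath(knoxPath, OLD_PATH_TO_REMOVE);
// Without ".*" only the "/history/<app>/dataflint" suffix is removed.
const newKnoxBase = stripDataflintPath(knoxPath, NEW_PATH_TO_REMOVE);
// Direct access has no prefix, so both patterns leave an empty base path.
const newDirectBase = stripDataflintPath(directPath, NEW_PATH_TO_REMOVE);

console.log(JSON.stringify(oldKnoxBase));   // ""
console.log(JSON.stringify(newKnoxBase));   // "/gateway/cdp-proxy/spark3history"
console.log(JSON.stringify(newDirectBase)); // ""

// Under the same assumption, the environment endpoint would be built as:
console.log(`${newKnoxBase}/api/v1/applications/application_1/environment`);
```

Under these assumptions, the old pattern yields an empty base path behind Knox, which matches the observed request to /api/v1/... without the gateway prefix, while the new pattern keeps /gateway/cdp-proxy/spark3history.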

ef1236 commented 1 week ago

@menishmueli made a commit that should fix the problem (https://github.com/dataflint/spark/pull/23#issuecomment-2482378622). I will update the issue when a new version is released and I can check whether the issue is really solved.