datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

Can't get provenance data from Apache NiFi API starting with version 1.15.0 #7174

Closed skannengiesser closed 1 year ago

skannengiesser commented 1 year ago

Describe the bug Ingesting with the NiFi plugin as described here returns 403 when POSTing to NiFi's Provenance API endpoint. The root cause seems to be that, starting with NiFi 1.15.0, a random token is exchanged between NiFi and the client (here DataHub) in the form of a second cookie when the /access/token endpoint is called. For all subsequent PUT and POST requests, the content of that cookie (currently called __Secure-Request-Token) has to be placed in a custom HTTP header called Request-Token. In the NiFi UI this is done by NiFi's JavaScript code. The corresponding logic seems to be missing in DataHub. Please see the additional context below for further information.
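A minimal sketch of the cookie-to-header step described above, using only the Python standard library. The cookie and header names come from this report; the helper names `extract_request_token` and `with_request_token` are hypothetical and are not part of DataHub or NiFi. It assumes the client has access to the raw Set-Cookie value returned by /access/token.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional

# Names as given in this issue; NiFi may rename the cookie in later releases.
REQUEST_TOKEN_COOKIE = "__Secure-Request-Token"
REQUEST_TOKEN_HEADER = "Request-Token"


def extract_request_token(set_cookie_header: str) -> Optional[str]:
    """Return the request token from a Set-Cookie value, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie.get(REQUEST_TOKEN_COOKIE)
    return morsel.value if morsel is not None else None


def with_request_token(headers: Dict[str, str], token: Optional[str]) -> Dict[str, str]:
    """Return a copy of headers with Request-Token added when a token exists."""
    merged = dict(headers)
    if token:
        merged[REQUEST_TOKEN_HEADER] = token
    return merged


if __name__ == "__main__":
    # Hypothetical Set-Cookie value; a real NiFi response carries a random token.
    token = extract_request_token("__Secure-Request-Token=abc123; Path=/; Secure")
    print(with_request_token({"Content-Type": "application/json"}, token))
```

If the client keeps a `requests.Session`, the same effect could presumably be had by reading the cookie from the session's cookie jar after the token call and storing it in the session headers, so that every later PUT and POST carries it. Whether that fits DataHub's NiFi source is an assumption that would need checking against the actual code.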

To Reproduce Steps to reproduce the behavior:

  1. Configure a NiFi ingestion (NiFi >= 1.15.0; in use here: DataHub 0.9.5 and NiFi 1.18.0) according to DataHub's docs
  2. Run the ingestion
  3. Find 403 errors in calls to the Provenance API endpoint

Expected behavior Provenance data gets ingested by DataHub.

Additional context Please see NiFi's docs about the authorization workflow here. To my understanding, the fix has to go here: the custom Request-Token header has to be set there. The 403 was reproduced, and its resolution by adding the header was verified, here using Postman.

Please don't hesitate to ask for further information if needed.

I can also provide a PR if desired, though I would first have to set up a DataHub dev environment, which I currently do not have.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

skannengiesser commented 1 year ago

The bug report is still relevant. The Request-Token header required by NiFi is still not set at HEAD of the DataHub code.

Verified the bug with DataHub 0.9.5 and Apache NiFi 1.19.0. I will soon be able to test with more recent versions, but the relevant part of the DataHub codebase has not changed yet, and NiFi still documents this requirement as part of its CSRF protection strategy, as written here: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#csrf-protection

As stated before, I'd be happy to provide a patch for this, but I would like to get confirmation from a maintainer in advance. Thank you.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 30 days since being marked as stale.