frictionlessdata / frictionless-js

A lightweight, standardized library for accessing files and datasets, especially tabular ones (CSV, Excel).
https://frictionlessdata.io

Allow passing authentication headers when loading URLs #130

Open adamhooper opened 2 years ago

adamhooper commented 2 years ago

I'm building ... well ... a data hub. Some clients will publish private Frictionless datasets, and they'll want tools to read them.

I don't see any docs on frictionlessdata.io covering this pattern. So, as a brainstorm, perhaps the library could let the user define an HTTP transport:

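const data = require('frictionless.js')

// `customFetch` is the option proposed in this issue: frictionless.js would
// call it instead of its built-in fetch for every URL it loads.
async function customFetch(url, options = {}) {
    // Node 18+ global fetch; keep any headers the library set, then add credentials.
    return fetch(url, { ...options, headers: { ...options.headers, Authorization: 'Basic abcdef' } })
}

async function main() {
    const pathOrDescriptor = 'https://paid-service.com/all-my-secrets/datapackage.json'
    const dataset = await data.Dataset.load(pathOrDescriptor, { customFetch })
    return dataset
}
main().catch(console.error)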
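On the library side, the proposal amounts to routing every network request through a caller-supplied function. A minimal sketch, assuming a hypothetical `loadDescriptor` helper (not part of frictionless.js) that defaults to the global `fetch` of Node 18+:

```javascript
// Hypothetical loader showing an injectable transport; this is NOT the
// frictionless.js API, just the pattern this issue asks for.
async function loadDescriptor(url, { fetch: fetchImpl = globalThis.fetch } = {}) {
  // Every request goes through fetchImpl, so a caller-supplied function can
  // add auth headers, client certificates, retries, logging, etc.
  const res = await fetchImpl(url, { headers: { Accept: 'application/json' } })
  if (!res.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${res.status}`)
  }
  return res.json()
}
```

A caller would then write `await loadDescriptor(url, { fetch: customFetch })`. Injecting the function keeps the loader agnostic about any particular auth scheme.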

Prior work: Here are some approaches for supporting a zillion HTTP servers:

As a user, I find the Relay approach the most intuitive.

(These clients are all browser-native, so the use cases are a bit different.)

rufuspollock commented 2 years ago

@adamhooper hi, and great to hear from you. I'm one of the creators of Frictionless, and I'm very interested in this pattern. Retrieving a dataset is relatively straightforward, but things like integrating with a local GraphQL layer could be very interesting.

Please do share more thoughts and I can share a bit of background at our end.

At Datopian we've been building DataHubs directly on Frictionless for some time. For example, https://datahub.io/ is backed purely by Frictionless Data Packages: try an HTTP GET on https://datahub.io/core/finance-vix/datapackage.json.
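Because a Data Package descriptor is plain JSON served over HTTP, that GET can be sketched in a few lines. This assumes Node 18+ for the global `fetch`; `listResources` is a hypothetical helper, and its `fetchImpl` parameter exists only so the transport can be swapped (for example, for an authenticated one).

```javascript
// Fetch a Data Package descriptor and list its resources.
// Hypothetical helper; assumes Node 18+ where fetch is global.
async function listResources(descriptorUrl, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(descriptorUrl)
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${descriptorUrl}`)
  const descriptor = await res.json()
  // Each resource has a `name` and usually a `path`. A relative string path
  // is resolved against the descriptor URL; absolute URLs pass through as-is.
  return (descriptor.resources || []).map((r) => ({
    name: r.name,
    path: typeof r.path === 'string' ? new URL(r.path, descriptorUrl).href : r.path,
  }))
}

// Usage (makes a real network request):
// listResources('https://datahub.io/core/finance-vix/datapackage.json').then(console.log)
```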

Recently we've been working on things like this:

We've also built portals based purely on Frictionless datasets stored in GitHub, etc.

/cc @risenW

adamhooper commented 2 years ago

@rufuspollock Hiya! Good to hear from you. I think Frictionless Data is awesome.

I'm CTO of Workbench. Right now I'm building a feature that lets users click "Publish" to create a dataset they can access with any Frictionless-compatible client. Once that's up, we'll build our own Workbench+Frictionless importer so our users can actually use the datasets they publish :). Our main motivation is in helping people publish from Workbench, for Workbench. Frictionless brings bonus value: users can switch tools at will.

We think many users want to publish non-public datasets. For instance, a newsroom might prototype a web app using a Frictionless dataset. The web app needs to access the (Frictionless) dataset, but the newsroom mustn't make the dataset public before the story is ready.

I think HTTP auth and private SSL certs can happen at each Frictionless client's transport layer, so the Frictionless specs needn't be altered. Do you agree?

(This is why I compared with GraphQL: GraphQL's specs don't mention authentication, and all clients give a hook to permit it. I think that's a great model for Frictionless to emulate.)
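Following that GraphQL-client model, the hook can be a plain wrapper around any fetch-compatible function. A minimal sketch, assuming HTTP Basic auth as defined in RFC 7617 (`Basic ` followed by base64 of `user:password`); `withBasicAuth` is a hypothetical name, and private CA certificates would need transport-specific options not shown here.

```javascript
// Wrap any fetch-compatible function so every request carries HTTP Basic auth.
// Hypothetical helper; credentials are encoded as base64("user:password").
function withBasicAuth(fetchImpl, username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64')
  return (url, options = {}) =>
    fetchImpl(url, {
      ...options,
      // Assumes plain-object headers: keep the caller's, and set Authorization.
      headers: { ...options.headers, Authorization: `Basic ${token}` },
    })
}
```

Under the option proposed in this issue, usage would look like `data.Dataset.load(url, { customFetch: withBasicAuth(fetch, 'user', 'secret') })`.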

rufuspollock commented 2 years ago

> I think HTTP auth and private SSL certs can happen at each Frictionless client's transport layer, so the Frictionless specs needn't be altered. Do you agree?

👍 100%. I think nothing needs changing in the specs.

When you do get this working, it would be nice to share some code snippets so that others can learn from you (and us).

Also great to see Workbench: it looks super cool, and I'm delighted to hear Frictionless and the Frictionless libs have been valuable to you!