Closed anuveyatsu closed 7 years ago
@anuveyatsu sounds useful to accept a stream. Can you do a PR for us to consider/review?
@anuveyatsu
source (String/Array[]/Function) - data source (one of):
- local CSV file (path)
- remote CSV file (url)
- array of arrays representing the rows
- function returning readable stream with CSV file contents
Have you tried passing a stream constructor to the Table class?
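const fs = require('fs')
const { Table } = require('tableschema')
const source = () => fs.createReadStream('data.csv') // any function returning a fresh readable stream
const table = await Table.load(source)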
@roll I will try this approach and will update here. @pwalsh if necessary, I can do a PR.
@anuveyatsu
Please re-open if needed. Table accepts a stream constructor because, AFAIK, Node.js streams are not rewindable by default, but Table needs to be able to read the source more than once.
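To illustrate the point about reading more than once, here is a minimal sketch. It assumes an in-memory CSV in place of a real file, and `readAll` and `readTwice` are hypothetical helpers rather than part of tableschema. Each call to the constructor yields a fresh stream, so the same data can be consumed twice:

```javascript
const { Readable } = require('stream')

// Hypothetical in-memory CSV used in place of a real file.
const CSV_CHUNKS = ['id,name\n', '1,english\n', '2,german\n']

// Stream constructor: every call returns a brand-new readable stream.
const source = () => Readable.from(CSV_CHUNKS)

// Drain a stream into a single string.
async function readAll(stream) {
  let text = ''
  for await (const chunk of stream) text += chunk
  return text
}

// Two full passes over the same data, each on its own stream instance.
async function readTwice(createStream) {
  const first = await readAll(createStream())
  const second = await readAll(createStream())
  return [first, second]
}
```

A single stream instance could not be passed to `readAll` twice, since it is already exhausted after the first pass.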
@roll whilst you can't rewind a stream, I think it is standard practice to duplicate Node streams by using a PassThrough stream, e.g.
var fs = require('fs')
var stream = require('stream')
var contents = fs.createReadStream('./bigfile') // greater than 65KB
var stream1 = contents.pipe(new stream.PassThrough())
var stream2 = contents.pipe(new stream.PassThrough())
stream1.on('data', function (data) { console.log('s1', data.length) })
stream1.on('end', function () {
  stream2.on('data', function (data) { console.log('s2', data.length) })
})
I get that you can move responsibility onto clients for doing this (as you have done) but you can do it internally.
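A rough sketch of what doing it internally might look like. `duplicate`, `readAll` and `readBoth` are hypothetical helpers, not tableschema API, and the sketch assumes both copies are consumed at the same time so that neither branch holds the other back through backpressure:

```javascript
const { PassThrough, Readable } = require('stream')

// Hypothetical helper: fan one readable stream out into two PassThrough copies.
function duplicate(source) {
  const first = source.pipe(new PassThrough())
  const second = source.pipe(new PassThrough())
  return [first, second]
}

// Drain a stream into a single string.
async function readAll(stream) {
  let text = ''
  for await (const chunk of stream) text += chunk
  return text
}

// Consume both copies concurrently and return their contents.
async function readBoth(source) {
  const [first, second] = duplicate(source)
  return Promise.all([readAll(first), readAll(second)])
}
```

For example, `readBoth(Readable.from(['a,b\n', '1,2\n']))` resolves to two identical strings, one per copy.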
Note also here: we don't really want the Table object per se - we just want to use infer 😄 -- this relates to the API discussion we've been having. Basically what would be perfect IMO is a simple method like:
const infer(stream) => schema
This is clean and simple purpose and could even be its own mini library (this makes it much easier for others to reuse and contribute to ...) - cf the discussion on the FD channel about these algorithms.
@rufuspollock
I've created a feature request for single stream support - https://github.com/frictionlessdata/tableschema-js/issues/95. I'm still not sure it's super critical, because this stream-related functionality is more for system integrators, who are able to handle it on the client side, and ordinary users usually pass file paths, not streams. But if it doesn't complicate the Table class too much, I think it's worth trying.
Note also here: we don't really want the Table object per se - we just want to use infer :smile: -- this relates to the API discussion we've been having. Basically what would be perfect IMO is a simple method like:
const infer(stream) => schema
I think it's a readme issue - in tableschema@1.0 you can pass to infer anything that the Table class accepts:
const descriptor = await infer(array)
const descriptor = await infer('data.csv')
const descriptor = await infer('http://example.com/data.csv')
const descriptor = await infer(streamConstructor)
const descriptor = await infer(stream) // if we implement https://github.com/frictionlessdata/tableschema-js/issues/95
I noticed that we cannot pass a row stream as a source. It would try to create a row stream again and error - https://github.com/frictionlessdata/tableschema-js/blob/master/src/table.js#L220-L253
Is there a workaround for this case?
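One possible workaround, as an untested sketch: collect the rows first and pass the resulting array of arrays, which is one of the source types listed in the readme excerpt quoted earlier. This assumes the row stream is an object-mode stream emitting one array per row and that the data fits in memory. `collectRows` is a hypothetical helper, and the sample stream only stands in for a real one:

```javascript
const { Readable } = require('stream')

// Hypothetical helper: collect every row of an object-mode stream into an array of arrays.
async function collectRows(rowStream) {
  const rows = []
  for await (const row of rowStream) rows.push(row)
  return rows
}

// Hypothetical row stream standing in for the one created elsewhere.
const rowStream = Readable.from([
  ['id', 'name'],
  [1, 'english'],
])

// Usage (assumes tableschema is installed and this runs inside an async function):
// const rows = await collectRows(rowStream)
// const table = await Table.load(rows)
```

This gives up streaming, so it is only reasonable for data small enough to buffer.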