So, like, a single write stream; also we could have `read` do multiple read streams.

On Feb 16, 2014 3:33 PM, "Max Ogden" notifications@github.com wrote:
I think we're gonna need to rethink the hashing implementation. What I wanna do is stream a file in with `limit: Infinity`, e.g. only create 1 file in the blob store regardless of how big the incoming file is.
The way it works right now is that every chunk gets stored as a separate file. To change this we would need to rework the relationship between `WriteCabs` and `Cabs.prototype.write` so that they can stream entire files, and update the hash in a streaming way (e.g. https://github.com/dominictarr/content-addressable-store/blob/master/index.js#L7-L16).
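For reference, updating the hash in a streaming way boils down to feeding each chunk into a `crypto.Hash` object as it arrives, roughly like below. This is a sketch only, not cabs' actual code; the sha256 choice and the `hashStream` name are just illustrative.

```js
// Minimal sketch: hash a stream incrementally so the whole file never has to
// sit in memory; the digest (the content address) is only known at the end.
var crypto = require('crypto');

function hashStream(readable, cb) {
  var hash = crypto.createHash('sha256');
  readable.on('data', function (chunk) {
    hash.update(chunk); // fold each chunk into the running digest
  });
  readable.on('error', cb);
  readable.on('end', function () {
    cb(null, hash.digest('hex'));
  });
}
```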
So for reading we can just always open a read stream; with a stream of chunks like now we can just have concatenated streams.
For writing we can have chunked mode and file mode: chunked mode is what we have now (we can improve it later); file mode buffers to a temp file and incrementally calculates the hash.

On Feb 16, 2014 5:51 PM, "Max Ogden" notifications@github.com wrote:
oh yea good point.
so if `limit` is `Infinity`:
- write entire input stream into one file, bypassing byte-stream etc
- by default it should write to a temporary file both for consistency and also because we don't know the hash ahead of time
- for reading it is simpler because you just have to read one file
if `limit` is not `Infinity`:
- I think instead of buffering using byte-stream here we should use `fs.createWriteStream` and stop when we hit the limit, then open another write stream until the input stream empties
- maybe we need an 'approximate' setting that will write the entire last chunk to the current `fs.createWriteStream` stream. E.g. if the limit is 5 MB and a 500 kB chunk comes in when we've written 4.9 MB to the current file already: if 'approximate' is true we should just write the whole 500 kB chunk, otherwise if false we should slice the 500 kB down to 100 kB so that the file is exactly 5 MB (the remaining 400 kB starts the next file). Approximate mode cuts down on unnecessary buffer slices. Default on this should probably be false
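The rollover logic in those two bullets can be sketched as a small helper. This is illustrative only; `splitChunk` and the `approximate` flag name are not actual cabs API.

```js
// Sketch: given how many bytes the current file already holds, decide what to
// write to it and what to carry over into the next fs.createWriteStream.
function splitChunk(chunk, written, limit, approximate) {
  var remaining = limit - written;
  if (chunk.length <= remaining || approximate) {
    // fits, or approximate mode accepts overshooting the limit to avoid a slice
    return { current: chunk, next: null };
  }
  // exact mode: top the current file off at precisely `limit` bytes
  return { current: chunk.slice(0, remaining), next: chunk.slice(remaining) };
}

// limit 5 MB, 4.9 MB already written, 500 kB chunk arrives:
//   approximate=true  -> write all 500 kB here (file ends slightly over 5 MB)
//   approximate=false -> write 100 kB here, carry 400 kB into the next file
```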
ok so `read` now streams each of the files that it reads, and I added a `writeFile` method which does the whole buffer-to-disk thing; I can probably rewrite `writeStream` in terms of that instead of `write`
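A sketch of the "read streams each of the files" side, assuming a hypothetical layout where every chunk is stored at `blobDir/<hash>` (not cabs' actual paths or method names):

```js
// Sketch: pipe each chunk file into `dest` in order, keeping `dest` open
// between files so the result is one concatenated stream.
var fs = require('fs');
var path = require('path');

function readChunks(blobDir, hashes, dest, cb) {
  (function next(i) {
    if (i === hashes.length) return cb(null); // caller ends `dest` when done
    var source = fs.createReadStream(path.join(blobDir, hashes[i]));
    source.on('error', cb);
    source.on('end', function () { next(i + 1); });
    source.pipe(dest, { end: false });
  })(0);
}

// e.g. readChunks('./blobs', chunkHashes, process.stdout, function (err) {});
```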
`writeStream` now uses `writeFile` internally, so the whole chunk is no longer buffered in memory but in a file, meaning a chunk size in the gigabytes should be (theoretically) possible.
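The temp-file approach behind this (and the "file mode" idea discussed above) could look roughly like the following; `writeBlob`, the sha256 choice, and the temp-file naming are illustrative, not cabs' actual `writeFile` implementation:

```js
// Sketch: spool incoming data to a temp file while updating the hash, then
// rename the temp file to its content address once the digest is known.
var crypto = require('crypto');
var fs = require('fs');
var path = require('path');

function writeBlob(readable, blobDir, cb) {
  var hash = crypto.createHash('sha256');
  // keep the temp file in blobDir so the final rename stays on one filesystem
  var tmp = path.join(blobDir, '.tmp-' + crypto.randomBytes(6).toString('hex'));
  var out = fs.createWriteStream(tmp);

  readable.on('data', function (chunk) { hash.update(chunk); });
  readable.pipe(out);

  out.on('error', cb);
  out.on('finish', function () {
    var digest = hash.digest('hex'); // only known after the whole stream is in
    fs.rename(tmp, path.join(blobDir, digest), function (err) {
      cb(err, digest);
    });
  });
}
```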
closing this as we've implemented all of it except for approximate support