JuliaIO / BSON.jl

[FR] Alter contents of a BSON file? #32

Open benninkrs opened 5 years ago

benninkrs commented 5 years ago

Nice work, BSON.jl is very useful.

Suppose one is running simulations that generate lots of data, perhaps more than can fit in RAM. In that case one would like to be able to save the data incrementally as it is generated.

Would it be feasible to add functionality to alter the contents of a BSON file? For example,

  1. append a variable and its value to a file, or
  2. change the value of a top-level variable, or
  3. change the value of a nested variable

Based on my limited understanding of the BSON format, I imagine (1) might not be too hard, but (2) and (3) might be difficult, since the storage needed for a variable in the middle of a file might change.

I could also imagine the answer might be, "In this case don't use BSON, use (some other storage approach)".

richiejp commented 5 years ago

> Suppose one is running simulations that generate lots of data, perhaps more than can fit in RAM. In that case one would like to be able to save the data incrementally as it is generated.

Usually the solution to this is to allocate your objects within a memory-mapped file and let the kernel handle paging to disk. I think there may be a Julia library that does this. At any rate, I don't see a reason to serialize to BSON format for this.
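
A minimal sketch of the memory-mapped approach, written in Python only to keep it self-contained (Julia's standard-library `Mmap` module offers the analogous facility). The file name, the sample count `N`, and the flat float64 layout are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

N = 100_000                      # number of float64 samples (hypothetical)
ITEM = struct.calcsize("<d")     # 8 bytes per sample

# Hypothetical backing file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "samples.bin")

# Pre-size the backing file; the data lives on disk, not on the process heap.
with open(path, "wb") as f:
    f.truncate(N * ITEM)

# Map the file and store results as they are generated.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        for i in range(0, N, 1000):
            struct.pack_into("<d", mm, i * ITEM, float(i))
        mm.flush()               # ask the OS to write dirty pages back

# Later: read one sample without loading the whole file.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (value,) = struct.unpack_from("<d", mm, 5000 * ITEM)
```

The operating system decides which pages stay resident, so the working set can exceed RAM as long as access is reasonably local.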

> Based on my limited understanding of the BSON format, I imagine (1) might not be too hard, but (2) and (3) might be difficult, since the storage needed for a variable in the middle of a file might change.

Yep, you can change data in place so long as its size doesn't change, and you can append data to the end. Any other changes fall outside BSON's scope. Only appending data to the end of the file is really practical, though.
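
To make the two feasible operations concrete, here is a rough byte-level sketch based on the BSON wire format (an int32 total length, the elements, then a `0x00` terminator). It is written in Python with only the standard library and handles just top-level doubles. The helper names are hypothetical, and BSON.jl's own encoding of Julia types is not reproduced here.

```python
import os
import struct
import tempfile

def double_element(name, value):
    """Encode one BSON element of type 0x01 (64-bit float)."""
    return b"\x01" + name.encode("utf-8") + b"\x00" + struct.pack("<d", value)

def document(elements):
    """Wrap encoded elements as a document: int32 length, elements, 0x00."""
    total = 4 + len(elements) + 1
    return struct.pack("<i", total) + elements + b"\x00"

def append_element(path, element):
    """Append by overwriting the terminator and patching the length prefix."""
    with open(path, "r+b") as f:
        (old_len,) = struct.unpack("<i", f.read(4))
        f.seek(old_len - 1)              # position of the trailing 0x00
        f.write(element + b"\x00")       # new element plus a fresh terminator
        f.seek(0)
        f.write(struct.pack("<i", old_len + len(element)))

def overwrite_double(path, name, value):
    """Replace a top-level double in place; its byte size cannot change."""
    key = b"\x01" + name.encode("utf-8") + b"\x00"
    with open(path, "r+b") as f:
        data = f.read()
        # Naive byte search: adequate for this sketch, not a real parser.
        offset = data.index(key, 4) + len(key)
        f.seek(offset)
        f.write(struct.pack("<d", value))

path = os.path.join(tempfile.mkdtemp(), "run.bson")  # hypothetical file
with open(path, "wb") as f:
    f.write(document(double_element("energy", 1.5)))

append_element(path, double_element("step", 2.0))
overwrite_double(path, "energy", 3.25)

with open(path, "rb") as f:
    raw = f.read()
```

Anything that changes the size of an element in the middle would shift every later byte and the length prefix of each enclosing document, which is why only these two cases are cheap.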

> I could also imagine the answer might be, "In this case don't use BSON, use (some other storage approach)".

I think BSON is fine, but you will have to combine it with another storage layer, partitioning the data into chunks that are small enough to be totally rewritten after an update. Currently I use it with Redis, which is OK. I wouldn't recommend that anyone try implementing such a layer within this library itself.
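
A minimal sketch of such a chunking layer, assuming one file per chunk on the local filesystem rather than Redis. The chunk size, file naming, and helper functions are hypothetical, and the serializer is a plain `struct` stand-in where a real layer would write one BSON document per chunk.

```python
import os
import struct
import tempfile

CHUNK = 4  # values per chunk (hypothetical; pick what is cheap to rewrite)

def chunk_path(root, index):
    return os.path.join(root, f"chunk_{index:06d}.bin")

def write_chunk(root, index, values):
    """Rewrite one chunk in full, then atomically swap it into place."""
    tmp = chunk_path(root, index) + ".tmp"
    with open(tmp, "wb") as f:
        # Stand-in serializer; a real layer would emit a BSON document here.
        f.write(struct.pack(f"<{len(values)}d", *values))
    os.replace(tmp, chunk_path(root, index))

def read_chunk(root, index):
    with open(chunk_path(root, index), "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

def update_value(root, position, value):
    """Change one value by rewriting only the chunk that contains it."""
    index, offset = divmod(position, CHUNK)
    values = read_chunk(root, index)
    values[offset] = value
    write_chunk(root, index, values)

root = tempfile.mkdtemp()
data = [float(i) for i in range(10)]
for start in range(0, len(data), CHUNK):
    write_chunk(root, start // CHUNK, data[start:start + CHUNK])

update_value(root, 6, -1.0)  # touches only the second chunk
```

An update costs one chunk rewrite regardless of total data size, and new data is added by writing further chunks rather than by editing existing files.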