connormanning / entwine

Entwine - point cloud organization for massive datasets
https://entwine.io

Discussion: How to merge-in or incrementally add data? #316

Closed jlaura closed 10 months ago

jlaura commented 10 months ago

I am not 100% sure how to go about incrementally adding data to an EPT dataset. I see merge in the docs, along with the note that it should not be used to merge unrelated EPT 'files'.

If both EPTs are made with data for the same body, in the same SRS, are they related, and should I use merge? Or is there another mechanism to incrementally add data (maybe it's as simple as an entwine build and then an s3 sync, though this seems wasteful)?

Thanks for any suggestions.

connormanning commented 10 months ago

You can use the same output for multiple builds, with some important limitations:

merge is never applicable in this use case; it only applies when subsets have been created from the same data in multiple builds.

Files should also be de-duplicated by their filename within Entwine. So this common use-case should work:

entwine build -i my-data/ -o output

Then, let's say a new file is added to my-data/. Now, running this exact same command should only add the new file.
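
To make the "de-duplicated by filename" idea concrete, here is a minimal sketch of the concept in plain shell. It is not how Entwine tracks its inputs internally: the `new_inputs` helper and the `seen.txt` manifest are hypothetical names invented for this illustration. It only shows why re-running the same command over a growing directory would pick up just the files whose names have not been seen before.

```shell
#!/bin/sh
# Illustration only: this mimics "skip inputs whose filename was already
# added". It is NOT Entwine's implementation; new_inputs and the manifest
# file are hypothetical and exist only for this sketch.

# Print the basename of every regular file in directory $1 whose name
# is not already listed (one name per line) in manifest file $2.
new_inputs() {
    dir=$1
    manifest=$2
    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # -x: match the whole line, -F: treat the name as a fixed string.
        # A missing manifest means nothing has been added yet.
        if ! grep -qxF -e "$name" "$manifest" 2>/dev/null; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `new_inputs my-data seen.txt` would list only the files dropped into my-data/ since the names in seen.txt were recorded. With Entwine itself no such bookkeeping should be needed on your side: re-running `entwine build -i my-data/ -o output` is the whole workflow.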

If anything described here does not work, feel free to open an issue with specifics.