nedbaldessin closed this issue 5 years ago.
And an obvious one: Amazon Glacier.
C14 has an SFTP interface (which I've used) so likely works already.
OVH Cloud Archive has an SFTP interface too by the look of it (which I haven't tried).
Glad to see the developer of rclone is here too! Thank you for the information.
Can anyone confirm whether Duplicacy works with cold storage services? (Glacier, C14, OVH Cloud Archive, etc.)
Since nobody seems to have tried it, maybe you could give it a try and report back? In principle, any service with SFTP access should work. The potential problems are that listing directories and reading files, which Duplicacy does frequently, might be too slow for proper operation, and/or that these operations might cost too much money. In addition, pruning might make things worse, since the idea of cold storage is to leave data alone once it has been uploaded. That may not play well with fossil collection and fossil deletion.
Hello,
Since I'm also interested in using OVH's cold cloud storage with Duplicacy, I did some research of my own (on OVH's service and on cold storage in general), and here is what I found.
As a reminder, and for clarity's sake: “cold” storage schemes provide long-term storage for data that does not need to be accessed frequently or quickly. While the actual technologies differ, cold storage usually relies on cheaper and denser media, or on hardware that does not need to be networked and running 100% of the time. This reduces storage costs, at the price of higher cost and latency for every data retrieval.
Several cloud storage providers have started offering cold storage tiers. These include Amazon Glacier, OVH Public Cloud Archive, Google Coldline, Microsoft Cool Blob Storage and Online C14. Almost all of these vendors offer cold storage alongside their mainline cloud storage tier. Some of them even let you define rules that automatically move older data from hot storage containers to cold storage containers at some point in its lifetime.
In practice, all of these vendors rely on different technologies and provide very different interfaces for uploading data to the storage and retrieving it. Their pricing schemes also differ significantly from each other. All this to say, it's probably hard, if not outright impossible, to provide a universal interface to cold cloud storage providers.
OVH's Public Cloud Archive (PCA) definitely stands out from the crowd, though.
Like OVH's regular Object Storage service, PCA relies on OpenStack Swift, an open protocol that Duplicacy has supported since version 2.1.0 (March 2018). As noted in earlier messages, PCA also provides an SFTP bridge, which Duplicacy can use as well.
As such, it's probably one of the few cold storage solutions that Duplicacy can use out of the box.
I'll focus on OVH PCA in the remainder of this summary for these reasons!
Useful links:
Uploading data is billed, downloading data is billed (at double the upload rate), and the monthly storage of data is billed too (at a very low rate).
Uploading data is straightforward. With the OpenStack Swift protocol, you simply create an object inside a container. With SFTP, you simply upload files.
The tricky part is retrieving data. Since this is a cold storage service, you first have to “unfreeze” any object you want to download. This can be done either manually, or by requesting a download of the object (for instance through OpenStack Swift).
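To compare PCA with a regular object store, the three billed components can be turned into a rough cost estimate. This is only a sketch: the upload rate is a hypothetical parameter (it is not quoted in this thread), the 0.002 €/GB/month storage rate comes from the original post, and the 2x download multiplier comes from the paragraph above. Treat every rate as a placeholder to check against OVH's current price list.

```python
from dataclasses import dataclass


@dataclass
class PcaRates:
    """Per-GB rates in euros. All values are placeholders to verify with OVH."""
    upload_per_gb: float         # hypothetical: not quoted in this thread
    storage_per_gb_month: float  # e.g. 0.002, as quoted in the original post

    @property
    def download_per_gb(self) -> float:
        # The post above states downloads are billed at double the upload rate.
        return 2 * self.upload_per_gb


def estimate_cost(rates: PcaRates, uploaded_gb: float, stored_gb: float,
                  months: int, downloaded_gb: float = 0.0) -> float:
    """Rough total cost in euros: one-off transfers plus monthly storage."""
    return (uploaded_gb * rates.upload_per_gb
            + stored_gb * rates.storage_per_gb_month * months
            + downloaded_gb * rates.download_per_gb)


if __name__ == "__main__":
    rates = PcaRates(upload_per_gb=0.01, storage_per_gb_month=0.002)  # made-up upload rate
    # Write-only secondary storage: 500 GB uploaded and kept for a year, never restored.
    print(estimate_cost(rates, uploaded_gb=500, stored_gb=500, months=12))
```

With a write-only secondary storage, `downloaded_gb` stays at zero, which is exactly the scenario where cold storage pricing is most attractive.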
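For reference, "creating an object inside a container" is a single authenticated HTTP `PUT` in the OpenStack Swift API. The sketch below only builds the request with the standard library; the storage URL and token are assumed to come from a prior Keystone authentication, and the example URL is a placeholder, not OVH's real endpoint.

```python
import urllib.parse
import urllib.request


def build_swift_upload(storage_url: str, token: str, container: str,
                       object_name: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a Swift PUT that creates one object.

    storage_url: account endpoint returned by authentication,
                 e.g. "https://storage.example.net/v1/AUTH_abc" (placeholder).
    """
    url = "/".join([
        storage_url.rstrip("/"),
        urllib.parse.quote(container, safe=""),
        urllib.parse.quote(object_name, safe="/"),  # keep "chunks/ab/..." paths intact
    ])
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"X-Auth-Token": token,
                 "Content-Type": "application/octet-stream"},
    )


if __name__ == "__main__":
    req = build_swift_upload("https://storage.example.net/v1/AUTH_abc", "TOKEN",
                             "backups", "chunks/ab/cdef", b"chunk bytes")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; Swift answers 201 Created on success.
```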
Once unfreezing has begun, it takes some time for the object to be made available for download. The first time I requested unfreezing on a file, it took 4 hours.
The remaining time until unfreezing is visible in the admin panel. When you request a download (and thus unfreezing) through OpenStack Swift, it is also returned in a header of the response, which carries HTTP status code 429 (Too Many Requests).
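The download-triggers-unfreezing behaviour described above boils down to a retry loop: request the object, and on a 429 wait for the announced delay before asking again. The sketch below assumes the delay is sent in a `Retry-After` header expressed in seconds; that header name is an assumption to check against OVH's documentation. The HTTP call is abstracted as a `fetch` callable so the logic can be tested without a real PCA container.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# A fetch callable performs one GET and returns (HTTP status, headers, body).
Fetch = Callable[[], Tuple[int, Mapping[str, str], bytes]]


def download_with_unfreeze(fetch: Fetch,
                           sleep: Callable[[float], None] = time.sleep,
                           default_wait: float = 3600.0,
                           max_attempts: int = 10) -> bytes:
    """Retry a GET on a frozen object until it has been unfrozen.

    The first GET on a frozen object triggers unfreezing and returns 429.
    ASSUMPTION: the remaining delay is read from a "Retry-After" header
    (in seconds); fall back to default_wait when it is missing.
    """
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status != 429:
            raise RuntimeError(f"unexpected HTTP status {status}")
        retry_after: Optional[str] = headers.get("Retry-After")
        sleep(float(retry_after) if retry_after else default_wait)
    raise TimeoutError("object still frozen after max_attempts requests")
```

Honouring the announced delay, instead of polling aggressively, also matters because the developer guide quoted below says frequent unsealing requests lead to longer retrieval latency.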
Once unfrozen, an object remains available for download for 24 hours (per the developer guide), before becoming frozen again.
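The 24-hour window suggests a simple scheduling rule: request unfreezing of whatever a job needs (for Duplicacy, at least the config file) ahead of time, then make sure the job reads it before the object freezes again. A minimal sketch of that check, assuming the 24-hour figure from the developer guide; the function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# From OVH's developer guide, as quoted above: unfrozen objects stay readable for 24 hours.
AVAILABILITY_WINDOW = timedelta(hours=24)


def refreeze_at(unfrozen_at: datetime,
                window: timedelta = AVAILABILITY_WINDOW) -> datetime:
    """Moment at which an object unfrozen at `unfrozen_at` becomes frozen again."""
    return unfrozen_at + window


def still_unfrozen(unfrozen_at: datetime, now: datetime,
                   window: timedelta = AVAILABILITY_WINDOW) -> bool:
    """True if the object can still be downloaded at `now`."""
    return unfrozen_at <= now < refreeze_at(unfrozen_at, window)


if __name__ == "__main__":
    ready = datetime(2019, 4, 1, 12, 0, tzinfo=timezone.utc)  # example timestamp
    print(refreeze_at(ready), still_unfrozen(ready, ready + timedelta(hours=3)))
```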
Unlike other cloud services, OVH PCA does not charge anything for unfreezing an object (only for actually downloading it, as explained above).
However, per the developer guide, it is “designed for seldom consulted data: the less frequently an archive unsealing operation is requested, the smaller the retrieval latency”.
The main issue is the config file. Since Duplicacy uses a database-less approach, it downloads the config file from the storage every time it runs, even to perform backups. This is problematic, since the config file must be unfrozen before it can be downloaded. In other words, Duplicacy works almost fine for uploading data; the only roadblock is downloading the config file (and potentially metadata chunks; I sure hope I'm not wrong!!).
For restores and pruning, Duplicacy will be quite inefficient, since it will have to wait for files to be unfrozen.
So technically, Duplicacy is already usable with OVH PCA, provided we connect the storage through OpenStack Swift. Duplicacy will automatically trigger unfreezing of the files it needs by attempting to download them (it could also be triggered manually), but a lot of waiting time will be involved before actually backing up or restoring.
Coupled with the fact that frequent unfreezing requests may actually lengthen this delay, this makes restoring data from OVH Public Cloud Archive inherently inefficient.
This inadequacy for data restoration is probably in line with the purpose of cold storage in the first place?
I mean that you should probably not attempt to make cold storage your “main” storage, since data retrieval from it will always be slow and costly.
On the other hand, since Duplicacy supports multiple storages, it could be feasible to back up to and restore from a primary storage, and to simply back up to PCA as a secondary storage without ever restoring from it. This would grant additional long-term preservation of data, without the cost of recovering from PCA.
The only problem with that is, again, downloading the config file, which must be unfrozen before each backup.
A potential workaround for this config download problem is to manually copy the chunks from the primary storage to the PCA storage (provided the PCA storage is made copy-compatible and bit-identical with the primary storage).
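With the Duplicacy CLI, a copy-compatible, bit-identical secondary storage is created with `duplicacy add` and its `-copy` and `-bit-identical` options. The sketch below only assembles the command line; the storage name, snapshot ID and URL are placeholders, and the exact Swift URL syntax should be taken from Duplicacy's storage backend documentation.

```python
import shlex
from typing import List


def duplicacy_add_pca(storage_name: str, snapshot_id: str, storage_url: str,
                      primary: str = "default") -> List[str]:
    """argv for `duplicacy add` creating a secondary storage that mirrors `primary`."""
    return [
        "duplicacy", "add",
        "-copy", primary,   # make the new storage compatible with the primary for copies
        "-bit-identical",   # identical chunk files, so plain file-level copies also work
        storage_name, snapshot_id, storage_url,
    ]


if __name__ == "__main__":
    # Placeholder values; run from the repository that already uses the primary storage.
    print(shlex.join(duplicacy_add_pca("pca", "my-laptop", "swift://PLACEHOLDER")))
```

Bit-identical storages are what make the next step possible: since chunk files on both sides are byte-for-byte the same, any file copier can move them without going through Duplicacy.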
In fact, rclone could be used to duplicate these chunks; since it doesn't need to download a config file, it will be able to upload data to the storage easily.
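A minimal sketch of that rclone step, assuming a local Duplicacy storage directory and an rclone remote (here the hypothetical `pca:duplicacy-archive`) configured beforehand as an OpenStack Swift remote with `rclone config`. Again, only the command line is built.

```python
import shlex
from typing import List


def rclone_copy_chunks(local_storage: str, remote: str) -> List[str]:
    """argv for pushing a local, bit-identical Duplicacy storage to PCA.

    `remote` is a hypothetical rclone remote path such as "pca:duplicacy-archive".
    """
    return [
        "rclone", "copy",   # copy, not sync: never delete anything on the archive side
        "--immutable",      # Duplicacy chunks never change; fail instead of overwriting
        local_storage, remote,
    ]


if __name__ == "__main__":
    print(shlex.join(rclone_copy_chunks("/backups/duplicacy", "pca:duplicacy-archive")))
```

Using `copy` rather than `sync` is deliberate: pruning on the primary storage then never propagates deletions to the archive, which matches the write-only, leave-it-alone nature of cold storage.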
In addition, since version 1.47.0 (released just 10 days ago), rclone is compatible with OVH PCA: it can now download from PCA by waiting for the appropriate unfreezing time (see issue #341 of rclone).
Both of these tools have obvious downsides compared to Duplicacy, but I thought I'd list the ones that can potentially deal with PCA anyway.
That's it!
I hope I didn't make any factual errors; please feel free to point them out if I did.
In short, I don't think cold cloud storage will ever be a sound choice as primary storage for Duplicacy. Hopefully, it will eventually be possible to use one as a secondary, write-only storage (and in that case, OVH Public Cloud Archive will most likely be the easiest one to use).
As for myself, I'm still considering whether to just use OVH Object Storage instead, or to incorporate rclone into my backup workflow to manually transfer chunks to PCA from a local storage.
Alternatively, and if it isn't conceptually unfeasible, I'd like to see if I could write such a metastorage.
@Rastagong thank you for the lengthy and factual comment (yes, I did read it all). One thing I cannot comment on is whether Duplicacy will be able to support OVH, and if so, when.
One thing I can ask is that you move this answer to the Duplicacy Forum as a feature request. I think people there may be interested in this topic.
I have created a special tag for this type of issues: https://forum.duplicacy.com/tags/cold-storage-archival
I would also like to reduce our use of GitHub for feature requests (and even for bug reports, if that were possible) and use the forum instead, since people are more likely to use a forum than the GitHub issue tracker.
@gilbertchen may I suggest that once @Rastagong moves his post to the forum, we close this issue and cross-link it with the forum post (just in case)?
No problem, I just registered on the forum and cross-posted my message there! I also took the opportunity to edit a few things (and to correct a mistake regarding pricing, whoops).
Sorry for cluttering the GitHub repository with this, it's definitely for the best to use the forum for discussions like these.
Thanks for moving!
@gilbertchen could you please close this issue? (1 at a time is still progress!)
I'm closing this issue as the discussion has moved over to https://forum.duplicacy.com/t/cold-storage-compatibility-with-ovh-cloud-public-archive/2001
So-called "cold storage" offerings are really cheap these days. For example:
Online C14: €0.002 / GB / month
OVH Cloud Archive (page in French, sorry): €0.002 / GB / month
These services imply that you have to deal with delayed reads and writes.
Would it be possible for Duplicacy to support these services?