dmdedup / dmdedup3.19

Device-mapper Deduplication Target

Source code about the DTD backend. #37

Closed Oliver-Luo closed 7 years ago

Oliver-Luo commented 7 years ago

Hi there,

There are three backends presented in the paper: inram, dtd and cowbtree, but only two of them are available here on GitHub. Would it be possible to also provide the source code of the dtd backend so that we can benchmark them together? Even a few pointers about the implementation would help. Thanks.

Oliver-Luo commented 7 years ago

I don't know whether it's proper to ask here, but I hope I can get some help:

Also, I've noticed there is a paper in FAST '16 called "Using Hints to Improve Inline Block-layer Deduplication" that is implemented on top of dmdedup and shares some authors. Is the source code of that project available, or is it not open source?

sectorsize512 commented 7 years ago

Hi,

I've e-mailed the author of the DTD backend and the Hints paper, asking her to reply to this thread. Yes, it is the same person! :) Let's wait for her detailed reply.

To limit our work, we picked two backends to maintain officially here: CBT because it provides an adequate level of consistency (in case of a sudden power loss) and INRAM because it is simple and helps in basic benchmarking.

Vasily

Oliver-Luo commented 7 years ago

Hi Vasily,

Thanks for your reply. Waiting for it~

sonamdp42 commented 7 years ago

Hi,

I'm attaching a set of patches that implement the disktable backend for Linux 3.17-rc2; you can extract them with unzip. Apply the patches in order, based on the number. I cannot guarantee that this is the cleanest or latest code, though, so if you run into problems, feel free to reach out to me.
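Applying the patches by number means numeric order, which differs from plain alphabetical order once the numbers reach two digits. A minimal Python sketch, assuming each patch file name starts with its sequence number and that the patches apply with `patch -p1` from the kernel tree root (the file names below are made up for illustration, not the actual contents of the archive):

```python
import re

def patch_order(names):
    """Return patch file names sorted by their leading integer."""
    def seq(name):
        m = re.match(r"(\d+)", name)
        if m is None:
            raise ValueError(f"no leading number in {name!r}")
        return int(m.group(1))
    return sorted(names, key=seq)

# Hypothetical file names, for illustration only.
example = ["10-hints.patch", "2-disktable.patch", "1-kvs.patch"]
for name in patch_order(example):
    # Only prints the commands; -p1 is an assumed strip level.
    print("patch -p1 <", name)
```

A plain `sorted(example)` would put `10-hints.patch` before `2-disktable.patch`, which is why the key extracts the number first.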

As for the implementation: we used a layout very similar to the INRAM backend, where we create tables to store the various kinds of metadata. A superblock exists, which records where each metadata table starts. Two types of hash tables exist, linear and sparse. The linear hash table stores the mapping lbn->pbn, whereas the sparse hash table stores information about the hashes (hash->pbn). We also maintain refcounts for the blocks in a separate table.
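To make the layout a bit more concrete, here is a rough Python sketch of a superblock followed by a linear table, a sparse table and a refcount table, all living in one flat "disk" buffer. Everything specific in it is an assumption for illustration and is not taken from the patches: the class and method names, the magic value, the field sizes, the 4096-byte block, the linear-probing scheme, and the choice to key the sparse table by the block hash with the PBN as its value.

```python
import hashlib
import struct

BLOCK = 4096            # assumed metadata block size
SB_FMT = "<4sQQQ"       # assumed superblock: magic + start block of each table

class DiskTableSketch:
    def __init__(self, lblocks, pblocks, hash_size=16):
        self.hash_size = hash_size
        self.slot = hash_size + 8        # sparse slot: hash key + 8-byte value
        self.nslots = 2 * pblocks        # keep the sparse table at most half full
        # Tables are laid out back to back after block 0 (the superblock).
        self.lbn_start = 1
        self.hash_start = self.lbn_start + self._blocks(lblocks * 8)
        self.ref_start = self.hash_start + self._blocks(self.nslots * self.slot)
        total = self.ref_start + self._blocks(pblocks * 4)
        self.disk = bytearray(total * BLOCK)
        struct.pack_into(SB_FMT, self.disk, 0, b"DTBL",
                         self.lbn_start, self.hash_start, self.ref_start)

    @staticmethod
    def _blocks(nbytes):
        return (nbytes + BLOCK - 1) // BLOCK

    # Linear table: entry i sits at a fixed offset; PBNs are stored as pbn + 1
    # so that an all-zero entry means "unmapped".
    def set_lbn(self, lbn, pbn):
        struct.pack_into("<Q", self.disk, self.lbn_start * BLOCK + lbn * 8, pbn + 1)

    def get_lbn(self, lbn):
        (v,) = struct.unpack_from("<Q", self.disk, self.lbn_start * BLOCK + lbn * 8)
        return v - 1 if v else None

    # Sparse table: open addressing with linear probing over fixed-size slots.
    def _probe(self, h):
        i = int.from_bytes(h[:8], "little") % self.nslots
        for _ in range(self.nslots):
            off = self.hash_start * BLOCK + i * self.slot
            key = bytes(self.disk[off:off + self.hash_size])
            (v,) = struct.unpack_from("<Q", self.disk, off + self.hash_size)
            if v == 0 or key == h:       # empty slot or matching key
                return off, v
            i = (i + 1) % self.nslots
        raise RuntimeError("sparse table full")

    def set_hash(self, h, pbn):
        off, _ = self._probe(h)
        self.disk[off:off + self.hash_size] = h
        struct.pack_into("<Q", self.disk, off + self.hash_size, pbn + 1)

    def get_hash(self, h):
        _, v = self._probe(h)
        return v - 1 if v else None

    # Refcount table: one 4-byte counter per physical block.
    def inc_ref(self, pbn):
        off = self.ref_start * BLOCK + pbn * 4
        (n,) = struct.unpack_from("<I", self.disk, off)
        struct.pack_into("<I", self.disk, off, n + 1)
        return n + 1

t = DiskTableSketch(lblocks=1024, pblocks=256)
h = hashlib.md5(b"some block contents").digest()
t.set_hash(h, 7)                 # first write: new hash lands in physical block 7
t.set_lbn(3, 7)
t.inc_ref(7)
t.set_lbn(9, t.get_hash(h))      # duplicate write: reuse the block found by hash
refs = t.inc_ref(7)
```

On reopening the device, only the superblock would need to be read to find the other tables, which is the role described for it above.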

I've also added patches for the hints code. Most of it is there to percolate the NODEDUP and PREFETCH flags from the upper layers down to the device mapper layer, where dmdedup then uses them internally as hints.
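As a loose illustration of what percolating the flags means, the Python sketch below tags a request at the top layer, forwards it unchanged through an intermediate layer, and lets the dedup target act on the flags. The bit values, function names, and the rules for which request gets which hint (NODEDUP on metadata writes, PREFETCH on reads of data about to be copied) are assumptions made for this example and are not lifted from the patches.

```python
from dataclasses import dataclass
from enum import IntFlag

class Hint(IntFlag):
    NONE = 0
    NODEDUP = 1      # hypothetical bit values
    PREFETCH = 2

@dataclass
class Request:
    op: str          # "read" or "write"
    lbn: int
    hints: Hint = Hint.NONE

def filesystem_layer(op, lbn, is_metadata=False, is_copy_source=False):
    """Upper layer: the only place that knows enough to set a hint."""
    hints = Hint.NONE
    if op == "write" and is_metadata:
        hints |= Hint.NODEDUP
    if op == "read" and is_copy_source:
        hints |= Hint.PREFETCH
    return block_layer(Request(op, lbn, hints))

def block_layer(req):
    """Intermediate layer: must pass the flags through untouched."""
    return dedup_target(req)

def dedup_target(req):
    """Bottom layer: picks a code path based on the hints it received."""
    if req.op == "write":
        return "write-through" if Hint.NODEDUP in req.hints else "dedup"
    return "read+prefetch-hash" if Hint.PREFETCH in req.hints else "read"

print(filesystem_layer("write", 10, is_metadata=True))
print(filesystem_layer("write", 11))
print(filesystem_layer("read", 12, is_copy_source=True))
```

The point of the middle function is that any layer that drops or rewrites the flags silently disables the hint, which is why most of the work is plumbing.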

Let me know if you have any questions.

hints_disktable_patches.zip

sectorsize512 commented 7 years ago

Thanks, Sonam!

Oliver-Luo commented 7 years ago

Thanks for your reply~