Keep track of the papers I have read and plan to read
Three-pass reading : How to read a paper
More detailed methods : skills for reading papers
PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
Stage : 1/3
Description : Introduce the guard structure (Fragmented LSM-trees) to reduce write amplification.
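A minimal sketch of the guard idea, assuming a simplified model in which one level is split by sorted guard keys and each guard holds a list of possibly overlapping sstables. The class and method names are hypothetical, not PebblesDB's code. The point it shows: a new sorted run is cut at guard boundaries and each piece is appended to its guard, so sstables already in the level are not rewritten.

```python
import bisect


class GuardedLevel:
    """Hypothetical model of one FLSM level: sorted guard keys split the key space."""

    def __init__(self, guard_keys):
        self.guards = sorted(guard_keys)
        # fragments[0] holds keys smaller than the first guard;
        # fragments[i] holds keys in [guards[i-1], guards[i]).
        self.fragments = [[] for _ in range(len(self.guards) + 1)]

    def guard_index(self, key):
        # Index of the fragment whose key range contains `key`.
        return bisect.bisect_right(self.guards, key)

    def append_sstable(self, sorted_keys):
        """Split an incoming sorted run at guard boundaries and append each piece
        to its guard as a new sstable, leaving existing sstables untouched."""
        pieces = {}
        for k in sorted_keys:
            pieces.setdefault(self.guard_index(k), []).append(k)
        for idx, piece in pieces.items():
            self.fragments[idx].append(piece)
```

For example, with guards `["g", "p"]`, appending `["a", "h", "q"]` and then `["b", "i"]` leaves two small overlapping sstables in the first two guards instead of merging them; merging inside a guard is deferred to a later compaction.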
SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores.
Stage : 1/3
Description : Introduce a new I/O scheduler for internal operations (flushes and compactions) to reduce write tail latency.
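A rough sketch of the scheduling idea, assuming a simplified model: internal work is served in priority order (flushes, then L0 compactions, then deeper compactions), and it gets the bandwidth left over by client traffic, never dropping below a small floor. The priority table, the floor parameter and all names are illustrative assumptions, not SILK's actual implementation.

```python
import heapq

# Lower number = higher priority. Assumed ordering: flushes and L0 compactions
# must not be starved by deeper compactions, otherwise writes stall.
PRIORITY = {"flush": 0, "l0_compaction": 1, "high_level_compaction": 2}


class InternalOpScheduler:
    """Hypothetical scheduler for internal LSM operations."""

    def __init__(self, total_bandwidth_mbps, min_internal_mbps):
        self.total = total_bandwidth_mbps
        self.min_internal = min_internal_mbps
        self.queue = []
        self.seq = 0  # tie-breaker keeps FIFO order within one priority

    def submit(self, kind, name):
        heapq.heappush(self.queue, (PRIORITY[kind], self.seq, name))
        self.seq += 1

    def internal_budget(self, client_mbps):
        # Internal work gets what client load leaves over, but at least a
        # small floor so flushes keep making progress under heavy load.
        return max(self.total - client_mbps, self.min_internal)

    def next_op(self):
        return heapq.heappop(self.queue)[2] if self.queue else None
```

Submitting a deep compaction, a flush and an L0 compaction in that order still runs the flush first, which is the property that keeps client writes from blocking on a full memtable.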
WiscKey: Separating Keys from Values in SSD-Conscious Storage
Stage : 1/3
Description : Separate keys from values in the LSM-tree (values are stored in a separate value log) to reduce I/O amplification.
else : An article from cxs introduces how this technique is used in real products
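A toy model of key-value separation, assuming a plain dict stands in for the LSM-tree and a bytearray for the append-only value log (vLog). The tree only stores small `(offset, length)` pointers, so compactions move pointers instead of whole values. The real vLog also stores the key next to each value so garbage collection can check liveness; that part is omitted here.

```python
class KVSeparatedStore:
    """Hypothetical sketch: dict as the LSM index, bytearray as the value log."""

    def __init__(self):
        self.lsm = {}            # key -> (offset, length) into the value log
        self.vlog = bytearray()  # append-only value log

    def put(self, key, value):
        offset = len(self.vlog)
        self.vlog += value       # values are only ever appended
        self.lsm[key] = (offset, len(value))

    def get(self, key):
        loc = self.lsm.get(key)
        if loc is None:
            return None
        offset, length = loc
        return bytes(self.vlog[offset:offset + length])
```

Overwriting a key appends the new value and repoints the index; the old bytes stay in the log as garbage until a GC pass reclaims them.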
Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines
Stage : 1/3
Description : Introduce an offline-trained model with online inference to predict hot/cold key ranges, then prefetch the blocks related to the hot ranges.
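A sketch of the "get the related blocks" step, assuming key ranges are `(low, high)` tuples and the learned model is replaced by a simple access-count threshold. That threshold rule is a hypothetical stand-in, not Leaper's model. After a compaction produces new blocks, only blocks whose key span overlaps a predicted-hot range are selected for loading into the block cache.

```python
def predict_hot_ranges(access_counts, threshold):
    """Stand-in for the learned model (hypothetical rule): a key range is hot
    if its recent access count reaches `threshold`."""
    return [rng for rng, count in access_counts.items() if count >= threshold]


def blocks_to_prefetch(new_blocks, hot_ranges):
    """Select blocks of the newly compacted file whose [first_key, last_key]
    span overlaps any hot range; these are the ones worth prefetching."""
    selected = []
    for block_id, (first, last) in new_blocks:
        if any(first <= hi and lo <= last for lo, hi in hot_ranges):
            selected.append(block_id)
    return selected
```

Selecting by overlap rather than prefetching the whole output file keeps cold data from evicting useful entries from the cache.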
Revisiting Data Prefetching for Database Systems with Machine Learning Techniques
Stage : 1/3
Description : Devise a multi-model framework based on neural networks to optimize random accesses issued by transactions.
MyRocks: LSM-Tree Database Storage Engine Serving Facebook's Social Graph
Stage : 0/3
Description : TODO.
else : A good article explains this
LSM-based Storage Techniques: A Survey
Stage : 1/3
Description : A survey of LSM-tree-based storage techniques.
HTAP Databases: What is New and What is Next
Stage : 1/3
Description : Introduce current HTAP databases and their techniques, present a general benchmark for measuring their performance, and finally discuss open problems and opportunities.
else : A good talk by the author. An article that summarizes this
CloudJump: Optimizing Cloud Databases for Cloud Storages
Stage : 0/3
Description : Introduce the challenges that arise when a traditional database moves from local storage to cloud storage. Then propose a framework named CloudJump and apply it to PolarDB (B+ tree) and RocksDB (LSM-tree) to show the performance gains.
else : A good article by the author. An article about cloud storage
Design and Implementation of the Sun Network Filesystem
Stage : 1/3
Description : v2/v3 adopt a stateless design, which means it is up to the client to save the state of each operation (the file handle). The drawback is that when one client removes a file, the server has no idea whether other clients are still using it, so a client that still holds the fh of the removed file gets an error when it operates on that file. The solution is to add a variable called generation that records the version of the inode: an operation on an out-of-date file handle fails with a stale-handle error instead of silently touching a newer file that reuses the inode, so it does not cause a security problem. v4 adopts a stateful design, which is not covered in this paper.
client : The client lives in the OS kernel and is responsible for issuing the RPCs; in v2/v3 it obtains file handles (fh) from the server and keeps the state itself.
server : adds the VFS and vnode interfaces, which make the file system layer more general.
else :
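A toy model of why the generation number helps a stateless server, assuming a file handle is just `(fsid, inode number, generation)`. The class and method names are hypothetical and are not the NFS protocol's procedures. Each request carries the handle, and the server only checks it against its own inode table; reusing an inode number bumps the generation, so old handles are rejected as stale.

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fsid: int
    ino: int
    generation: int


class StatelessServer:
    """Hypothetical server that keeps no per-client state."""

    def __init__(self, fsid=1):
        self.fsid = fsid
        self.generation = {}  # ino -> current generation, kept across deletes
        self.files = {}       # ino -> contents of live files

    def create(self, ino, data):
        # Reusing an inode number bumps its generation, invalidating old handles.
        gen = self.generation.get(ino, 0) + 1
        self.generation[ino] = gen
        self.files[ino] = data
        return FileHandle(self.fsid, ino, gen)

    def remove(self, fh):
        self._check(fh)
        del self.files[fh.ino]

    def read(self, fh):
        self._check(fh)
        return self.files[fh.ino]

    def _check(self, fh):
        if fh.ino not in self.files or self.generation[fh.ino] != fh.generation:
            raise OSError(errno.ESTALE, "stale file handle")
```

A client that kept the handle of a removed file gets `ESTALE` on its next request, even after another client has created a new file on the same inode number.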
The Google File System
Stage : 1/3
Description : Designed for a workload of large files that are read often, appended to often, and rarely modified in place. Generally, each chunk of a file has three replicas on different chunkservers (hotter files can be given more replicas).
Data flow : client ---> chunkserver1 ---> chunkserver2 ---> chunkserver3 ... (the data is pushed along a chain of chunkservers)
Control flow : client ---> primary ---> other (secondary) chunkservers ---> once all replicas finish, the primary reports completion to the client
else : HDFS is an open-source file system modeled on GFS.
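A small in-process simulation of the two flows above, assuming in-memory chunkservers and synchronous calls in place of RPCs. All names are hypothetical. The data is pushed along the chain and only buffered; the primary then picks a serial number and has every replica apply the buffered data in that order, replying only after all of them acknowledge. In GFS the chain order follows network distance, so the client does not necessarily push to the primary first; here the primary happens to be first for simplicity.

```python
class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data_id -> bytes pushed but not yet applied
        self.chunk = []   # (serial, data) mutations applied in serial order

    def push(self, data_id, data, chain):
        # Data flow: buffer locally, then forward to the next server in the chain.
        self.buffer[data_id] = data
        if chain:
            chain[0].push(data_id, data, chain[1:])

    def apply(self, serial, data_id):
        self.chunk.append((serial, self.buffer.pop(data_id)))
        return True


class Primary(ChunkServer):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # Control flow: pick the serial order, apply locally, then have every
        # secondary apply in the same order; report success only if all ack.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        return "ok" if all(acks) else "error"
```

Separating the two flows lets the bulky data use each link's full bandwidth in a pipeline, while the small control messages through the primary fix a single mutation order for all replicas.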
Cloud Data Warehousing: Snowflake and Beyond
Description : Discuss the details of Snowflake and introduce some exciting ideas for cloud database research.