CompSciCabal / SMRTYPRTY

We read computer science books for fun. This is where the secret notes live.

Semi-Indexing Semi-Structured Data in Tiny Space #62

Open pbevin opened 7 years ago

pbevin commented 7 years ago

http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf

Imagine you have a collection of large JSON or XML documents, and you want to run queries that each pull out only a small subset of the data. You don't want to fully parse every document for every query, so the documents should be indexed somehow. But they all have different tree structures, so it's not obvious what such an index would look like. This paper shows how to augment each document with a small amount of extra data (the "semi-index") that makes it very fast to find and extract values inside that document without parsing all of it.
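To make the idea concrete, here is a toy sketch in Python. It is not the paper's data structure, which is far more compact and handles nested queries. It only shows the general shape: one cheap scan records where each top-level value lives, and later queries parse just that slice. The names `build_semi_index` and `lookup` are made up for this example, and it assumes the document is a single valid JSON object.

```python
import json


def build_semi_index(doc: str) -> dict:
    """Scan a JSON object once; map each top-level key to the
    (start, end) character offsets of its value in `doc`."""
    index = {}
    depth = 0            # nesting depth of {} and []
    in_string = False
    escaped = False
    str_start = 0        # offset of the opening quote of the current string
    key = None           # most recently seen top-level key
    value_start = None   # offset where the current top-level value begins

    for i, ch in enumerate(doc):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                # A string at depth 1 with no pending value is a key.
                if depth == 1 and value_start is None:
                    key = json.loads(doc[str_start:i + 1])
            continue
        if ch == '"':
            in_string = True
            str_start = i
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            # The closing brace of the top-level object ends the last value.
            if depth == 1 and value_start is not None:
                index[key] = (value_start, i)
                value_start = None
            depth -= 1
        elif ch == ":" and depth == 1:
            value_start = i + 1
        elif ch == "," and depth == 1:
            index[key] = (value_start, i)
            value_start = None
    return index


def lookup(doc: str, index: dict, key: str):
    """Parse only the slice of `doc` that holds the value for `key`."""
    start, end = index[key]
    return json.loads(doc[start:end])


doc = '{"id": 7, "tags": ["a", "b"], "body": {"text": "x, y: {z}"}}'
index = build_semi_index(doc)
print(lookup(doc, index, "tags"))
```

The index here is a plain dict of offsets, so it is nowhere near "tiny". The point is only that the original document stays untouched and the side data tells you where to jump.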