aseldawy / spatialhadoop2

The second generation of SpatialHadoop that ships as an extension

Please analyse the function packInRectangles in Repartition.java #8

Open ChenZhongPu opened 9 years ago

ChenZhongPu commented 9 years ago

I am reading the code of src/edu/umn/cs/spatialHadoop/operations/Repartition.java, which builds the index.

Can anyone help analyse the following function?

public static CellInfo[] packInRectangles(Path[] files,
      Path outFile, OperationsParams params, Rectangle fileMBR)
      throws IOException 

I have no idea what it does.

aseldawy commented 9 years ago

This function computes a set of rectangles that can be used to partition the input files. These rectangles (or cells) are meant to balance the load so that each cell is assigned roughly the same number of records. It works by reading a random sample from the input file, bulk loading that sample into an in-memory R-tree with the sort-tile-recursive (STR) algorithm, and returning the boundaries of the R-tree's leaf nodes. You can find more details in the SpatialHadoop paper [http://spatialhadoop.cs.umn.edu/publications/ICDE15_industrial_522.pdf], Section V.B.
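To make the STR step more concrete, here is a minimal, hypothetical Java sketch of STR leaf packing over an in-memory sample of points. The class `StrPackingSketch`, the types `Point` and `Cell`, and the `numCells` parameter are illustrative assumptions, not SpatialHadoop's API. The sketch skips file reading and sampling, and it returns only the tight bounding box of each leaf. It does not try to reproduce how Repartition.java deals with records that fall outside the sample's extent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of STR leaf packing; not the code in Repartition.java. */
public class StrPackingSketch {

    /** A sampled point (hypothetical type). */
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    /** Axis-aligned rectangle: the boundary of one leaf, i.e. one cell (hypothetical type). */
    static final class Cell {
        final double x1, y1, x2, y2;
        Cell(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        @Override public String toString() {
            return "[" + x1 + "," + y1 + " - " + x2 + "," + y2 + "]";
        }
    }

    /**
     * STR packing: sort the sample by x, cut it into vertical slices, sort each
     * slice by y, and cut it into runs of leaf capacity. Each run becomes a leaf,
     * and its bounding box is returned as a cell.
     */
    static List<Cell> packInRectangles(List<Point> sample, int numCells) {
        List<Cell> cells = new ArrayList<>();
        int n = sample.size();
        if (n == 0 || numCells <= 0) return cells;
        List<Point> pts = new ArrayList<>(sample);
        // Leaf capacity chosen so that about numCells leaves are produced.
        int capacity = (int) Math.ceil((double) n / numCells);
        int leaves = (int) Math.ceil((double) n / capacity);
        // STR uses ceil(sqrt(leaves)) vertical slices, each holding slices * capacity points.
        int slices = (int) Math.ceil(Math.sqrt(leaves));
        int sliceSize = slices * capacity;
        pts.sort(Comparator.comparingDouble((Point p) -> p.x));
        for (int s = 0; s < n; s += sliceSize) {
            List<Point> slice = new ArrayList<>(pts.subList(s, Math.min(s + sliceSize, n)));
            slice.sort(Comparator.comparingDouble((Point p) -> p.y));
            for (int r = 0; r < slice.size(); r += capacity) {
                cells.add(mbr(slice.subList(r, Math.min(r + capacity, slice.size()))));
            }
        }
        return cells;
    }

    /** Minimum bounding rectangle of a non-empty run of points. */
    static Cell mbr(List<Point> run) {
        double x1 = Double.POSITIVE_INFINITY, y1 = Double.POSITIVE_INFINITY;
        double x2 = Double.NEGATIVE_INFINITY, y2 = Double.NEGATIVE_INFINITY;
        for (Point p : run) {
            x1 = Math.min(x1, p.x); y1 = Math.min(y1, p.y);
            x2 = Math.max(x2, p.x); y2 = Math.max(y2, p.y);
        }
        return new Cell(x1, y1, x2, y2);
    }

    public static void main(String[] args) {
        // A 4x4 grid of sample points packed into 4 cells.
        List<Point> sample = new ArrayList<>();
        for (int x = 0; x < 4; x++)
            for (int y = 0; y < 4; y++)
                sample.add(new Point(x, y));
        for (Cell c : packInRectangles(sample, 4)) System.out.println(c);
    }
}
```

With 16 grid points and `numCells = 4`, STR builds two vertical slices of 8 points and cuts each slice into two runs of 4. Every cell therefore covers the same share of the sample, which is the load-balancing property described above.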