Intel-HLS / GenomicsDB


changes for spark locality #145

Closed: mlathara closed this pull request 6 years ago

mlathara commented 6 years ago

This change associates each InputSplit with a node name from the hostfile. Combined with a very high value for spark.locality.wait, this ensures that Spark respects data locality when scheduling tasks, so that every node is queried in a setup where the GenomicsDB array has been partitioned across multiple nodes.
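The locality mechanism described above can be illustrated with a minimal, self-contained Java sketch. It does not use the actual GenomicsDB or Hadoop classes. PartitionSplit and HostfileSplits are hypothetical names, and the sketch assumes one array partition per hostfile entry, in file order, with the host name as the first token on each line. In a real Hadoop-based integration, the corresponding hook would be the getLocations() method of Hadoop's InputSplit, which Spark consults to pick the preferred locations for the task that reads that split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hadoop InputSplit: each split records the
// host that holds its partition of the GenomicsDB array.
final class PartitionSplit {
    private final int partitionIndex;
    private final String host;

    PartitionSplit(int partitionIndex, String host) {
        this.partitionIndex = partitionIndex;
        this.host = host;
    }

    int partitionIndex() {
        return partitionIndex;
    }

    // Plays the role of InputSplit.getLocations(): the hosts Spark should
    // prefer when scheduling the task that reads this split.
    String[] getLocations() {
        return new String[] { host };
    }
}

final class HostfileSplits {
    // Assumption: partition i lives on the i-th non-blank, non-comment
    // hostfile entry, and the host name is the first whitespace-separated
    // token (so MPI-style lines like "node1 slots=4" also work).
    static List<PartitionSplit> fromHostfile(List<String> hostfileLines) {
        List<PartitionSplit> splits = new ArrayList<>();
        for (String line : hostfileLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String host = trimmed.split("\\s+")[0];
            splits.add(new PartitionSplit(splits.size(), host));
        }
        return splits;
    }
}
```

Preferred locations are only a hint to the scheduler: if no executor on the preferred host becomes free within spark.locality.wait, Spark falls back to a less local level and may run the task on another node. That is why the split locations are paired with a very high wait value. The PR does not state a specific value; one could be supplied, for example, with spark-submit --conf spark.locality.wait=<value>.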