TIBCOSoftware / snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
http://www.snappydata.io

snappydata on mesos #339

Open thbeh opened 8 years ago

thbeh commented 8 years ago

Hi, I have managed to create a SnappyData cluster with Docker, but I really have no idea how it would run on Mesos (I am still new to Mesos). Any thoughts?

Is it possible to have SnappyData work with an existing Spark deployment on Mesos (specifically DC/OS)?

sumwale commented 8 years ago

@thbeh SnappyData will work with an existing Spark on Mesos (or any other cluster manager) in split cluster mode. See the docs here: http://snappydatainc.github.io/snappydata/deployment/

In this mode, the Mesos cluster talks to SnappyData like an external datastore. You can access the SnappyData APIs via the Maven dependency (snappy-core) or as a Spark package (https://spark-packages.org/package/SnappyDataInc/snappydata). The only additional property you need is "snappydata.store.locators" (or "spark.snappydata.store.locators" for spark-submit/spark-shell, which filter out all but "spark." properties), as noted in the docs. With this, you can create column/row tables and other entities using SnappyContext as described in the docs, and they will be stored in the SnappyData cluster (along with the metadata).
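As a rough sketch of such a connection (hostnames, ports, and the package version here are illustrative placeholders, not tested values):

```shell
# Sketch: launching spark-shell from an existing Spark-on-Mesos deployment
# against an external SnappyData cluster. The "spark." prefix on the locators
# property is needed because spark-shell/spark-submit drop properties that do
# not start with "spark.". All hostnames, ports, and the package version below
# are placeholders.
./bin/spark-shell \
  --master mesos://zk://zk-host:2181/mesos \
  --packages SnappyDataInc/snappydata:0.5-s_2.11 \
  --conf spark.snappydata.store.locators=locator-host:10334
```

Inside the shell you would then create a SnappyContext from the existing SparkContext and use it to define column/row tables, as described in the docs.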

For best performance, you should consider starting SnappyData on exactly the same nodes as the Mesos workers; SnappyData will then try hard to ensure that data is stored and retrieved on the local node, avoiding network transfers as much as possible.

thbeh commented 8 years ago

So what @sumwale is saying is that:

  1. I can have a SnappyData cluster (in Docker, VMs, etc.) not managed by Mesos (but within the Mesos cluster) and use Spark/Zeppelin to interact with the SnappyData store like I would with Hive or Cassandra?
  2. If I have multiple locators, would I need to set them as set("snappydata.store.locators", "localhost1:10334", "localhost2:10334", etc.)?
  3. Can I define 'snappydata.store.locators' in the Spark interpreter configuration in Zeppelin?
  4. Using split cluster mode, will I pay a performance penalty since the store is not running collocated in the same JVM as Spark, as in SnappyData's default unified cluster mode?
  5. In your 'For best performance...' statement, which SnappyData components (locator, lead, or server) should I be starting with Mesos? And by doing so, I would no longer need to define multiple snappydata.store.locators as in point 2, am I correct?
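For point 2, I am imagining something like the following in spark-defaults.conf or in Zeppelin's Spark interpreter settings (hostnames are placeholders, and I am assuming the property takes a single comma-separated value):

```shell
# Hypothetical spark-defaults.conf (or Zeppelin Spark interpreter) entry,
# assuming the locators property takes one comma-separated host:port list
# rather than multiple set(...) arguments. Hostnames are placeholders.
spark.snappydata.store.locators  locator1:10334,locator2:10334
```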

Sorry for the long list of questions. Thanks in advance.

Regards, Beh
