TIBCOSoftware / snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
http://www.snappydata.io

spark api #296

Closed thbeh closed 7 years ago

thbeh commented 8 years ago

Correct me if I am wrong: from the SnappyData docs, it seems I should be able to access SnappyData from an external Spark deployment by using SnappyContext. Am I correct? If so, can I do the following in a spark-shell?

```scala
val conf = new org.apache.spark.SparkConf()
  .setAppName("mySnappyApp")
  .setMaster("local[*]")
  .set("jobserver.enabled", "true")
  .set("snappydata.store.locators", "localhost:10334")
  .set("spark.ui.port", "4042")
  .set("spark.driver.extraLibraryPath", "/home/thbeh/snappydata-0.4.0-PREVIEW-bin/lib")
  .set("spark.driver.allowMultipleContexts", "true")
```

```scala
val sc = new org.apache.spark.SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SnappyContext(sc)
val airline = sqlContext.table("airline").show
```

...but I got this error:

```
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@59b778dc
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6f25a644
:31: error: type SnappyContext is not a member of package org.apache.spark.sql
       val sqlContext = new org.apache.spark.sql.SnappyContext(sc)
```

I am trying to use the Spark interpreter in Zeppelin. Please advise.
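That error usually means the SnappyData assembly jar is not on the spark-shell classpath. A minimal sketch of attaching it at launch; the paths and version match the ones mentioned in this thread, but SNAPPY_HOME is a placeholder and the command is echoed rather than executed so it can be inspected first:

```shell
# Placeholder install location; adjust to wherever SnappyData is unpacked.
SNAPPY_HOME="${SNAPPY_HOME:-$HOME/snappydata-0.4.0-PREVIEW-bin}"
ASSEMBLY_JAR="$SNAPPY_HOME/lib/snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar"

# Echo the spark-shell invocation instead of executing it.
echo spark-shell \
  --jars "$ASSEMBLY_JAR" \
  --conf "snappydata.store.locators=localhost:10334"
```

With the assembly on the classpath, `org.apache.spark.sql.SnappyContext` resolves in the shell.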
rishitesh commented 8 years ago

As far as I know, Zeppelin creates a SparkContext implicitly. You can put the Snappy libraries in the Spark dependency folder in Zeppelin.

SachinJanani commented 8 years ago

@thbeh If you are using Zeppelin 0.5.6, replace zeppelin-0.5.6-incubating-bin-all/interpreter/spark/dep/zeppelin-spark-dependencies-0.5.6-incubating.jar with snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar. This is because Zeppelin is embedded with Spark 1.5, while snappydata-0.4 supports Spark 1.6. Please let me know if you face any issues.

thbeh commented 8 years ago

Hi, I am using v0.6.0 of Zeppelin; do I need to do the same?


SachinJanani commented 8 years ago

For Zeppelin 0.6 you only have to copy snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar into <ZEPPELIN_HOME>/interpreter/spark/dep/.
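As a shell sketch (ZEPPELIN_HOME and the jar location are placeholders; the dep-directory layout follows the instruction above):

```shell
# Placeholder locations; adjust ZEPPELIN_HOME and ASSEMBLY_JAR to your environment.
ZEPPELIN_HOME="${ZEPPELIN_HOME:-$HOME/zeppelin-0.6.0-bin-all}"
ASSEMBLY_JAR="${ASSEMBLY_JAR:-$HOME/snappydata-0.4.0-PREVIEW-bin/lib/snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar}"

# The Spark interpreter picks up every jar in its dep directory.
DEP_DIR="$ZEPPELIN_HOME/interpreter/spark/dep"
mkdir -p "$DEP_DIR"

if [ -f "$ASSEMBLY_JAR" ]; then
  # Put the SnappyData assembly on the Spark interpreter's classpath.
  cp "$ASSEMBLY_JAR" "$DEP_DIR/"
else
  echo "assembly jar not found at $ASSEMBLY_JAR; build or download it first"
fi
```

Restart the Zeppelin daemon afterwards so the interpreter reloads its classpath.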

thbeh commented 8 years ago

So I copied the jar as Sachin instructed, but it still doesn't work. What did I miss?

[image: Inline image 2]

hbhanawat commented 8 years ago

@thbeh Can't see the attachment.

thbeh commented 8 years ago

Re-sent the image as an attachment.

[image: Inline image 1]


hbhanawat commented 8 years ago

@thbeh Still can't see it. I looked at the GitHub page as well, but it is not visible there either. https://github.com/SnappyDataInc/snappydata/issues/296

thbeh commented 8 years ago

98b0399d-9e72-49e6-a5b3-df3531c64548

thbeh commented 8 years ago

You should be able to see it now.

One very funny thing is that I had to set zeppelin.spark.useHiveContext to false before I could see the Snappy store. But %sql still complains that the table does not exist.


hbhanawat commented 8 years ago

What is happening here is that you have registered the airline table as a temp table in SnappyContext, which is a specialized SQLContext, but you are querying it using %sql, which internally uses a plain SQLContext. Temp tables are not visible across contexts.

You can do two things here.

  1. You can make your query run by firing sqlContext.sql("your query here"), where sqlContext is the SnappyContext that you have created.
  2. You can use the Zeppelin server from our forked Zeppelin repo so that you don't have to run the query using sqlContext.sql. You can then use %snappy-sql to fire the queries directly.

With the 0.6 release of Zeppelin they are allowing better integration for interpreters. We are working on adding an interpreter for Snappy.
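For option 1, a Zeppelin paragraph would look roughly like this (the airline table and count query are only illustrations; sqlContext must be the SnappyContext created in the note, not Zeppelin's implicit SQLContext):

```
%spark
// runs through the SnappyContext, so temp tables registered on it are visible
sqlContext.sql("select count(*) from airline").show()
```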

thbeh commented 8 years ago

I reverted to 0.5.6 but it looks the same. I have attached a snapshot: in the same notebook, one data source is from SnappyData and another from a CSV. The CSV converted to a DataFrame and then a temp table works fine, but not the DataFrame from Snappy.

[image: zeppelin_0_5_6]

thbeh commented 8 years ago

I don't see a Zeppelin repo in SnappyDataInc's GitHub, though?


jramnara commented 8 years ago

Same problem as described by Hemant: 'temp' is scoped by the SnappyContext and 'Auction' is scoped by the Spark context you created.

We haven't made the SnappyData Zeppelin interpreter public yet. @rishitesh, @SachinJanani, can you guys make the latest branch with the Zeppelin interpreter and support for Zeppelin 0.6 accessible? In fact, make the binary distribution for this branch available. I think that will resolve all the issues reported here.

@thbeh can simply use '%snappy-sql' for his second paragraph.

thbeh commented 8 years ago

Great, hopefully I can get my hands on it soon. Maybe someone can point me to the repo so I can compile from source, as I did for Zeppelin 0.6.0. Thanks.


SachinJanani commented 8 years ago

@thbeh We are about to create the binaries for Zeppelin with the SnappyData interpreter, but for now you can use the attached SnappyData interpreter. Following are the steps to install it in Zeppelin 0.6:

  1. Download the attached snappydata-interpreter.tar.gz.
  2. Extract snappydata-interpreter.tar.gz and copy the snappydatasql directory from snappydata-interpreter into <ZEPPELIN_HOME>/interpreter/.
  3. Copy zeppelin-site.xml from the extracted snappydata-interpreter to <ZEPPELIN_HOME>/conf/.
  4. Copy snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar to the <ZEPPELIN_HOME>/interpreter/snappydatasql directory.
  5. Restart the Zeppelin daemon.
  6. Verify that the snappydatasql interpreter appears in the interpreter list.
  7. Now you can use the snappydatasql interpreter via %snappydatasql.

Please let me know if you face any issues. snappydata-interpreter.tar.gz
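The steps above can be sketched as a script; all paths here are placeholders, and the layout inside the tarball (a top-level snappydata-interpreter directory) is an assumption, not something stated in the thread:

```shell
# Placeholder locations; adjust to your environment.
ZEPPELIN_HOME="${ZEPPELIN_HOME:-$HOME/zeppelin-0.6.0-bin-all}"
TARBALL="${TARBALL:-$HOME/Downloads/snappydata-interpreter.tar.gz}"
ASSEMBLY_JAR="${ASSEMBLY_JAR:-$HOME/snappydata-0.4.0-PREVIEW-bin/lib/snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar}"

work=$(mktemp -d)                                  # scratch dir for extraction
if [ -f "$TARBALL" ]; then
  tar -xzf "$TARBALL" -C "$work"                   # step 2: extract the archive
  mkdir -p "$ZEPPELIN_HOME/interpreter" "$ZEPPELIN_HOME/conf"
  cp -r "$work/snappydata-interpreter/snappydatasql" "$ZEPPELIN_HOME/interpreter/"  # step 2: interpreter dir
  cp "$work/snappydata-interpreter/zeppelin-site.xml" "$ZEPPELIN_HOME/conf/"        # step 3: config
  cp "$ASSEMBLY_JAR" "$ZEPPELIN_HOME/interpreter/snappydatasql/"                    # step 4: assembly jar
  "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" restart                                   # step 5: restart daemon
else
  echo "snappydata-interpreter.tar.gz not found at $TARBALL"
fi
```

Steps 6 and 7 are then manual: check the interpreter list in the Zeppelin UI and use %snappydatasql in a note.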

thbeh commented 8 years ago

The interpreter does not seem to work with my source compile of 0.6.0; coincidentally, Zeppelin 0.6.0 was released today. I downloaded that, re-copied the SnappyData interpreter into Zeppelin, and it works like a charm.

Another thing I noticed: when a query is running and then cancelled, the SnappyData server kills itself.

SachinJanani commented 8 years ago

@thbeh Good to hear that you are able to use the interpreter. Cancelling the query should not affect the SnappyData server, as this interpreter is simply a client to the server. I tried to reproduce the issue you mentioned by cancelling a running query, but was not able to. Can you please let us know the detailed steps to reproduce it?

thbeh commented 8 years ago

I am trying to reproduce it as well. Will update when I hit the issue again.


thbeh commented 8 years ago

So now that the interpreter looks good, how does SnappyData look on a Mesos architecture?


jramnara commented 8 years ago

SnappyData supports Mesos when used in the split cluster mode, i.e. the Spark compute nodes are isolated from the data server nodes. The data server nodes themselves are long-running (they host data in memory) and do not yet support Mesos.

Dynamic resource management through Mesos/YARN makes more sense when running compute-heavy tasks like map-reduce, anyway. What are you trying to solve?


Jags SnappyData blog http://www.snappydata.io/blog Download binary, source https://github.com/SnappyDataInc/snappydata


thbeh commented 8 years ago

I am trying to build a lambda architecture on top of Mesos, with SnappyData as the speed layer (I think that is what SnappyData is built for), but I am not sure whether it makes sense to do that or to have a separate cluster (maybe 3 nodes) for SnappyData that allows users to connect from Zeppelin.

Any thoughts?


jramnara commented 8 years ago

I am no Mesos expert, but delegating to a resource manager makes a lot of sense when the workload determines the resources required and the cluster/resource manager allocates them on demand. But if you think about SnappyData (or any other in-memory database, for that matter), you cannot really do this. We provision and potentially manage large quantities of data in memory for a long period. You would lose that value if you kept provisioning and releasing the memory (which a dynamic resource manager is built to do).

That being said, Mesos would make sense for you to manage your entire data center (or a subset) as a farm of resources while being fully abstracted away from knowing the nodes used for SnappyData. Is this your motivation for Mesos? I suppose there is nothing preventing you from launching Snappy using Mesos with some static provisioning policy?

When you say "speed layer", would you use Snappy for both stream processing and as an operational in-memory DB?


Jags SnappyData blog http://www.snappydata.io/blog Download binary, source https://github.com/SnappyDataInc/snappydata

On Thu, Jul 7, 2016 at 5:04 PM, T H Beh notifications@github.com wrote:

I am trying to build a lambda architecture on top of mesos, having snappydata as the speed layer (i think that is what snappydata is built for) but not sure whether it make sense to do that or have a separate cluster (maybe 3 nodes) for snappydata cluster that allows user to connect from zeppelin.

Any thoughts?

On Fri, Jul 8, 2016 at 11:52 AM, Jags Ramnarayan <notifications@github.com

wrote:

Snappydata supports mesos when used in the split cluster mode. i.e. the spark compute nodes are isolated from the data server nodes. The data server nodes themselves are long running (they host data in memory) and do not yet support mesos.

Dynamic resource management through mesos/yarn would make more sense when running compute heavy tasks like map-reduce, anyway. What are you trying to solve?


Jags SnappyData blog http://www.snappydata.io/blog Download binary, source https://github.com/SnappyDataInc/snappydata

On Thu, Jul 7, 2016 at 2:10 PM, T H Beh notifications@github.com wrote:

So now that the interpreter looks good, how does snappydata looks like on a mesos architecture?

On Thu, Jul 7, 2016 at 7:13 PM, Teik Hooi Beh thbeh@thbeh.com wrote:

I am trying to produce it as well. Will update when I get the issue again.

On Thu, Jul 7, 2016 at 3:36 PM, SachinJanani < notifications@github.com

wrote:

@thbeh https://github.com/thbeh Good to hear that you are able to use interpreter.Cancelling the query should not affect the Snappydata server as this interpreter is simply a client to snappydata server. I tried to reproduce the issue that you mentioned by canceling the running query but was not able to reproduce it. Can you please let us know the detailed steps to reproduce it

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <

https://github.com/SnappyDataInc/snappydata/issues/296#issuecomment-230970662

,

or mute the thread <

https://github.com/notifications/unsubscribe/AHBFhjN7F0l-W2XCglPjewt-PiBG43NXks5qTHRBgaJpZM4JEt7K

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <

https://github.com/SnappyDataInc/snappydata/issues/296#issuecomment-231208560

, or mute the thread <

https://github.com/notifications/unsubscribe/AB2KBnkjzLlMv3NVthFOp4L6crqpHolrks5qTWthgaJpZM4JEt7K

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/SnappyDataInc/snappydata/issues/296#issuecomment-231240824 , or mute the thread < https://github.com/notifications/unsubscribe/AHBFhmLSlbAadj_yJ-FsSrdvM5-OCjUvks5qTZEkgaJpZM4JEt7K

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/SnappyDataInc/snappydata/issues/296#issuecomment-231242540, or mute the thread https://github.com/notifications/unsubscribe/AB2KBvwwvSDUsJT7w3ZFNfTzYVnGA0Oxks5qTZP0gaJpZM4JEt7K .

thbeh commented 8 years ago

Jags,

I think I would look at 2 scenarios:

  1. Long-serving: conventional OLAP analytics
  2. Ad hoc, short-lived requirements for ML tasks

Please see my comments inline below.

On Jags' point about provisioning and releasing memory: I would expect to use Mesos to manage the in-memory DB for ad hoc processes such as machine-learning tasks, where once results are generated the nodes are no longer required and can be torn down. Here, can I say SnappyData would be able to provide fast processing for such ML tasks, e.g. data forensics or fraud analysis?

On managing the data center as a farm of resources abstracted away from the SnappyData nodes: this is what I intend to use for conventional OLAP queries, so having a separate cluster of nodes for SnappyData makes sense here, e.g. top sales by region, product sales analysis, etc.

On launching Snappy via Mesos with a static provisioning policy: this is probably my challenge, spinning up SnappyData nodes in a Mesos cluster.

On using Snappy for both stream processing and as an operational in-memory DB: yes, for both.



jramnara commented 8 years ago

Thanks, thoughts below ...

On using Mesos to manage the in-memory DB for ad hoc ML tasks (data forensics, fraud analysis) and tearing the nodes down once results are generated: OK. Yes, assuming your ML tasks are iterative and need repeated access, it makes sense to consider Snappy.

On a separate cluster of SnappyData nodes for conventional OLAP queries (top sales by region, product sales analysis, etc.): OK.

On spinning up SnappyData nodes in a Mesos cluster being the challenge: we will look at supporting Mesos natively at some point. Maybe you have a few cycles to help us get there sooner :-)

thbeh commented 8 years ago

Quick question: I don't see any guide to starting Pulse on 0.5.0?

SachinJanani commented 8 years ago

@thbeh To start Pulse with a snappydata-0.5 cluster, please follow these steps:

  1. Change the locator configuration to include these parameters: -jmx-manager-start=true -jmx-manager-http-port=7075
  2. Restart the SnappyData cluster and open the URL http://localhost:7075/pulse
  3. Provide the default username and password: admin

Note: We are continuously improving Pulse, so please let me know if you see any issues.
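Concretely, the locator line in the conf/locators file might become something like this (hostname and discovery port are placeholders; only the two -jmx-manager options come from the steps above):

```
localhost -peer-discovery-port=10334 -jmx-manager-start=true -jmx-manager-http-port=7075
```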

thbeh commented 8 years ago

Question: I managed to create 2 Docker containers, one locator (locator1) and one server (server1).

When I log in to server1's snappy-shell, I have to connect with client 'localhost:1527'. Is that correct? And what is the column showing NETSERVERS?

[image: snappy-shell]
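For reference, such a session might look like this (assuming the default client port 1527; the SYS.MEMBERS query is an assumption about where the NETSERVERS column in the screenshot comes from):

```sql
snappy> connect client 'localhost:1527';
snappy> select ID, NETSERVERS from SYS.MEMBERS;
```

NETSERVERS would then list the host:port endpoints on each member that accept client (JDBC/ODBC) connections.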

sumwale commented 7 years ago

Zeppelin usage has been documented and is tested before releases, so closing. @thbeh