As far as I know, Zeppelin creates a SparkContext implicitly. You can put the Snappy libraries in the Spark dependency folder in Zeppelin.
@thbeh If you are using Zeppelin 0.5.6, then replace zeppelin-0.5.6-incubating-bin-all/interpreter/spark/dep/zeppelin-spark-dependencies-0.5.6-incubating.jar with snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar. This is because Zeppelin 0.5.6 is embedded with Spark 1.5, while snappydata-0.4 supports Spark 1.6. Please let me know if you face any issues.
Hi, I am using v0.6.0 of Zeppelin; do I need to do the same?
For Zeppelin 0.6 you only have to copy snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar into <ZEPPELIN_HOME>/interpreter/spark/dep/.
So I copied the jar as Sachin instructed, but the result doesn't make sense. What did I miss?
[image: Inline image 2]
@thbeh Can't see the attachment.
Re-sent the image as an attachment.
[image: Inline image 1]
@thbeh Still can't see it. Looked at the GitHub page as well, but it is not visible there either. https://github.com/SnappyDataInc/snappydata/issues/296
You should be able to see it now.
One odd thing is that I had to set 'zeppelin.spark.useHiveContext' to false before I could see the Snappy store. But %sql is still complaining that the table does not exist.
What is happening here is that you have registered the airlines table as a temp table in SnappyContext, which is a specialized SQLContext, and you are querying it using %sql, which internally uses a plain SQLContext. Temp tables are not visible across contexts.
You can do two things here:
- You can make your query run by firing sqlContext.sql("your query here"). sqlContext here is the SnappyContext that you have created.
- You can use the Zeppelin server from our forked Zeppelin repo so that you don't have to run the query through sqlContext.sql; you can then use %snappy-sql to fire the queries directly.
With the 0.6 release of Zeppelin they allow better integration for interpreters. We are working on adding an interpreter for Snappy.
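To make the scoping concrete, here is a minimal Scala sketch (the data path is hypothetical; the table name mirrors this thread) of why a temp table registered on a SnappyContext is invisible to a separate SQLContext such as the one %sql uses:

```scala
import org.apache.spark.sql.{SQLContext, SnappyContext}

val snc = SnappyContext(sc)                    // specialized SQLContext
val df  = snc.read.json("/tmp/airlines.json")  // hypothetical sample data
df.registerTempTable("airlines")               // temp table lives only in snc

snc.sql("SELECT count(*) FROM airlines").show() // works: same context

val plain = new SQLContext(sc)                  // what %sql uses internally
// plain.sql("SELECT count(*) FROM airlines")   // fails: table not found
```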
I reverted back to 0.5.6 but it looks the same. I have attached a snapshot: in the same notebook, one data source is from SnappyData and another is from a CSV. The CSV converted to DataFrame -> temp table works fine, but not the DataFrame from Snappy.
Also, I don't see a Zeppelin repo in SnappyDataInc's GitHub?
Same problem as reported by Hemant: 'temp' is scoped by the SnappyContext and 'Auction' is scoped by the Spark context you created.
We haven't made the SnappyData Zeppelin interpreter public yet. @rishitesh, @SachinJanani, can you make the latest branch with the Zeppelin interpreter and support for Zeppelin 0.6 accessible? In fact, make the binary distribution for this branch available; I think that will resolve all the issues reported here.
@thbeh can simply use '%snappy-sql' for his second paragraph.
Great, hopefully I can get my hands on it soon. Maybe someone can point me to the repo so I can compile from source, as I did for Zeppelin 0.6.0. Thanks.
@thbeh We are about to create the binaries for Zeppelin with the SnappyData interpreter, but for now you can use the attached SnappyData interpreter. Following are the steps you will need in order to install the SnappyData interpreter in Zeppelin 0.6:
1) Download the attached snappydata-interpreter.tar.gz
2) Extract snappydata-interpreter.tar.gz and copy the snappydatasql directory from snappydata-interpreter into <ZEPPELIN_HOME>/interpreter/
3) Copy zeppelin-site.xml from the extracted snappydata-interpreter to <ZEPPELIN_HOME>/conf/
4) Copy snappydata-assembly_2.10-0.4.0-PREVIEW-hadoop2.4.1.jar to the <ZEPPELIN_HOME>/interpreter/snappydatasql directory
5) Restart the Zeppelin daemon
6) Verify that the snappydatasql interpreter appears in the interpreter list
7) Now you can use the snappydatasql interpreter via %snappydatasql
Please let me know if you face any issues. snappydata-interpreter.tar.gz
The interpreter does not seem to work with my source-compiled 0.6.0, but coincidentally Zeppelin 0.6.0 was released today. I downloaded that, re-copied the SnappyData interpreter into Zeppelin, and it works like a charm.
Another thing I noticed: when a query is running and then cancelled, the SnappyData server kills itself.
@thbeh Good to hear that you are able to use the interpreter. Cancelling a query should not affect the SnappyData server, as this interpreter is simply a client to the SnappyData server. I tried to reproduce the issue you mentioned by cancelling a running query, but was not able to reproduce it. Can you please let us know the detailed steps to reproduce it?
I am trying to reproduce it as well. Will update when I hit the issue again.
So now that the interpreter looks good, how does SnappyData look on a Mesos architecture?
SnappyData supports Mesos when used in split cluster mode, i.e. the Spark compute nodes are isolated from the data server nodes. The data server nodes themselves are long-running (they host data in memory) and do not yet support Mesos.
Dynamic resource management through Mesos/YARN makes more sense when running compute-heavy tasks like map-reduce, anyway. What are you trying to solve?
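For reference, a rough Scala sketch of split cluster mode from the compute side (the master URL and locator endpoint are assumed; the locator property matches the one used later in this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SnappyContext

// Compute side: an ordinary Spark cluster, which could itself run on Mesos.
val conf = new SparkConf()
  .setAppName("splitModeSketch")
  .setMaster("spark://spark-master:7077")              // assumed master URL
  .set("snappydata.store.locators", "localhost:10334") // assumed locator

val sc  = new SparkContext(conf)
val snc = SnappyContext(sc) // reaches the long-running data servers via the locator
snc.sql("SELECT count(*) FROM airline").show()
```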
I am trying to build a lambda architecture on top of Mesos, with SnappyData as the speed layer (I think that is what SnappyData is built for), but I am not sure whether it makes sense to do that or to have a separate cluster (maybe 3 nodes) for SnappyData that allows users to connect from Zeppelin.
Any thoughts?
I am no Mesos expert, but delegating to a resource manager makes a lot of sense when the workload determines the resources required and the cluster/resource manager allocates these on demand. But if you think about SnappyData (or any other in-memory database, for that matter), you cannot really do this. We provision and potentially manage large quantities of data in memory for a long period. You would lose the value if you kept provisioning and releasing the memory (which a dynamic resource manager is built to do).
That being said, Mesos would make sense for you to manage your entire data center (or a subset) as a farm of resources while being fully abstracted away from knowing the nodes used for SnappyData. Is this your motivation for Mesos? I suppose there is nothing preventing you from launching Snappy using Mesos with some static provisioning policy?
When you say "speed layer", would you use Snappy for both stream processing as well as an operational in-memory DB?
Jags, I would look at two scenarios; please see my comments inline below.
Regards, Beh
On provisioning and releasing memory: I would expect to use Mesos to manage an in-memory DB for ad-hoc processes such as machine learning tasks, where once results are generated the nodes are no longer required and can be torn down. Here, can I say SnappyData would be able to provide fast processing for such ML tasks? E.g. data forensics, fraud analysis.
On managing the data center as a farm of resources abstracted away from the SnappyData nodes: this is what I intend to use for conventional OLAP queries, so having a separate cluster of nodes for SnappyData makes sense here, e.g. top sales by region, product sales analysis, etc.
On launching Snappy using Mesos with some static provisioning policy: this is probably my challenge in spinning up SnappyData nodes in a Mesos cluster.
On using Snappy for both stream processing and an operational in-memory DB: yes, for both.
Thanks. Thoughts below...
On Mesos managing an in-memory DB for ad-hoc ML tasks: OK. Yes, assuming your ML tasks are iterative and need repeated access, it makes sense to consider Snappy.
On a separate SnappyData cluster for conventional OLAP queries: OK.
On spinning up SnappyData nodes in a Mesos cluster: we will look at supporting Mesos natively at some point. Maybe you have a few cycles to help us get there sooner :-)
Quick question... I don't see any guide to start Pulse on 0.5.0?
@thbeh To start Pulse with a snappydata-0.5 cluster, please follow these steps:
1) Change the locators configuration to add these parameters: -jmx-manager-start=true -jmx-manager-http-port=7075
2) Restart the SnappyData cluster and open the URL http://localhost:7075/pulse
3) Provide the default username and password, admin
Note: We are continuously improving Pulse, so please let me know if you see any issues.
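For reference, a hypothetical conf/locators line with those flags (assuming the usual one-host-per-line layout of the SnappyData conf files; the hostname is illustrative):

```
localhost -jmx-manager-start=true -jmx-manager-http-port=7075
```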
Question: I managed to create two Docker containers, one locator (locator1) and one server (server1). When I log in to server1's snappy-shell, I have to run connect client 'localhost:1527'; is that correct? And what is the column showing NETSERVERS?
Zeppelin usage has been documented and is tested before releases, so closing. @thbeh
Correct me if I am wrong: the SnappyData docs say that I should be able to access SnappyData from an external Spark deployment by using SnappyContext, am I correct? So can I do the following in a spark-shell?
```scala
val conf = new org.apache.spark.SparkConf()
  .setAppName("mySnappyApp")
  .setMaster("local[*]")
  .set("jobserver.enabled", "true")
  .set("snappydata.store.locators", "localhost:10334")
  .set("spark.ui.port", "4042")
  .set("spark.driver.extraLibraryPath", "/home/thbeh/snappydata-0.4.0-PREVIEW-bin/lib")
  .set("spark.driver.allowMultipleContexts", "true")

val sc = new org.apache.spark.SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SnappyContext(sc)
val airline = sqlContext.table("airline").show
```
...but I got this error:

```
_conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@59b778dc
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6f25a644
```