JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org

Suggestion: Document improvement #119

Closed jfstephe closed 6 years ago

jfstephe commented 7 years ago

Hi, please can you remove references to 'cluster' from the JanusGraph documentation, specifically on the Graph Partitioning page. This is one thing that threw me with Titan, as there is no 'cluster': there are several isolated standalone services.

I would also love to see a recommended/suggested/example production architecture diagram (not just words please), with high availability etc, for both the general case and possibly specific to each backend store.

Many thanks! John

FlorianHockmann commented 7 years ago

Big +1, especially for the recommended production architecture diagram! Maybe more than one diagram would be helpful for different scenarios like with/without an index backend (and which one), with/without Spark and so on. Figuring this out is probably one of the biggest problems newcomers have. I tried to explain a setup for Titan with Spark on StackOverflow once, but I am not even sure if this is the best setup.

robertdale commented 7 years ago

@jfstephe Could you clarify where the confusion is? JanusGraph can be deployed as a cluster itself and/or on a clustered backend. Is there specific wording that's ambiguous? For example, the Graph Partitioning page starts with:

When the JanusGraph cluster consists of multiple storage backend instances, the graph is partitioned across those machines.

Would it be better worded as

When JanusGraph is deployed on a cluster of multiple storage backend instances, the graph is partitioned across those machines.

??

Also, is there something lacking in the deployment diagrams shown in the Cassandra and HBase storage backend pages?

I can see where the Index Backends don't get the same treatment as the Storage Backends so I agree that diagrams would be helpful there. Maybe some abstract deployment diagrams would be more helpful. They could come before the Storage/Index backend pages and replace those diagrams.

jfstephe commented 7 years ago

Hi,

So the first time I came across Titan I struggled to find out from the docs what 'clustering' meant. For me it conjured up nodes in the cluster talking to each other, coordinating locking, etc. This was not the case and I think it would be good to state that explicitly. IMHO it would be good to state that you can use one or N standalone nodes and they behave exactly the same regardless of the size of N. Configuration may change but it would be consistent (I believe) across all nodes. There would not be a 'master' node, as is sometimes seen when nodes in a cluster communicate, because there is no inter-node communication.
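To make the "no master, identical standalone nodes" point concrete, here is a sketch of what the per-instance configuration could look like. `storage.backend`, `storage.hostname`, and `index.search.backend` are real JanusGraph option names; the hostnames are purely illustrative, and the exact keys depend on which backends you choose:

```properties
# janusgraph.properties — identical on every JanusGraph instance.
# Each instance is a standalone server; any coordination happens in
# the storage backend, never between JanusGraph instances themselves.
storage.backend=cql
storage.hostname=cassandra1.example.com,cassandra2.example.com,cassandra3.example.com
index.search.backend=elasticsearch
index.search.hostname=es1.example.com
```

Because every instance carries the same configuration, adding an Nth JanusGraph server is just starting another process with this same file.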

The lack of clear documentation led me to raise this SO post a while ago: https://stackoverflow.com/questions/41674226/how-does-the-titan-not-backend-storage-clustering-work

The follow-up questions I had in that post show some of the thoughts going through my head at the time. If it doesn't do inter-node communication, and it all depends on the backend implementation, then the Titan docs are only as good as those published for the supported backend implementations. In this case, DynamoDB, which is pretty poor: try to find out how consistency conflicts are resolved and whether I, as a consumer of DynamoDB/Titan, should care, for example.

Also, for me (and others on SO) a single-node cluster isn't a realistic production case, and I'd like to see some clear examples of this (including a diagram, please). Perhaps even examples of potential CI/CD pipelines (dev, test, staging, production, etc.) to illustrate that different use cases are supported and have been thought about. E.g. you would probably want to run locally in dev, with a local backend. Some may use Docker in some or all environments, and some may not at all, but having Docker and non-Docker examples through the CI/CD environments up to and including a realistic prod environment would be golden! :-)

I looked at the deployment diagrams for Janus and they are an improvement on Titan, but it would be good to have them on the Storage Backends index page, as they are common patterns that apply to all backends. If I am looking at DynamoDB (supported on Titan but undocumented), does the Cassandra documentation apply to me? How do I know? Are there differences in the diagram between Cassandra and HBase? I have to compare the pages manually myself. In this regard I think the docs could be DRY-er. Further, related to the CI/CD bit above, I would have also benefited from a diagram showing example logical and physical deployments.

I hope I've given a clearer indication of the issues I faced. If not and you'd like more details we can continue here or message me and we can skype to discuss if you prefer. Thanks, John

FlorianHockmann commented 6 years ago

I might give this a shot but wanted to confirm first that I'm taking the right approach. My suggestion for the recommended architecture would be that we add diagrams to the docs for three distinct scenarios:

The first two scenarios should be completely backend agnostic (excluding the non-scalable backends that I mentioned in Testing).

The accompanying text should make these points clear:

Do you guys think that something like that would help new users to better understand how they can deploy JanusGraph? Is that what you had in mind, @jfstephe and @robertdale?

Where should this be added? We could of course just add it in another chapter under JanusGraph Basics or Storage Backends, but I think it would make sense to introduce a completely new part for everything related to operating JanusGraph. We could then move chapters like Failure & Recovery, Index Management and Monitoring JanusGraph to this new part. Later on, we could add new content, for example about security, back-ups, upgrading, how schema roll-out should be performed, and recommendations for hardware sizing (mostly links to resources for the backends plus recommendations for the JanusGraph Server).

Currently, the docs mix usage and operation of JanusGraph together, splitting instead into basic and advanced; but I think that some readers are mostly interested in the usage side (developers) and some are more interested in the operating aspects. Other databases also have dedicated parts for operating-related docs:

What tool can we use to create those diagrams? Does anyone know which tool was used to create the other diagram in the docs? My suggestion would be draw.io but I'm not sure whether the resulting diagrams will match the style of the existing ones.

Regarding Docker: I'm not sure how we should address that in the docs. Wouldn't it be better to simply offer Docker Compose files for the first two scenarios and a Dockerfile for the testing scenario? We could then simply mention that it's of course possible to use Docker and link to the JanusGraph Docker repo with those resources.

jfstephe commented 6 years ago

@FlorianHockmann - I think it'd be ace if you could take a stab at this. I've had to step away from all things GraphDB-like ATM but will hopefully return soon!

FWIW, when I'm coming to a new persistence tech, 'basic' and 'advanced' are OK things to group capabilities by, but I don't really want to lose the capability detail. For example, clustering of storage backends is an implementation detail that supports one or more specific capabilities, e.g. fault tolerance. I'd rather see the capabilities listed under the 'basic'/'advanced' areas and then the implementations that support those capabilities. The 'why' we do things is important; then I can decide what I want rather than trying to work out all the reasons why the JanusGraph peeps supported 'clustering of storage backends' :-).

In-memory DB: an in-memory DB for testing makes sense to me, unless there's a specific reason to use Berkeley?
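For what it's worth, JanusGraph does ship an in-memory storage backend that needs no external service, so a testing configuration can be as small as this (a minimal sketch; nothing here survives a JVM restart):

```properties
# janusgraph.properties for local testing only: the whole graph
# lives in the JVM heap, so no storage or index service is needed.
storage.backend=inmemory
```

That would make the testing scenario trivially portable across dev machines and CI, with Berkeley JE as the alternative when the test data must outlive the process.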

Draw.io: I've used Draw.io with Confluence in the past and it's great, but I don't know if there's a standard for this project.

Pictures are better than words for concepts/architecture (but then I'm quite visual!).

Docker: IMHO there should be a single Dockerfile for JanusGraph. Environment variables/parameters should configure it for different environments. If there needs to be a set of Compose files for JanusGraph paired with specific backends then so be it, but they should all use the image built from the single standard JanusGraph Dockerfile.
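A sketch of what that could look like as a Compose file: one shared image, with per-environment behaviour driven entirely by environment variables. The image name and variable names below are hypothetical, just illustrating the single-Dockerfile approach, not an existing published image:

```yaml
# docker-compose.yml sketch: one JanusGraph image for every environment,
# configured via environment variables. Image and variable names are
# illustrative assumptions, not official ones.
version: "3"
services:
  janusgraph:
    image: janusgraph/janusgraph:latest
    environment:
      - JANUS_STORAGE_BACKEND=cql        # swap per environment, e.g. inmemory in CI
      - JANUS_STORAGE_HOSTNAME=cassandra
    ports:
      - "8182:8182"
  cassandra:
    image: cassandra:3.11
```

A dev or CI Compose file would differ only in the environment section (and would drop the `cassandra` service entirely for an in-memory run), which keeps the image itself identical across the pipeline.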

kgoralski commented 6 years ago

I would like to see docs improvement in this area:
https://stackoverflow.com/questions/46386299/gremlin-server-withremote-connection-closed-how-to-reconnect-automatically
https://stackoverflow.com/questions/47536418/how-to-connect-to-gremlin-server-through-java-using-the-gremlin-driver-with-sess
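For context, both questions come down to how the TinkerPop driver's connection pool handles a server that goes away. A hedged sketch of the driver YAML passed to `Cluster.open(...)` is below; the `connectionPool` key names are from the TinkerPop driver settings as I understand them, and the host and millisecond values are illustrative only:

```yaml
# remote.yaml sketch for the TinkerPop Java driver (Cluster.open("remote.yaml")).
# The driver marks an unreachable host dead and retries it in the background;
# reconnectInterval (ms) controls how often it probes the dead host.
hosts: [janusgraph.example.com]
port: 8182
connectionPool:
  reconnectInterval: 1000
  maxWaitForConnection: 3000
```

If the JanusGraph docs documented this behaviour (and these knobs) alongside the deployment scenarios, both linked questions would largely answer themselves.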

FlorianHockmann commented 6 years ago

@kgoralski, those two SO questions look like they are about different topics, namely automatic recovery for failed connections / an unavailable server, whereas this issue is more about what a cluster is in the context of JanusGraph and different deployment scenarios. So, I'd suggest that you create a new issue and describe there exactly what the docs should explain.