Loupi / Frontenac

A .NET port of the TinkerPop stack
MIT License

objective #1

Open SepiaGroup opened 11 years ago

SepiaGroup commented 11 years ago

Louis,

I am interested in implementing this in .NET as well. What is your objective in doing this? Are you planning to continue this and port neo4j to .NET?

Michael

Loupi commented 11 years ago

Hello Michael,

I discovered in the last few days that you implemented a neo4j REST client that looks very nice.

My objective is not to create another neo4j .NET client, but rather to use an existing one to implement a Blueprints Property Graph with blueprints-core .NET.

In fact, it would be very nice to see an implementation that uses both Frontenac and Neo4jRestNet. Maybe we could unite our efforts?

There are a lot of people who have ported Blueprints implementations or communication protocols for other graph databases to C#:

Tomas Bosak created OrientDB-NET.binary, ArangoDB-NET and rexster.net

Daniel Kuppitz has RexProClient

Also, there is a .NET RDF library available at http://www.dotnetrdf.org/, and the Social Media Research Foundation created NodeXL

We could all benefit from a common set of interfaces to access all of these great libraries in a standard way. I think blueprints-core .NET and its test suite can help here.

SepiaGroup commented 11 years ago

Louis,

I wrote Neo4jRestNet because, at the time, I was looking for a REST lib for neo4j in C# and could not find one I liked. I did not like the way Neo4jClient was implemented at the time, but to be fair, I have not looked at it since I wrote mine, so maybe it has changed. But my interest in what you are doing is not because of a REST API client.

I like neo4j and use it for one of my clients. Let me first say, I don't have anything against Linux or the open source community, but take what I am going to say next knowing that I am a .NET guy and not a huge Java developer. I just don't like Java all that much (not that there is anything wrong with Java, but it is kind of verbose and a pain to write in compared to C# or F#, and there is no LINQ library). So with that said, I have been thinking that writing a graph data store in .NET using C# and F# would be interesting and very useful for the .NET community, especially if it would run on Mono. The cypher language would be straightforward in F#, and the functional nature of F# and LINQ is very well suited to traversing nodes/edges.

I do understand that what you are writing is Blueprints and not a graph data store, but it is something that would need to be developed if you wanted to make your own store. I don't know if this sounds like a foolish idea (you are more than welcome to say it is crazy), but I think a quality open source graph DB in .NET would get some (maybe not a lot of) attention and would be interesting to do in C#/F#.

So if you don't think that this is too crazy of an idea, let me know.

Thanks, Michael

yojimbo87 commented 11 years ago

Hi guys,

I understand that Louis is doing a TinkerPop stack port, but I would like to add my input regarding the .NET based graph store which Michael was talking about.

First of all I don't think that making a neo4j based .NET port would make much sense because of two main points:

  1. It's built with a Java-specific API and philosophy. Now, to be clear, I'm not saying that Java is a bad or worse technology, but (apart from a very similar C-based syntax and, here and there, a roughly similar API) .NET has a somewhat different set of principles which could/should be used to exploit a competitive advantage in many areas.
  2. If you only do a port of it, you would probably end up constantly catching up with the latest (or not so latest) releases, given the state of neo4j on the market, its number of developers and its wide community.

Therefore, I think it would make much more sense to create a .NET graph store which is not a spinoff of any Java-based graph database, although some ideas/principles would be interesting to have, such as a LINQuified cypher language. I don't know the state of existing .NET graph databases on the market, but I'm sure there are some, although probably none of them is as popular as neo4j or other Java-based graph stores. That is a mystery to me, since I think there is a lot of potential, and plenty of use cases and users/developers, for graph databases built on top of the .NET stack.

SepiaGroup commented 11 years ago

Tomas,

I could not agree more! I am not suggesting porting neo4j; .NET has a lot of features that are very well suited to making a very efficient graph store, and they should be used. Not to mention that the power of LINQ/PLINQ and the conciseness of F# would be a huge benefit.

On the topic of .NET graph stores, there are a few that I know of:

Trinity: written by MS, but only for research http://research.microsoft.com/en-us/projects/trinity/

Sones: http://www.sones.de/static-en/ but they went out of business. They were a proprietary database, which could explain why they did not make it. If you search around about them you will see that they did receive a lot of funding. When they went under they opened up their source code, which you can still download.

Other than that, I have not seen much beyond simple hacks at doing this.

I too am confused as to why there is not a viable graph store on the .NET stack, or at least on Azure. However, I do feel that it is something that is sought after (just look on SO) and would be used. Having a .NET store would simplify my current project, and I would have used one if it were out there.

Thoughts? Michael

Loupi commented 11 years ago

Hi guys!

I agree with both of you, and no, you are not crazy Michael, just ambitious, and that's OK! I too thought I was crazy to port Blueprints to .NET; now I'm not alone! ;)

I also think that LINQ and a lot of other .NET technologies would make a lot of sense in a graph database framework. I would be very interested to contribute to such a framework in .NET.

Talking about spinoffs, I may be facing some Java spinoff issues too: https://github.com/dkuppitz/rexpro-client/issues/3 Maybe it's time to refactor! :) I'm already seeing a lot of these issues (opportunities to adapt to .NET/LINQ) arise in the upcoming ports of Pipes and Gremlin. Please let me know what you think about that.

About existing stores: Trinity is dead. Microsoft is now in partnership with Hortonworks, who will bring a SQL Server add-on that can read/write to Hadoop stores.

There is also VelocityDB

SepiaGroup commented 11 years ago

Hortonworks is for Hadoop - not really a graph DB.

VelocityDB - interesting, but again it is an object store, not a graph DB. When I looked at it a year or so ago it was not a good replacement for neo4j.

I did see the comment about you using Java names - stop doing that. :) I use ReSharper to keep me consistent. For the most part I like it, but it does slow down VS a little.

Well, not to sound too ambitious - I am interested in doing this for no other reason than to develop something that interests me.

yojimbo87 commented 11 years ago

I think the biggest challenge is to implement the storage engine, which involves CAP theorem related problems. In the Java world the situation is much simpler, since there are a lot of engines out there that could be used for this stuff. For example, if I remember correctly, neo4j is using Lucene and Titan runs on Cassandra. That way you don't have to deal with low-level stuff and can concentrate on core functionality. This approach is kind of problematic in the .NET world, since it would mean the database solution is not purely .NET based.

Loupi commented 11 years ago

There is a port of Lucene for .NET. Officially released Oct 2012.

SepiaGroup commented 11 years ago

Well, the actual storing is the reason I have not started. I have not found a way on Windows to handle the storage in an efficient, concurrent and fault-tolerant way. There are .NET memory-mapped files, which are the closest thing I can find in .NET; I am not sure they are a good fit, however.

As far as indexing using Lucene - I think we can solve this after we figure out how to handle the actual storing of data to disk.

Any suggestions?

Loupi commented 11 years ago

Looking at "The-Benefits-of-Titan", Titan also supports Berkeley DB as a storage mechanism. It offers CA from CAP theorem. I don't know about performances here too, and it is not distributed. Quickly looking at Berkeley DB docs, it is written in C, can run on Windows and there is a C# API available (P/Invoke). It's a bit like SQLite.

There is also Microsoft ESENT.

RavenDB uses Lucene.NET and ESENT under the hood.

SepiaGroup commented 11 years ago

Correct me if I am wrong here, but Titan is another graph DB. It was/is written by the guys at TinkerPop. They built Titan to use either HBase, Cassandra or Berkeley DB, depending on how you want CAP.

Now, I don't know what neo4j uses, but their DB is on a single instance and replicated to other nodes in an HA configuration using ZooKeeper.

In .NET land I don't know of anything that is similar to these packages. This is where .NET is really lacking, and why large systems are built on Linux/Java.

But with that said, sones did create a graph DB in .NET.

The source code is here:

https://github.com/sones/sones

I will look more into how they physically write/update data to disk - to be honest I really don't know how they do it, and I have only looked at the code once before :)

The code is well documented, uses a lot of LINQ/PLINQ and is very interesting.

Let me know if you make any headway on deciphering it.

Loupi commented 11 years ago

You're right about Titan. I like its idea of abstracting the data storage layer.

I found these for neo4j: "An Overview of Neo4j Internals" and "Rooting Out Redundancy - The New Neo4j Property Store".

I'm going to look at the sones source to see if I can understand its storage mechanism.

Being optimistic, I think that both RavenDB (a .NET NoSQL database) and Berkeley DB could be used to perform storage in a similar way to Titan and neo4j. They both offer replication, and RavenDB is scalable and supports failover.

Loupi commented 11 years ago

OK, I tried to get a basic understanding of how sones works. I must say that I'm impressed.

There is a service layer with a plugin architecture. IGraphDS is the sones service interface. GraphDSServer serves it, and GraphDSClient consumes it.

Plugins can be created for the query language, to import and export graphs, and to perform indexing.

It has 2 indexing plugin implementations: Lucene.NET and memory based.

GraphDSServer uses an IGraphDB to perform its operations on a graph database. There is 1 IGraphDB implementation: SonesGraphDB.

SonesGraphDB internally uses an IGraphFS to perform IO operations. I could only find one implementation of IGraphFS: InMemoryNonRevisioned.

Maybe I missed something, but I'm under the impression that sones does not store anything on disk (apart from its IO plugins, and that's not what we are looking for here). In their external libraries, I found BplusDotNet, which could be used to serialize B+ trees, but I could not find where it is used.

SepiaGroup commented 11 years ago

I looked at sones more last night, and that is exactly what I came to understand as well. Maybe they do have a store plugin but have not shared it.

I looked at the links you sent; the first one is very interesting and informative. I seem to remember reading a tech paper on storing graph data similar to this; I will see if I can find it again. But it looks like they developed their own file system layer on top of standard Java IO calls, something like mapped files. Have you seen memory-mapped files in .NET? They may be something close to what they are using for the actual IO.

I will look more into your suggestion of using a DB like Titan does. That may be a faster way to get started. The idea of abstracting the file system, like Titan does, is a very good one.

SepiaGroup commented 11 years ago

Louis,

After reading the few links you sent and a good night's sleep, I may have an approach (maybe).

All actions on the graph can be done in a sorted order that is reproducible every time. That is, create a node, then add properties, will produce the same result every time and makes sense (add properties, then create the node, does not make sense - i.e. it is in the wrong order).

So if you write the actions out to a transaction log file, in sorted order, appending at the bottom of the file, and read them off the top and process them into the graph data file, you should not have any issues implementing this. If you cache the graph data in memory and make changes to this cache, you also write the changes to the log file. So new nodes and updated properties show up instantly in memory and are written some time later to the transaction log file. You should not have to go to the graph data file for this data, since it is already in memory. If you are asked to look at a node that is not in memory, you make a call to read it from disk and cache that node (and other data if needed). Once it is in memory, you proceed to update it and write the updates to the transaction log file. This approach should not be too difficult in .NET, and a simple proof of concept should be easy to bang out.

The issue now is data that gets pushed out of memory. I don't think this would be a problem, because we would be in control of the memory cache and can handle it (i.e. don't flush data that has pending writes - details to be worked out later...).

I think this is what neo does, from reading the docs you sent. They also ship these log files to the master node in an HA configuration.

What are your thoughts on this high-level approach?
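
A minimal C# sketch of the approach described above: mutations are applied to an in-memory cache and appended to a transaction log, and a separate step would later fold the log into the graph data file. All names here (TransactionLog, CachedGraph, the log format) are invented for illustration; this is not code from any of the projects discussed.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical sketch only: names and file format are made up for this example.
public enum GraphOp { CreateNode, SetProperty }

public sealed class TransactionLog : IDisposable
{
    private readonly StreamWriter _writer;

    public TransactionLog(string path) =>
        _writer = new StreamWriter(new FileStream(path, FileMode.Append, FileAccess.Write));

    // Append one mutation to the end of the log; a cheap sequential write.
    public void Append(GraphOp op, long id, string key = "", string value = "")
    {
        _writer.WriteLine($"{op}\t{id}\t{key}\t{value}");
        _writer.Flush(); // a real store would batch/fsync according to its durability policy
    }

    public void Dispose() => _writer.Dispose();
}

public sealed class CachedGraph
{
    private readonly TransactionLog _log;
    private long _nextId;
    // In-memory cache: node id -> property map. Updated first, so reads never wait on the data file.
    private readonly ConcurrentDictionary<long, ConcurrentDictionary<string, string>> _nodes = new();

    public CachedGraph(TransactionLog log) => _log = log;

    public long CreateNode()
    {
        var id = Interlocked.Increment(ref _nextId);
        _nodes[id] = new ConcurrentDictionary<string, string>();
        _log.Append(GraphOp.CreateNode, id); // visible in memory now, durable via the log
        return id;
    }

    public void SetProperty(long id, string key, string value)
    {
        _nodes[id][key] = value;
        _log.Append(GraphOp.SetProperty, id, key, value);
    }

    public string? GetProperty(long id, string key) =>
        _nodes.TryGetValue(id, out var props) && props.TryGetValue(key, out var v) ? v : null;

    // A separate background process would read the log from the top, fold the entries
    // into the graph data file, and then truncate the processed prefix.
}
```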

Loupi commented 11 years ago

Hello,

The transaction log approach makes me think of SQL Server log shipping, but from memory to the graph DB. I wonder what caching system neo4j uses. Maybe memory-mapped files, memcached or redis could be used here. I really like this approach of caching and doing batch writes.

Also, I've read some more theory on graph data structures. Here they talk about index-free adjacency and provide 3 algorithms to deal with it. I'm sure both neo4j and Titan implement either an Adjacency List or an Incidence List. I think it is now time to read their source and see how they serialize it to their store. We will then understand how they use key/value stores to persist the data.

After having read this article, I'm tempted to create a proof of concept that implements an Adjacency List or Incidence List with ESENT. Why am I so excited about ESENT? I haven't tried it yet, but reading the docs it has a nice indexing mechanism and works a bit like a big table. I think it could be used to achieve vertex indexing and edge indexing. Reading the wiki, they say it can also store sequential data (the adjacency lists?). Anyway, I'm only speculating; I need to try it now.

About RavenDB, I found this comment from Oren Eini, where he says that it is not primarily designed for graphs, but could be used for them with custom bundles.

SepiaGroup commented 11 years ago

The transaction log is very similar to SQL's, but in this case it will work very well - hence why neo does this.

I am not sure that we would need to use mapped files or any other caching implementation at this time. I am thinking that we just have a pointer to the head of the graph and the rest linked off of that. To get to nodes by ID we have a key/value dictionary that points to the objects.

I think the approach to go with is using an incidence list to start with.

On the storage using ESENT: ESENT is the new incarnation of the JET DB built into Windows. I know Active Directory uses it. When it was called JET (many years ago) it was a modified version of the Access DB. To me, I am not too concerned with what method is used to write to the drive, as the methods will be an implementation of an interface. This will allow many implementations with no core code changes. If you want to learn ESENT - go right ahead.

I may have time this week to start on a proof of concept for the graph objects and transaction logs. What do you think about starting a new git repo for this work?
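
A rough C# sketch of the shape described above: an id-to-vertex dictionary as the entry point, with each vertex holding its incident edges grouped by label (an incidence list). The type names are made up for illustration and are not Frontenac or Fallen-8 classes.

```csharp
using System.Collections.Generic;

// Illustrative incidence-list model; names are invented for this sketch.
public sealed class Vertex
{
    public long Id { get; init; }
    public Dictionary<string, object> Properties { get; } = new();
    // Incident edges grouped by label ("knows", "owns", ...), as in an incidence list.
    public Dictionary<string, List<Edge>> OutEdges { get; } = new();
    public Dictionary<string, List<Edge>> InEdges { get; } = new();
}

public sealed class Edge
{
    public long Id { get; init; }
    public string Label { get; init; } = "";
    public Vertex Source { get; init; } = null!;
    public Vertex Target { get; init; } = null!;
}

public sealed class Graph
{
    // "Pointer to the head of the graph": an id -> vertex lookup; everything else hangs off the vertices.
    private readonly Dictionary<long, Vertex> _verticesById = new();
    private long _nextId;

    public Vertex AddVertex()
    {
        var v = new Vertex { Id = ++_nextId };
        _verticesById[v.Id] = v;
        return v;
    }

    public Edge AddEdge(Vertex source, string label, Vertex target)
    {
        var e = new Edge { Id = ++_nextId, Label = label, Source = source, Target = target };
        GetOrAdd(source.OutEdges, label).Add(e);
        GetOrAdd(target.InEdges, label).Add(e);
        return e;
    }

    public Vertex? GetVertex(long id) => _verticesById.TryGetValue(id, out var v) ? v : null;

    private static List<Edge> GetOrAdd(Dictionary<string, List<Edge>> map, string label) =>
        map.TryGetValue(label, out var list) ? list : map[label] = new List<Edge>();
}
```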

mickdelaney commented 11 years ago

Not sure if you guys ever came across https://github.com/cosh/fallen-8
It's a C# in-memory graph database.

Loupi commented 11 years ago

Thank you Mick for this link. I will have a look at it. I've been reading the Titan file system source in my spare time this week. The guys at Thinkaurelius did a wonderful job, and the code is all documented.

Michael, I do agree with you on the abstraction of the filesystem, and on the new repo too. Do you think we should host it here, at SepiaGroup, or anywhere else?

I've started playing with ESENT, and if I work hard this weekend, I may be able to commit a proof of concept. I'll then adapt it to fit the FS interface. I think in a couple of weeks, if everything goes well, we will be in a good position to integrate the transaction log with the FS.

SepiaGroup commented 11 years ago

Mick, again thanks for the link - that looks very good!

Louis, you may also want to take a look at how he stores his data. He uses an API that he developed. Take a quick look at this http://www.slideshare.net/HenningRauch/graphdatabases - slide 73 seems interesting, if true. However, it does not look like this is under active development.

I would like it on SepiaGroup - the name sepia is Latin for a cuttlefish (also the color brown), which has the ability to change shape, color and texture (really an amazing animal if you ever get the chance to dive and see one), kinda like a schema-less database (hence the reason I came up with the name). However, I am not going to make a stink about it if you are willing to be a partner. Otherwise I say we start a new one. Let me know and I will build a repo.

Loupi commented 11 years ago

I looked a bit at https://github.com/cosh/fallen-8/blob/master/Fallen-8/Persistency/PersistencyFactory.cs. The way I understand it, it writes/reads a whole graph at once. I could not find a query language, apart from some stuff in the algorithms folder. The fact that the whole graph is in memory could maybe explain the performance graph on slide 73.

I'm OK with hosting it on SepiaGroup. Cute fish! There is an expert diver at my job; I'm going to give him this info, I'm sure he is going to like it :)

mickdelaney commented 11 years ago

Henning Rauch told me he's re-writing fallen-8 in C++, but that if any bugs appear in the C# version he'll fix them, so he's still active in the space. It might be worth including him in your discussions; probably some cross pollination.

mickdelaney commented 11 years ago

I just sent him an email with this thread; hopefully he'll jump in with some thoughts...

cosh commented 11 years ago

Morning guys!

My name is Henning Rauch. I'm the former head of R&D at sones... so if you have questions concerning that, I can give you answers in any detail. Furthermore, I built Fallen-8 after leaving the company.

(I'm going to write something about those two products right now...)

Cheers, Henning.

cosh commented 11 years ago

sones GraphDB: a GraphDB that aimed to kick the ass of all existing databases :). Well, to make a long story short: it didn't work out that well. It was separated into two parts, a community edition and an enterprise edition. The only difference between those was the persistent filesystem. The main feature of the sones GraphDB was the nice separation of all layers (service, graph, query language and filesystem).

cosh commented 11 years ago

Fallen-8: this project reflects my learnings from the sones GraphDB. Instead of trying to create a "one size fits all" solution, I created a product for a niche. Its main focus is analytics. That's why it's in-memory. BUUUT it has some kind of checkpointing functionality, so at any point in time the user is able to create something like a savegame :). This action should be as fast as possible. I did a lot of consulting in 2012 on this project and developed nice services on top of it.

cosh commented 11 years ago

The new Fallen-8: it's written in C++ and will be visible soon. I decided to use the MIT license again. It will be faster, consume even less memory and support some other nice features. For me that's the next baby step towards a distributed in-memory graph database. This is my ultimate goal, and everything will be as free as possible.

cosh commented 11 years ago

@Loupi concerning persistency: you are absolutely right. It's totally in-memory. No evil caching. The benchmarks you are referring to used the "strong caches" of neo4j. So I tried to have everything in memory there too. BUT the numbers are saying that caching is not the same as in-memory (cpt. obvious :) )

cosh commented 11 years ago

@SepiaGroup Is it true? Yes it is. The number of traversals per second is still growing. I convinced some companies to use it and they are really happy with it. But I need to repeat it: its focus is analytics, so be sure to use something persistent underneath and create a fast ETL job.

yojimbo87 commented 11 years ago

@cosh Hi Henning, are speed and memory consumption the only reasons why you rebuilt Fallen-8 in C++, or are there also other factors? What are your thoughts on creating a graph database on top of some fast K/V store, like redis for example?

cosh commented 11 years ago

@yojimbo87 Hi. Those were the main reasons for me. The architecture of the new F8 will go in the same direction as you described. One difference: I'm not going to use redis. I'm using my own in-memory column store, which is in my opinion perfect for my requirements. Besides that, I would like to use other low-level libs which allow me to do RDMA, to extend F8 to more than one node.

Loupi commented 11 years ago

Hi Cosh, nice to meet you. I really appreciate your presence here. I like how sones abstracts all the layers of a graph DB system. Also, by looking at the sones source I discovered Irony, which looks very interesting. I see that sones has GraphQL plugins too. It is great and educational to see different ways of implementing a query language in a graph DB (comparing with Gremlin here).

I'm curious about the enterprise version of sones: how does it store the graph on disk?

I'm tempted to implement a blueprints-fallen-8-graph with Frontenac. What do you think about it?

SepiaGroup commented 11 years ago

Henning,

Can you explain why sones did not make it? From what I can find about them, they were well funded and had a good idea. I wouldn't like to make the same mistakes as they did.

cosh commented 11 years ago

@Loupi Yep, the layers were great. It's necessary for proper testing and, of course, for their "enterprise" concept. Concerning Irony: this is one of the greatest libs I have used in .NET. Really, really great. It needs some time to get familiar with it, but in the end it works really well. We never had any bigger issues there, and the query language was one of the biggest pros of the sones GraphDB at that time. I designed big parts of the language, and if you are interested I could timewarp my brain into the past and see how I can help you. Concerning storing on disk: there were multiple approaches to that challenge. The idea was to create a revisioned, multi-purpose and distributed file system. The last version reused ideas from RDBMSs... i.e. paging, multiple layers... The performance was quite OK in the end, but not as good as our competitors'. If you want more info, contact me.

Concerning blueprints-fallen-8-graph: I would be honored if you would like to do that and would support you as much as possible. What would be the effort?

cosh commented 11 years ago

@SepiaGroup They were well funded and had great ideas... BUT:

  1. technology was not focussed
  2. founder-internal-problems
  3. lost too many POCs
  4. overselling

In the end they gave me the beautiful opportunity to find my passion. That's why I'm still very proud of that part of my life.

SepiaGroup commented 11 years ago

Henning,

My goals in building a .NET graph DB would be as follows:

  1. a graph DB as functional as neo4j; not a port of neo, but something that is as functional as neo and .NET centric.
  2. a query language that is LINQ centric, strongly typed and intuitive.
  3. well designed and fast
  4. can run on Windows and Mono, and has backend stores optimized for different platforms (Windows, Amazon, Azure, etc.)
  5. implements well-established interfaces, Blueprints, etc.

Those are the main items (I am sure I am forgetting a few).

That said, Fallen-8 satisfies a few and sones satisfies a few as well, but neither satisfies all. I really like your data model, but I am not sure it will be easy to modify it so that when a node/edge/property gets created/updated/deleted it gets stored to disk. You can correct me if I am wrong on this.

At a high level, what I am thinking is that when a node/edge/property is created/edited/deleted, an entry would be written to a transaction log that very quickly writes it to the end of a log file. Then another process would read these logs and apply them to the graph data store. This would also survive a fault, because when the system starts again it will continue processing the logs. Any nodes that have been created/edited would be in memory, so there would be no need to read from the data store again; this allows the updates to the data store to happen at a slower speed. I know that neo works something like this. I am wondering if your data model would be well suited for something like this. Also, we could abstract the storage interface so that we could have several data storages without affecting the core code. What are your thoughts on this approach? If you have a better idea I would like to hear that as well.

Thanks a lot for your insight. Michael

cosh commented 11 years ago

  1. Do you want to do this for fun or business?
  2. Your transaction approach is definitely valid and should be easy to implement. I've been asked by a customer to develop something like this: a simple write-ahead log. This would fit the needs of many people. It has to be asynchronous, of course.
  3. I would reuse the plugin management of F8 and the services.
  4. You could also reuse the F8 kernel and change the create/update/delete methods to support that WAL. And there must be a global flag that states whether the database is sane or not. If not, you would have to replay the WAL (see the sketch below).
  5. The LINQ stuff should be implemented on top of your kernel.

In the end it's your decision, but F8 might bring you some low-hanging fruit, and I already know 1-N customers who would like to have it.

Cheers, Henning
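
A small sketch of the recovery side of points 2 and 4 above, assuming a simple file layout invented here (a "sane" marker file plus a line-oriented WAL); the real format would be whatever the store defines.

```csharp
using System;
using System.IO;

// Hypothetical recovery sketch for the WAL idea above; file names and formats are invented.
public static class WalRecovery
{
    private const string SaneMarker = "graph.sane"; // written on clean shutdown, deleted at startup
    private const string WalFile = "graph.wal";

    public static void OpenStore(string directory, Action<string> applyEntry)
    {
        var marker = Path.Combine(directory, SaneMarker);
        var wal = Path.Combine(directory, WalFile);

        if (!File.Exists(marker) && File.Exists(wal))
        {
            // Previous run did not shut down cleanly: replay every logged mutation in order.
            foreach (var line in File.ReadLines(wal))
                applyEntry(line);
        }

        // From here on the store is open; remove the marker so a crash is detectable next time.
        if (File.Exists(marker)) File.Delete(marker);
    }

    public static void CloseStore(string directory)
    {
        // Clean shutdown: the in-memory state has been flushed, so mark the store as sane.
        File.WriteAllText(Path.Combine(directory, SaneMarker), DateTime.UtcNow.ToString("o"));
    }
}
```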

SepiaGroup commented 11 years ago

I would like to do it for business, but in the beginning I don't know how much demand there is for this in .NET. Is your customer a paying client? I am a contract developer and have had my own company for the past three years. I am always looking for more work - mostly .NET.

I will look more into your data model this weekend and let you know if I have any questions.

With the query language - I agree it would be on top of the kernel, but the kernel would need to have the data in a format that is usable.

I think that F8 would be a good place to start, modifying it when needed. Also, having you as a resource is a huge help.

Thanks

cosh commented 11 years ago

Yes, they are paying for F8 service development. Meaning I implemented a lot of these: https://github.com/cosh/fallen-8/blob/master/Fallen-8/Service/IService.cs example: https://github.com/cosh/fallen-8/blob/master/Fallen-8/Service/REST/AdminServicePlugin.cs --> https://github.com/cosh/fallen-8/blob/master/Fallen-8/Service/REST/AdminService.cs

Contact me if you want to know how to satisfy a customer with F8 :)

Cheers, Henning.

Loupi commented 11 years ago

@cosh

For the query language, I would like it to satisfy everyone's requirements. We discussed this topic a bit earlier, in the first comments of this issue. From that discussion we can see that everybody would like to use a fluent API with both C# and F#. This makes a lot of sense; a lot of .NET developers like these fluent APIs, and it could be the foundation of the query language.

Besides that, I think that a scripting engine would add value here. I've had a lot of commercial success stories with IronPython. It is an easy language to learn and is well documented. Other scripting languages exist too. Simply put, I think those languages could simply call the fluent API. Sending scripts over the network to change business rules on the fly and execute requests on a service is one of my needs.

Both neo4j and TinkerPop offer this feature, through Cypher and Gremlin.

Looking at the Fallen-8 architecture, I think this would fit as a plugin, right? What do you think about that? Would it be easier for users? I have no experience with Irony. Would it be a better fit here?
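
A minimal sketch of the "scripts call the fluent API" idea, using the IronPython hosting API to expose a graph object to a script received over the wire. The graph parameter and the example script are placeholders for whatever fluent API the framework would actually expose.

```csharp
using IronPython.Hosting;   // NuGet package: IronPython

// Hypothetical host-side helper: run a script against a graph object exposed as 'g'.
public static class ScriptRunner
{
    public static object RunScript(object graph, string script)
    {
        var engine = Python.CreateEngine();
        var scope = engine.CreateScope();
        scope.SetVariable("g", graph);            // the fluent graph API, visible to the script as 'g'
        engine.Execute(script, scope);
        return scope.ContainsVariable("result") ? scope.GetVariable("result") : null;
    }
}

// Example script a client might send (business rule changed on the fly); 'Vertices' and
// 'GetProperty' are placeholders for the real fluent API:
//   result = [v for v in g.Vertices() if v.GetProperty('age') > 30]
```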

Loupi commented 11 years ago

On the blueprints-fallen-8-graph: after having looked at the fallen-8 source, I realised that it would not require much effort to integrate it with Frontenac. What I need to know is its supported features. If you could give me a list of bools for the properties of this class, that would be fantastic. https://github.com/Loupi/Frontenac/blob/master/Blueprints/blueprints-core/Features.cs

As I understand it, I will need to host fallen-8 inside the Blueprints graph like this: https://github.com/cosh/fallen-8/blob/master/Startup/Startup.cs

I'm sure more questions will come later.

Loupi commented 11 years ago

On fallen-8: like Michael says, I think that fallen-8 would be a solid foundation for our upcoming work. It is clean and performant, the codebase is not astronomic/unmanageable, and it was made by someone who worked on one of the few serious graph DBs in .NET. I like its plugin system, and looking at the links you provided, I noticed that we can even upload new plugins using the AdminService. Wow!

I'm curious about those savegame files. What is the best period for the ETL job in an HA system? What disk capacity and technology is best for it? Would backing up every ten minutes be OK? Could only the differences between 2 backup sets be written to disk, and would it be worth it?

SepiaGroup commented 11 years ago

Louis,

I do see the benefit of a fluent API, like what I built for Gremlin in my API. However, what I built for cypher uses lambda expressions: I parse the expression tree and convert the expression into cypher syntax. I, like you, would like to support both, plus multiple scripting languages (don't forget you can host ASP as well), but I think the biggest benefit would come from a cypher-like language using LINQ syntax. This would give a very .NET-centric query language that C# developers would find natural. I don't know if you have used Entity Framework, but it does have some very interesting functionality, regardless of whether you like using it or not. Imagine you have a cypher query, but your nodes and relationships are strongly typed and you are able to use the power of LINQ aggregates and other commands. You could also use data annotations that would provide context help for which relationships point to which nodes, etc. Your returned data would be loaded into the correct data classes, and you can just update the class and then update the graph. This would make data manipulation very easy and clean, with very little overhead, because .NET properly implements anonymous types, one of the great things about .NET. If using F#, it gets even more streamlined. But in order for this to work, the graph data model needs to be in a form that works well with the traversal algorithm. This is something Henning would be a great asset to give pointers on. I am hoping that this weekend I can find enough time to really look deep into F8 and come up with a high-level design/implementation.

What are your thoughts on this? Thanks, Michael
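
A toy illustration of the "parse the expression tree and convert it into cypher syntax" idea described above. It handles only simple comparisons and is not how Neo4jRestNet actually works; the Person record exists only for the usage example.

```csharp
using System;
using System.Linq.Expressions;

public sealed record Person(string Name, int Age);

// Minimal translator: a strongly typed predicate becomes a cypher-like string.
public static class CypherTranslator
{
    public static string Where<T>(Expression<Func<T, bool>> predicate)
    {
        var alias = predicate.Parameters[0].Name;
        return $"MATCH ({alias}:{typeof(T).Name}) WHERE {Visit(predicate.Body)} RETURN {alias}";
    }

    private static string Visit(Expression e) => e switch
    {
        BinaryExpression b => $"{Visit(b.Left)} {Op(b.NodeType)} {Visit(b.Right)}",
        MemberExpression m => $"{Visit(m.Expression!)}.{m.Member.Name}",
        ParameterExpression p => p.Name!,
        ConstantExpression c => c.Value is string s ? $"'{s}'" : c.Value?.ToString() ?? "null",
        _ => throw new NotSupportedException(e.NodeType.ToString())
    };

    private static string Op(ExpressionType t) => t switch
    {
        ExpressionType.Equal => "=",
        ExpressionType.GreaterThan => ">",
        ExpressionType.LessThan => "<",
        ExpressionType.AndAlso => "AND",
        _ => throw new NotSupportedException(t.ToString())
    };
}

// Usage: CypherTranslator.Where<Person>(p => p.Age > 30 && p.Name == "Alice")
//   -> "MATCH (p:Person) WHERE p.Age > 30 AND p.Name = 'Alice' RETURN p"
```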

Loupi commented 11 years ago

Michael, we come from the same world. I've been using both LINQ to SQL and EF in commercial solutions since their CTPs. I also played with custom code generation, with T4 templates reading edmx files. This is very powerful. Custom providers can also be implemented.

Looking at the TinkerPop stack, they have the Java equivalent of what you are describing here: Frames. It is the graph ORM of the TinkerPop stack. They use attributes to annotate the objects, and these attributes can even contain gremlin queries to fill collections.

I also think that a Cypher like DSL is the way to go. I'm sure we'll find a lot of pointers inside sones GQL source files too.
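
As an illustration of what a .NET analogue of Frames could look like, here is a hypothetical sketch: none of these attribute types exist in Frontenac or Fallen-8, they only mirror the kind of annotations described above (property mapping, adjacency, and a Gremlin query attached to a collection).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical attributes, invented for this sketch.
[AttributeUsage(AttributeTargets.Property)]
public sealed class PropertyAttribute : Attribute
{
    public PropertyAttribute(string key) => Key = key;
    public string Key { get; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class AdjacencyAttribute : Attribute
{
    public AdjacencyAttribute(string label) => Label = label;
    public string Label { get; }
}

[AttributeUsage(AttributeTargets.Property)]
public sealed class GremlinAttribute : Attribute
{
    public GremlinAttribute(string query) => Query = query;
    public string Query { get; }
}

// A strongly typed view over a vertex, in the spirit of Frames.
public class Person
{
    [Property("name")]
    public string Name { get; set; } = "";

    [Adjacency("knows")]
    public ICollection<Person> Knows { get; set; } = new List<Person>();

    // A hypothetical ORM would fill this by running the query against the current vertex.
    [Gremlin("it.out('knows').out('knows').dedup")]
    public IEnumerable<Person> FriendsOfFriends { get; set; } = new List<Person>();
}
```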

cosh commented 11 years ago

Hi guys! I'll comment tomorrow.

SepiaGroup commented 11 years ago

Henning,

I have spent some time reviewing F8 and have a few questions - I also learned a few things too…

I have been reviewing the classes in the Model folder. These classes are the core elements of the graph DB. If I understand it correctly, you then use the BigList class to contain all of the graph elements that are read/written to disk. BigList has a method GetElement that will search all elements to find the ID of the element sought. After you find the element you are looking for, I assume you then use some sort of traversal technique to get the other elements you want.

The EdgeModel class has references to the source/target vertices, while the VertexModel has references to all the incoming and outgoing edges. The in/out references are Lists of EdgeContainers, which organize the edges into types of edges by using the EdgePropertyId.

So if I have the above correct, my questions are:

Why are you using arrays to hold the elements in BigList? I see you are using a two-dimensional array and shard the data, but couldn't you use a concurrent collection? Using a concurrent collection would reduce the need for all the locking code. Also, if using a concurrent dictionary, you could reduce some of the code used in searching for an element by ID, and I think you could still come up with some form of sharding (an array of concurrent collections). I know that concurrent collections are slow, but when there are a lot of threads reading/writing they do perform well.

I have the same question for how the references to edges within the vertex class are stored: could a concurrent bag or dictionary be used instead of a List?

I did not find where you do your traversals; can you point me to where I should look?

Thanks for the help, Michael

cosh commented 11 years ago

Hi Michael,

What a nice review :). BigList... I used concurrent data structures, but they had two drawbacks: too big, too slow. I had to optimize a lot for memory usage. And ConcurrentBag is the slowest thing on earth, if you ask me :). Of course I did a lot of benchmarks, but in the end it did not work out. For concurrency I use the class https://github.com/cosh/fallen-8/blob/master/Fallen-8/Helper/AThreadSafeElement.cs which implements a SpinLock. This enables F8 to have multiple concurrent reads and only one write at a time.
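
For readers unfamiliar with the pattern, here is a rough sketch of a "many concurrent readers, one writer at a time" guard built around a SpinLock, in the spirit of what is described for AThreadSafeElement; this is not the actual Fallen-8 implementation.

```csharp
using System.Threading;

// Illustrative guard only: readers may overlap, writers are exclusive. Writer starvation
// is possible in this simplistic version.
public abstract class ReadWriteGuardedElement
{
    private int _readers;                     // readers currently inside a read section
    private SpinLock _writeLock = new(false); // held for the whole duration of a write

    protected void EnterRead()
    {
        var taken = false;
        _writeLock.Enter(ref taken);          // blocks while a writer is active
        Interlocked.Increment(ref _readers);
        if (taken) _writeLock.Exit();         // release immediately so other readers can enter
    }

    protected void ExitRead() => Interlocked.Decrement(ref _readers);

    protected void EnterWrite()
    {
        var taken = false;
        _writeLock.Enter(ref taken);          // keeps new readers and other writers out
        var wait = new SpinWait();
        while (Volatile.Read(ref _readers) > 0)
            wait.SpinOnce();                  // wait for readers already inside to finish
    }

    protected void ExitWrite() => _writeLock.Exit();
}
```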

List... I made the assumption that the number of https://github.com/cosh/fallen-8/blob/master/Fallen-8/Model/EdgeContainer.cs would not be that big, and again: I had to optimize for size. A Dictionary would consume too much memory.

Concerning traversals... First things first:

  1. Get the starting vertex by a secondary index lookup, via ID, or via GraphScan
  2. Traverse manually by calling TryGetOutEdge (https://github.com/cosh/fallen-8/blob/master/Fallen-8/Model/VertexModel.cs#L508) or TryGetInEdge (https://github.com/cosh/fallen-8/blob/master/Fallen-8/Model/VertexModel.cs#L543)
     2.1 There you have to name the EdgePropertyID you are interested in
     2.2 The edge property is identified by a UInt16 (again: because of size and because I hate Strings). This UInt16 can be seen as the name of the edge (like "Friends" or "Enemies")
     2.3 If there is an EdgeProperty with the id you are interested in, you get back true and, as an out-param, the ReadOnlyCollection (example: https://github.com/cosh/Fallen-8-Intro/blob/master/Fallen-8%20Intro/IntroProvider.cs#L116)
     2.4 If you used an incoming edge, you should proceed with the SourceVertex of the EdgeModel; otherwise go with the TargetVertex
  3. Start again with 2

Usually I hide this complexity behind a service that is dedicated to exactly one task.

Additionally, you could use https://github.com/cosh/fallen-8/blob/master/Fallen-8/Fallen8.cs#L577 to calculate all shortest paths between two vertices. For that you need a ShortestPathPlugin (which I usually sell). Example: https://github.com/cosh/fallen-8/blob/master/Fallen-8/Algorithms/Path/BidirectionalLevelSynchronousSSSP.cs
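
To make the traversal steps above concrete, here is a simplified stand-in in C#: the types below are invented for this sketch (they only echo the idea of out-edges grouped per vertex by a ushort edge-property id) and are not the real Fallen-8 API.

```csharp
using System;
using System.Collections.Generic;

// Stand-in model, not Fallen-8's classes.
public sealed class MiniVertex
{
    public long Id { get; init; }
    public Dictionary<ushort, List<MiniEdge>> OutEdges { get; } = new();

    // Mirrors the "name the edge property id, get back a collection" pattern described above.
    public bool TryGetOutEdges(ushort edgePropertyId, out IReadOnlyList<MiniEdge> edges)
    {
        var found = OutEdges.TryGetValue(edgePropertyId, out var list);
        edges = list ?? (IReadOnlyList<MiniEdge>)Array.Empty<MiniEdge>();
        return found;
    }
}

public sealed class MiniEdge
{
    public MiniVertex Source { get; init; } = null!;
    public MiniVertex Target { get; init; } = null!;
}

public static class Traversal
{
    // Steps 1-3 above: start from an ID/index lookup, follow out-edges for one edge "name"
    // (the ushort id), then continue from each target vertex.
    public static IEnumerable<MiniVertex> WalkOut(MiniVertex start, ushort edgePropertyId, int maxDepth)
    {
        var frontier = new List<MiniVertex> { start };
        for (var depth = 0; depth < maxDepth; depth++)
        {
            var next = new List<MiniVertex>();
            foreach (var vertex in frontier)
                if (vertex.TryGetOutEdges(edgePropertyId, out var edges))
                    foreach (var edge in edges)
                    {
                        yield return edge.Target;
                        next.Add(edge.Target);
                    }
            frontier = next;
        }
    }
}
```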

I hope I could answer some of your questions.

Cheers, Henning.

cosh commented 11 years ago

@Loupi You can try out the Fallen8 intro if you want to get an impression of the traversal speed and the checkpointing-mechanism. Please have a look at this: https://github.com/cosh/Fallen-8-Intro

After you are finished with the benchmarks, you could use the Admin-Service to execute as many Checkpoints as you want. All GraphElements and all secondary indices will be saved at once in multiple threads. So: the more CPUs and the faster the disks you have, the faster this will be. In this F8 intro it usually saves about 2,000,000 edges or vertices per second.

cosh commented 11 years ago

@Loupi concerning the query language... I'm all for a query language. But I did not want to have it in the core. As you know, I built big parts of the sones GraphQL, and my experience is that it takes a really long time to implement. At the time I built F8 I did not have the time to do this.

Loupi commented 11 years ago

Hi guys, sorry for my lack of presence these last few days; my laptop/dev machine turned into a brick! :( http://forums.lenovo.com/t5/IdeaPad-Y-U-V-Z-and-P-series/y580-Black-Screen-of-Death/td-p/798003

All my projects are on hold and it sucks! I'm in the process of switching back to a temporary computer.

@cosh Thank you for this link to the fallen8 intro. I'm going to benchmark it with one of my production datasets. I'm under the impression that, in its current form, fallen-8 would be a better fit than SQL Server for my business cases.