fkarg opened 9 years ago
- [ ] MongoDB
- [ ] PostgreSQL
- [ ] Something completely different
If we plan to change the underlying database, I think it would be important to first define a fixed (PEP8-conformant) interface for LocationHandler.py, as that would be used for all interfacing with the data.
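To make the idea concrete, here is a minimal sketch of what such a fixed interface could look like; all class and method names are made-up assumptions, not the actual contents of LocationHandler.py:

```python
# Hypothetical sketch of a fixed interface for LocationHandler.py;
# every name here is an assumption for illustration only.
from abc import ABC, abstractmethod


class LocationHandlerInterface(ABC):
    """Fixed interface that every storage backend would implement."""

    @abstractmethod
    def get_probabilities(self, from_node, to_node):
        """Return the 96 per-tick probabilities for an edge."""

    @abstractmethod
    def set_probability(self, from_node, to_node, tick, value):
        """Update a single tick's probability on an edge."""


class InMemoryLocationHandler(LocationHandlerInterface):
    """Trivial in-memory backend, just to show the interface is usable;
    a database-backed class would implement the same two methods."""

    def __init__(self):
        self.edges = {}

    def get_probabilities(self, from_node, to_node):
        return self.edges.get((from_node, to_node), [0.0] * 96)

    def set_probability(self, from_node, to_node, tick, value):
        probs = self.edges.setdefault((from_node, to_node), [0.0] * 96)
        probs[tick] = value
```

With an interface like this, swapping the backend (JSON files, Neo4j, whatever) wouldn't touch the calling code.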
I also think that the processes should keep communicating in JSON, as that is, in my opinion, the cleanest way. Haskell using the database as well doesn't really seem necessary to me.
Well, I'd have to rewrite the Python code from scratch anyway, and now I might even implement it to the standard xD
But anyway, if you're going to have several dozen, or even several hundred, nodes and with them the probabilities, I don't think communicating all of that in JSON is that good an idea, since it could get quite large; though it might be faster, I don't know.
But in general I think it's a bad idea to put the whole tree into JSON anew every time, so I'd choose either pre-A* or a database for this ^^
There's a command line tool called pep8 which could help you.
@blueburningcoder: We could also invent our own simple-to-parse format for serializing the graph. Something like:
"<from>" "<to>" <prob,prob2,...>
With data in it:
"Königsplatz" "Dom" 0.8,0.2,0.7,0.5
Easy parsing, no server, no performance overhead.
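The proposed format really would be trivial to parse. A minimal sketch in Python (function names are made up for illustration):

```python
# Sketch of a parser for the proposed '"<from>" "<to>" p1,p2,...' format.
import shlex


def parse_edge_line(line):
    """Parse one line into (from_node, to_node, [probabilities])."""
    # shlex handles the quoted node names, which may contain spaces
    src, dst, probs = shlex.split(line)
    return src, dst, [float(p) for p in probs.split(",")]


def parse_graph(text):
    """Parse a whole serialized graph, one edge per line."""
    return [parse_edge_line(l) for l in text.splitlines() if l.strip()]
```

The quoting matters because stop names like "Königsplatz Nord" could contain spaces.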
?
Anyway, reparsing the whole thing anew every time, with only the probabilities changing? Crazy idea. I like it. It might get really huge at some point, but by then we should have implemented something acceptable, or have enough resources to do things like this, so it might not really matter now anyway. And yes, I'd say we need that new kind of format if we don't implement something else on the Haskell side at all... well then, it's settled: we'll make our own kind of format and reparse an ever-growing database anew every time. Of the probabilities we will need proportionally less and less, but they are (ironically) growing exponentially. Well, let's do it! xDD
Let's write everything in Haskell :p
About the database: MongoDB and PostgreSQL are bad ideas actually, since they are relational databases and we have a graph structure. So currently I'm looking into Neo4j, which is, for one, a graph database, and for another, accessible not only from Python but from Haskell as well. I don't know yet if it's possible to have several layers for the ticks and stuff, but I'm looking into it.
Using a graph-database is a great idea! +1
Hi! Just wanted to say that MongoDB is not a relational database, and PostgreSQL doesn't have to be one. A graph database is useful for highly interconnected information. For loosely bundled information, MongoDB or CouchDB, for example, are the better choice.
I don't know the structure you guys need, I just wanted to make a qualified comment. If you guys need some examples for where to put what data structures, I can sketch something.
@nkoehring well, it's basically about storing and modifying this graph as performantly and efficiently as possible.
THIS graph, graphs like this but bigger, or a lot of different graphs like this one but not heavily interconnected with each other?
Actually, graphs like this, with up to 96 connections in one direction, maybe some coming back; and if there's a second node, or a third (I think up to 8 might be realistic), it's going to be quite connected. I don't really know if this is what you'd call heavily interconnected, but what else would it be?
Or rather, that's just how I thought about it; there might well be more than one better way to do this. I thought about making one connection per tick if the probability is, let's say, >= 10 or something, meaning we'd have up to 96 connections from one node to another in each direction, which isn't that good. But any other idea how to store the tick info?
Would you explain the meaning of this? Then I may help with suggestions.
Well, as you might know, we use ticks as time; a tick is about a quarter of an hour, so for all the probabilities throughout the day we need 96 ticks (24 * 4 = 96). Oh, and by the way, when looking around Neo4j I found that you can give connections not only one attribute but several. I'd rather do it this way than create a new connection every time. Sure, just checking whether the connection for that time exists is possible too, but that way a visualisation wouldn't really be possible at all. So, for heaven's sake: a connection has 96 values for a day, an ID (I guess), and a direction. Any ideas?
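For what it's worth, the data model described above can be sketched independently of Neo4j: one directed connection carrying an ID and a single 96-value probability attribute, with the tick index derived from the time of day. All names below are illustrative assumptions:

```python
# Sketch of the proposed edge model: one directed connection with an ID
# and 96 per-tick probabilities, rather than 96 separate connections.
TICKS_PER_DAY = 24 * 4  # a tick is a quarter of an hour -> 96 per day


def tick_for(hour, minute):
    """Map a time of day to its tick index (0..95)."""
    return hour * 4 + minute // 15


def make_connection(conn_id, src, dst, probabilities):
    """One directed edge with a single 96-value probability attribute."""
    assert len(probabilities) == TICKS_PER_DAY
    return {"id": conn_id, "from": src, "to": dst, "probs": probabilities}


def probability_at(conn, hour, minute):
    """Look up the edge's probability for a given time of day."""
    return conn["probs"][tick_for(hour, minute)]
```

In Neo4j, the `probs` list would simply become an array property on the relationship, so the graph stays visualisable with one edge per direction.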
First and foremost, only using a graph database won't be enough. We still have a ton of other data to manage that fits way better into a relational database (user data, data about the exchange points, statistics for calculating the probabilities, etc.).
For the graph database, some sort of array structure for the data of the edges would be nice, but I believe @blueburningcoder knows best how to store the data.
@nkoehring: We basically have a routing algorithm over a graph whose edge weights depend on the time of day, and therefore change while the routing is taking place. Meaning, we have a graph whose edges store quite an amount of data.
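The routing idea above can be sketched as a Dijkstra variant where each edge's cost is a function of the departure time; the graph layout and weight functions below are assumptions for illustration, not the project's actual algorithm:

```python
# Rough sketch: Dijkstra over a graph whose edge weights depend on the
# time of day at which the edge is traversed.
import heapq


def time_dependent_dijkstra(graph, start, goal, start_time):
    """graph[u] -> list of (v, weight_fn); weight_fn(t) -> travel time.

    Returns the earliest arrival time at goal, or None if unreachable.
    """
    best = {start: start_time}
    queue = [(start_time, start)]
    while queue:
        t, u = heapq.heappop(queue)
        if u == goal:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, weight_fn in graph.get(u, []):
            arrival = t + weight_fn(t)  # cost depends on departure time
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
                heapq.heappush(queue, (arrival, v))
    return None
```

This stays correct as long as waiting never helps (the "FIFO" property); with the 96-tick tables, `weight_fn` would just look up the tick for `t`.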
Well, to some extent, but not really. Anyway, I think there's something else: we need at least some of the data in a graph structure, and therefore at least for that part a somewhat graph-supporting database. My current choice is Neo4j, and right now I'd say we store the other data in the same database. Sure, with two databases running it would certainly be (a bit) faster and all, but inconsistencies would appear very fast, and that's something nobody wants. Just because a graph database can do things relational ones can't really do that well, that doesn't stop it from also doing the stuff relational ones do. So I'm going to set up Neo4j over the next few days if nobody is against it.
I'd rather use a nice "usual" database than having to handle special cases…
If the graphing part is not too much, you could still use MongoDB. But it looks like a graph DB makes more sense. Another possibility is storing and manipulating information in a quick binary field inside Redis and using MongoDB for the actual data.
@lukasepple Is this a pro or a con from your side? What do you see as special cases? And what do you see as a "nice 'usual' database"?
Someone who might need our algorithm: http://www.n-tv.de/wirtschaft/Amazon-will-Privatpersonen-ausliefern-lassen-article15314811.html
I've included a 'how to get the DB running' guide in the python folder, and I'll close the issue as soon as I've set a standard for the other things we need to save in it. Currently saved in the DB: the exchangePoints and 96 random probabilities between two nodes. Missing: Packages, Paths, Routes, and Users.
Very nice. I hope I get done with schoolwork soon to have time for some work on the algorithm.
some kind of real database (not that JSON-blend), if possible accessible from both python and haskell