Jugendhackt / paketmagie

Magical packages, routed 100% organically and GMO-free
http://hackdash.org/projects/557bf10f3f8689f158e0f371
GNU General Public License v3.0

A real Database #15

Open fkarg opened 9 years ago

fkarg commented 9 years ago

some kind of real database (not that JSON-blend), if possible accessible from both python and haskell

sternenseemann commented 9 years ago

- [ ] MongoDB
- [ ] PostgreSQL
- [ ] Something completely different

froozen commented 9 years ago

If we plan to change the underlying database, I think it would be important to first create a fixed (PEP8-conformant) interface for LocationHandler.py (a rough sketch follows after this comment), as that would be used for all the interfacing with the data.

I also think that the processes should keep communicating in JSON, as that is, in my opinion, the cleanest way. Haskell using the database as well doesn't really seem necessary to me.
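
A minimal sketch of what such a fixed interface for LocationHandler.py could look like. All method names below are hypothetical illustrations, not taken from the existing code; the point is only that every caller would go through this one class, so the backend could be swapped later:

```python
# Hypothetical interface sketch for LocationHandler.py. Method names and
# signatures are assumptions, not the current code; all database access
# would go through this class so the backend can be replaced later.


class LocationHandler:
    """Single point of access to the stored location graph."""

    def get_node(self, name):
        """Return the stored data for one exchange point."""
        raise NotImplementedError

    def get_neighbours(self, name):
        """Return the names of all nodes reachable from `name`."""
        raise NotImplementedError

    def get_probabilities(self, source, target):
        """Return the 96 per-tick probabilities for the edge source -> target."""
        raise NotImplementedError

    def set_probabilities(self, source, target, probabilities):
        """Store a new list of 96 per-tick probabilities for an edge."""
        raise NotImplementedError
```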

fkarg commented 9 years ago

Well, I'd have to rewrite the python code from scratch anyways, and now I might even implement it according to the standard xD

but anyways, if you're going to have several dozen, or even several hundred, nodes and with them the probabilities, I don't think it'd be that good an idea to communicate all of that in JSON, since it might get quite large; though it might be faster, I don't know.

But in general I think it's a bad idea to put the whole tree into JSON anew every time, so I'd choose either pre-A* or a database for this ^^

sternenseemann commented 9 years ago

There's a command line tool called pep8 which could help you.

froozen commented 9 years ago

@blueburningcoder: We could also invent our own, simple-to-parse format for serializing the graph. Something like:

"<from>" "<to>" <prob,prob2,...>

With data in it:

"Königsplatz" "Dom" 0.8,0.2,0.7,0.5

sternenseemann commented 9 years ago

It's parsing, no server, no performance.

fkarg commented 9 years ago

?

anyways, reparsing the whole thing anew every time, with only the probabilities changing? Crazy idea. I like it. It might get really huge at some point, but by then we should have implemented something acceptable or have enough resources to do things like this, so it might not really matter now anyways. And yes, I'd say we need that new kind of format if we don't implement something else on the Haskell side at all ... well then, it's settled: we'll make our own kind of format and reparse an ever-growing database anew every time, holding probabilities of which we will need proportionally fewer and fewer but which are (ironically) growing exponentially. Well, let's do it! xDD
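
A small sketch of how the line format proposed above ("<from>" "<to>" followed by comma-separated probabilities) could be parsed in Python; purely illustrative, since the format itself is still just a proposal:

```python
# Sketch of a parser for the proposed '"<from>" "<to>" <prob,prob2,...>' lines.
# Purely illustrative; the format is only a suggestion at this point.
import shlex


def parse_edge(line):
    """Parse one line into (source, target, [probabilities])."""
    source, target, probs = shlex.split(line)  # shlex handles the quoted names
    return source, target, [float(p) for p in probs.split(",")]


def parse_graph(text):
    """Parse a whole serialized graph, one edge per non-empty line."""
    return [parse_edge(line) for line in text.splitlines() if line.strip()]


print(parse_edge('"Königsplatz" "Dom" 0.8,0.2,0.7,0.5'))
# -> ('Königsplatz', 'Dom', [0.8, 0.2, 0.7, 0.5])
```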

sternenseemann commented 9 years ago

Let's write everything in Haskell :p

fkarg commented 9 years ago

About the database: MongoDB and PostgreSQL are actually bad ideas, since they are relational databases and we have a graph structure, so currently I'm looking into Neo4j, which is, for one, a graph database, and for another, accessible not only from Python but from Haskell as well. I don't know yet whether it's possible to have several layers for the ticks and stuff, but I'm looking into it.
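
As a rough illustration of the Python side, talking to Neo4j could look roughly like this (shown with the official `neo4j` driver and a Cypher MERGE statement; the connection details, labels and property names are invented placeholders, and the project might well end up using a different client library):

```python
# Sketch only: stores two exchange points and a connection between them in
# Neo4j. Labels, property names and credentials are invented placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run(
        "MERGE (a:ExchangePoint {name: $from_name}) "
        "MERGE (b:ExchangePoint {name: $to_name}) "
        "MERGE (a)-[c:CONNECTS_TO]->(b) "
        "SET c.probabilities = $probs",
        from_name="Königsplatz", to_name="Dom", probs=[0.8, 0.2, 0.7, 0.5],
    )

driver.close()
```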

froozen commented 9 years ago

Using a graph-database is a great idea! +1

nkoehring commented 9 years ago

Hi! Just wanted to say that MongoDB is not a relational database, and PostgreSQL doesn't have to be one. A graph database is useful for highly interconnected information. For loosely bundled information, MongoDB or CouchDB, for example, are the better choice.

I don't know the structure you guys need, I just wanted to make a qualified comment. If you guys need some examples for where to put what data structures, I can sketch something.

sternenseemann commented 9 years ago

@nkoehring well, it's basically about storing and modifying this graph as performantly and efficiently as possible.

nkoehring commented 9 years ago

THIS graph, or graphs like this but bigger, or a lot of different graphs like this one but not heavily interconnected with each other?

fkarg commented 9 years ago

Actually, graphs like this, with up to 96 connections in one direction, maybe some coming back; and if there's a second or a third node (I think up to 8 might be realistic), it's going to be quite connected. I don't really know if that's what you'd call heavily interconnected, but what else would it be?

fkarg commented 9 years ago

Or rather, that's just how I thought of it; there might well be more than one better way to do this. I thought about creating one connection per tick whenever the probability is, let's say, >= 10 or something, meaning we'd have up to 96 connections from one node to another in each direction, which isn't that good. Does anyone have another idea for how to store the tick info?

nkoehring commented 9 years ago

Would you explain what this means? Then I might be able to help with suggestions.

fkarg commented 9 years ago

Well, as you might know, we use ticks as our unit of time; a tick is about a quarter of an hour, so for all the probabilities throughout the day we need 96 ticks (24 * 4 = 96). Oh, and by the way, while looking around Neo4j I found that you can give connections not just one attribute but several. I'd rather do it this way than create a new connection every time. Sure, just checking whether the connection for that time exists is possible too, but that way a visualisation won't really be possible at all. So, for heaven's sake: a connection has 96 values for a day, an ID (I guess), and a direction. Any ideas?
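
To illustrate the "several attributes per connection" idea: the 96 per-tick values could live in one list on a single connection per direction, and a lookup then only needs the tick index for the current time of day. A minimal sketch (nothing here is a decided schema; the helper names are made up):

```python
# Sketch: one connection per direction carrying all 96 tick probabilities as
# a single list, instead of 96 separate connections. Helper names are made up.
from datetime import datetime

TICKS_PER_DAY = 24 * 4  # a tick is a quarter of an hour


def current_tick(now=None):
    """Return the tick index (0..95) for a given time of day."""
    now = now or datetime.now()
    return (now.hour * 60 + now.minute) // 15


def probability_at(probabilities, now=None):
    """Pick the probability for the current tick from the 96-element list."""
    assert len(probabilities) == TICKS_PER_DAY
    return probabilities[current_tick(now)]


# Example: the edge Königsplatz -> Dom, 0.5 all day except around 08:00.
probs = [0.5] * TICKS_PER_DAY
probs[32] = 0.9  # tick 32 corresponds to 08:00-08:15
print(probability_at(probs, datetime(2015, 6, 17, 8, 5)))  # -> 0.9
```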

froozen commented 9 years ago

First and foremost, only using a graph database won't be enough. We still have a ton of other data to manage that fits way better into a relational database (user data, data about the exchange points, statistics for calculating the probabilities, etc.).

For the graph database, some sort of array structure for the data in the edges would be nice, but I believe @blueburningcoder knows best how to store the data.

@nkoehring: We basically have a routing algorithm over a graph whose edge weights depend on the time of day and therefore change while the routing is taking place. Meaning, we have a graph whose edges store quite an amount of data.
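
To show what "edge weights that depend on the time of day" means in practice, here is a generic sketch of a time-dependent Dijkstra-style search. This is not the project's actual algorithm; it simply treats the 96 per-tick values as traversal costs to illustrate why every edge has to carry per-tick data (the graph layout and the "one tick per hop" assumption are illustrative only):

```python
# Generic sketch of time-dependent routing, not the project's algorithm.
# graph[u][v] is a list of 96 per-tick costs for the edge u -> v; the cost of
# an edge is looked up for the tick at which it would actually be traversed.
import heapq


def route(graph, start, goal, start_tick, ticks_per_edge=1):
    """Return (total_cost, path) from start to goal, or None if unreachable."""
    queue = [(0.0, start_tick, start, [start])]
    best = {}
    while queue:
        cost, tick, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        state = (node, tick % 96)
        if best.get(state, float("inf")) <= cost:
            continue
        best[state] = cost
        for neighbour, tick_costs in graph.get(node, {}).items():
            step = tick_costs[tick % 96]  # the weight depends on the time of day
            heapq.heappush(queue, (cost + step, tick + ticks_per_edge,
                                   neighbour, path + [neighbour]))
    return None
```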

fkarg commented 9 years ago

Well, to some extent, but not really. Anyways, I think there's something else: we need at least some of the data in a graph structure and therefore, at least for this part, a database with some graph support. My current choice is Neo4j, and right now I'd say we store the other data in the same database. Sure, with two databases running it would certainly be (a bit) faster and all, but there would be inconsistencies very quickly, and that's something nobody wants. Just because a graph database can do things relational ones can't really do that well doesn't mean it can't also do the stuff relational ones do. So I'm going to implement Neo4j in the next few days if nobody is against it.

sternenseemann commented 9 years ago

I'd rather use a nice "usual" database than have to handle special cases…

nkoehring commented 9 years ago

If the graphing part is not too much, you could still use MongoDB. But it looks like a graph DB makes more sense. Another possibility is storing and manipulating the information in a quick binary field inside Redis and using MongoDB for the actual data.
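
For comparison, the Redis-plus-MongoDB split could look roughly like this in Python (using redis-py and pymongo; the key names and document layout are pure assumptions, and the per-tick list is stored as a JSON string here rather than a binary field, just to keep the sketch short):

```python
# Sketch of the suggested split: frequently changing probabilities in Redis,
# the remaining node data in MongoDB. Keys and collection names are made up.
import json

import redis
from pymongo import MongoClient

r = redis.Redis(host="localhost", port=6379)
db = MongoClient("mongodb://localhost:27017")["paketmagie"]

# Static-ish data about an exchange point goes into MongoDB ...
db.exchange_points.insert_one({"name": "Dom"})

# ... while the 96 per-tick probabilities of an edge live in Redis.
r.set("edge:Königsplatz:Dom", json.dumps([0.5] * 96))
probs = json.loads(r.get("edge:Königsplatz:Dom"))
```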

fkarg commented 9 years ago

@lukasepple is this a pro or a con from your side? What do you see as special cases, and what would you consider a "nice 'usual' database"?

fkarg commented 9 years ago

Someone who might need our algorithm: http://news.google.com/news/url?sa=t&fd=R&ct2=de&usg=AFQjCNEVvMWqZao9U7FYDsH5gJbMu-oSSQ&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52779444021738&ei=CPuAVYCdIM_TaramguAL&url=http://www.n-tv.de/wirtschaft/Amazon-will-Privatpersonen-ausliefern-lassen-article15314811.html

fkarg commented 9 years ago

I've included a 'how to get the DB running' guide in the python-folder, and I'll close the issue as soon as I've set a standard for the other things we need to save in it. Currently saved in the DB: the exchangePoints and 96 random probabilities between two nodes. Missing: Packages, Paths, Routes, and Users.

froozen commented 9 years ago

Very nice. I hope I get done with schoolwork soon so I have time to do some work on the algorithm.