switching to tabula - GitHub issues

evdokim commented 10 years ago

We had a discussion with @eparejatobes about his new library tabula and its application in dynamograph.

parsing

The parser should read xml/csv files and produce scarph vertices and edges of the given model.

modeling tables

For the given scarph model we can define abstractions that encapsulate the tables (from tabula) that will be used for storing edges and nodes, something like:

trait VertexTable[V <: Vertex] extends HashKeyTableType(name, region, hash) 

trait EdgeTables[E <: Edge] {
  trait InTable extends AnyCompositeKeyTable {...}
  trait OutTable extends AnyCompositeKeyTable {...}
}

So here we will have an explicit binding between our model and the tables used.
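A minimal sketch of such a binding in plain Scala, without tabula. All types here (`HashKeyTable`, `CompositeKeyTable`, the `Protein` vertex, the table and key names) are hypothetical stand-ins, and reading "in"/"out" as "indexed by target"/"indexed by source" is an assumption:

```scala
// Stand-in model types; in the real project these would come from scarph.
trait Vertex { def label: String }
trait Edge { def label: String }

// Stand-in table descriptions; the real ones would be tabula table types.
sealed trait Table { def name: String }
final case class HashKeyTable(name: String, hashKey: String) extends Table
final case class CompositeKeyTable(name: String, hashKey: String, rangeKey: String) extends Table

// One hash-key table per vertex type.
final case class VertexTable[V <: Vertex](vertex: V, table: HashKeyTable)

// Two composite-key tables per edge type (assumed: one keyed by target, one by source).
final case class EdgeTables[E <: Edge](edge: E, inTable: CompositeKeyTable, outTable: CompositeKeyTable)

object TableModel {
  case object Protein extends Vertex { val label = "protein" }
  case object PartOf extends Edge { val label = "partOf" }

  // Derives the table for a vertex type from its label (hypothetical naming scheme).
  def vertexTable[V <: Vertex](v: V): VertexTable[V] =
    VertexTable(v, HashKeyTable(s"vertex_${v.label}", "id"))

  // Derives both edge tables for an edge type from its label.
  def edgeTables[E <: Edge](e: E): EdgeTables[E] =
    EdgeTables(
      e,
      inTable = CompositeKeyTable(s"edge_${e.label}_in", "targetId", "relationId"),
      outTable = CompositeKeyTable(s"edge_${e.label}_out", "sourceId", "relationId")
    )
}
```

With this, `TableModel.vertexTable(TableModel.Protein)` has type `VertexTable[Protein.type]`, so the table carries the model type it belongs to.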

writing to dynamodb

Then, for writing nodes and edges, we will need other traits:

trait VertexWriter[V <: Vertex] {
  val vertexTable: VertexTable[V]
  def write(v: V): Item[VertexWriter#Tpe] // or List[WriteOperations]
}

Choosing the right output types for the writing methods is not a trivial question. The good thing is that you don't need to wait until tabula is finished, because the actual writing can be completely separated. In the end, the output should be transformed anyway into something like List[WriteOperations], the input for the actual DynamoDB writer.
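To illustrate the separation, here is a sketch where the writer only produces a description of what to write and a separate component consumes it. `WriteOperation`, `DynamoDBWriter`, `Protein` and the table name are hypothetical stand-ins, not tabula or AWS SDK types:

```scala
// Stand-in type, not tabula's: a table name plus a flat attribute map.
final case class WriteOperation(tableName: String, item: Map[String, String])

trait Vertex { def id: String; def attributes: Map[String, String] }

// A writer is bound to one table and turns vertices into write operations.
trait VertexWriter[V <: Vertex] {
  def tableName: String
  def write(v: V): List[WriteOperation] =
    List(WriteOperation(tableName, v.attributes + ("id" -> v.id)))
}

// Stand-in for the component that would actually call DynamoDB.
trait DynamoDBWriter { def execute(ops: List[WriteOperation]): Int }

object WriterExample {
  final case class Protein(id: String, name: String) extends Vertex {
    def attributes: Map[String, String] = Map("name" -> name)
  }

  object ProteinWriter extends VertexWriter[Protein] {
    val tableName = "vertex_protein"
  }

  // Only counts the operations; a real implementation would send them.
  object CountingWriter extends DynamoDBWriter {
    def execute(ops: List[WriteOperation]): Int = ops.size
  }

  val ops: List[WriteOperation] = ProteinWriter.write(Protein("P12345", "example"))
}
```

Because `write` returns plain data, the writers can be developed and tested before the component that talks to DynamoDB exists.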

reading from DynamoDB

For that we need something like DynamoDBVertex and DynamoDBEdge that will use VertexTable and EdgeTables.
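A rough sketch of what the reading side could look like, assuming the vertex resolves its properties through its table on demand. The in-memory `rows` map stands in for DynamoDB, and all names and signatures here are hypothetical:

```scala
// Stand-in for a table: the real one would be a tabula table backed by DynamoDB.
final case class VertexTable(name: String, rows: Map[String, Map[String, String]]) {
  def get(id: String): Option[Map[String, String]] = rows.get(id)
}

// A vertex that looks up its properties in its table when asked.
final case class DynamoDBVertex(id: String, table: VertexTable) {
  def property(name: String): Option[String] =
    table.get(id).flatMap(_.get(name))
}

object ReadingExample {
  val proteins = VertexTable("vertex_protein", Map("P12345" -> Map("name" -> "example")))
  val vertex = DynamoDBVertex("P12345", proteins)
}
```

Here `ReadingExample.vertex.property("name")` goes through the table each time, which is the part a real implementation would replace with a DynamoDB read.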

tabula state

It is released as ohnosequences %% tabula % 0.1.0-SNAPSHOT

alberskib commented 10 years ago

So for each vertex type we will need a separate instance of VertexWriter?

evdokim commented 10 years ago

separated object of what?

alberskib commented 10 years ago

I modified the comment as it was not precise. Because VertexWriter contains val vertexTable, each VertexTable will require a separate instance with the proper value of vertexTable. VertexWriter will produce input for the real writer that hits the database, correct?

alberskib commented 10 years ago

It looks like the current classes DynamoRawVertex and DynamoRawEdge are no longer needed. Will they be replaced by DynamoDBVertex and DynamoDBEdge?

evdokim commented 10 years ago

For different vertex types you will need different instances of VertexTable, and hence the writers will write to different tables.

"It looks like that current classes: DynamoRawVertex and DynamoRawEdge are no longer needed. They will be replaced by DynamoDBVertex and DynamoDBEdge?"

Yes, I think that we no longer need them; it will be clear once reading is implemented.

eparejatobes commented 10 years ago

@bio4j/dynamograph I think the API for defining tables is stable enough now, see ohnosequences/tabula#3, that branch is what is published now in 0.1.0-SNAPSHOT. As things are still not merged, the meeting will be next Monday. I will take a look at the code and everything during the weekend.

alberskib commented 10 years ago

Great. I am also going to work on the code during the weekend, so I will introduce changes ad hoc.

laughedelic commented 10 years ago

can we set the meeting on Tuesday instead, so that I could participate in it too?

alberskib commented 10 years ago

Is there any special reason why tabula does not offer CompositeKeyTable? For AnyHashKeyTable there is HashKeyTable, but for AnyCompositeKeyTable there is no CompositeKeyTable. Will it be added in the future, or should I implement it on my own?

eparejatobes commented 10 years ago

@alberskib just added it, thanks for noticing

alberskib commented 10 years ago

@bio4j/dynamograph I am a little confused about the architecture of reading values from files and saving them to the DB. We need to talk about it (actually I spent some time thinking about it).

eparejatobes commented 10 years ago

@alberskib yes, agreed. We could look at this tomorrow.

alberskib commented 10 years ago

@bio4j/dynamograph What time? I am available until 11:30 and from 16:00 to 22:00.

eparejatobes commented 10 years ago

19:00?

alberskib commented 10 years ago

Ok

evdokim commented 10 years ago

I saw the code of the writers and it's nice. I would only suggest removing DynamoDBVertex from it and generating the DynamoDB items in the writers. Writers should deal with table names as well.

evdokim commented 10 years ago

@alberskib I will write down the things I was talking about today.

parsing

Now that it's incremental, it's fine. In general it's hard to choose good types for its results, but the main point of all these Reps is that it should be easy to shift between representations. The important thing is to get precise types like PartOf in the end. Although all these edges are now subtypes of DynamoDBEdge, they are not really related to DynamoDB; we can map them to some hierarchy of case classes, for example.
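As a sketch of what such a hierarchy of case classes could look like: the `label,source,target` row format, the `IsA` edge and the `ParsedEdge` name are assumptions made up for illustration; only PartOf comes from the model discussed here.

```scala
// Precise, storage-independent edge types.
sealed trait ParsedEdge { def source: String; def target: String }
final case class PartOf(source: String, target: String) extends ParsedEdge
final case class IsA(source: String, target: String) extends ParsedEdge

object EdgeParser {
  // Expects "label,source,target"; returns None for malformed or unknown rows.
  def parseLine(line: String): Option[ParsedEdge] =
    line.split(",").map(_.trim).toList match {
      case "partOf" :: s :: t :: Nil => Some(PartOf(s, t))
      case "isA" :: s :: t :: Nil    => Some(IsA(s, t))
      case _                         => None
    }

  // Incremental: rows are parsed lazily, one at a time, as the iterator is consumed.
  def parseAll(lines: Iterator[String]): Iterator[ParsedEdge] =
    lines.flatMap(l => parseLine(l))
}
```

Since nothing in these types mentions DynamoDB, the same parsed values could be handed to any writer or mapped to another representation later.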

writers

Once you have the Rep value of an edge or vertex, you can send it directly to the corresponding writer. Doing it this way will increase performance and avoid the complexity of manipulating structures like HList. When we finish tabula you can use it. The idea is to transform vertices/edges into items that you can write using tabula. It gives you some additional type checks; for example, you won't be able to write an item to a table if the item doesn't have the hashKey attribute of that table:

case class PutItemHashKey[T <: AnyHashKeyTable with Singleton, I <: AnyItem with Singleton](
  table: T,
  inputState: AnyTableState.For[T] with ReadyTable,
  item: I,
  itemRep: I#Rep
)(implicit val hasHashKey: HasProperty[I#Tpe, T#HashKey])

But the current implementation is also OK.
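To show the idea behind that check in isolation, here is a much simplified sketch in plain Scala. It is not tabula's encoding: `Attribute`, `Table`, `Item` and this two-parameter `HasProperty` are stand-ins written for this example, and only the trick of requiring implicit evidence is the same.

```scala
import scala.annotation.implicitNotFound

// Stand-in attributes; tabula's own encoding (AnyItem, HasProperty, ...) is richer.
trait Attribute
case object Id extends Attribute
case object Name extends Attribute

// A table is typed by its hash key attribute, an item by a type-level tag.
final case class Table[K <: Attribute](name: String, hashKey: K)
final case class Item[Tag](values: Map[String, String])

// Evidence that items tagged `Tag` carry the attribute `K`.
@implicitNotFound("Item type ${Tag} has no attribute ${K}")
trait HasProperty[Tag, K <: Attribute]

// Constructing a PutItem requires the evidence, so a missing hash key is a compile error.
final case class PutItem[Tag, K <: Attribute](table: Table[K], item: Item[Tag])(
  implicit val hasHashKey: HasProperty[Tag, K]
)

object TypedPutExample {
  sealed trait ProteinItem
  implicit val proteinHasId: HasProperty[ProteinItem, Id.type] =
    new HasProperty[ProteinItem, Id.type] {}

  val byId: Table[Id.type] = Table("vertex_protein", Id)
  val byName: Table[Name.type] = Table("protein_names", Name)
  val item: Item[ProteinItem] = Item[ProteinItem](Map("id" -> "P12345"))

  val ok = PutItem(byId, item)
  // PutItem(byName, item) is rejected by the compiler:
  // no HasProperty[ProteinItem, Name.type] is in scope.
}
```

The check costs nothing at runtime; the evidence value exists only so the compiler can refuse the wrong table.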

actual writing to DynamoDB

A writer can generate some representation of the write operations; for example, you can use something similar to the input for a DynamoDB batch write: Map<String,List<WriteRequest>> requestItems. It's not so important now, but it will be useful when we start doing distributed writing to DynamoDB.
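A sketch of such a representation, assuming the writers emit a flat list of operations that is grouped by table name afterwards. `WriteRequest` and `WriteOperation` here are stand-in case classes, not the AWS SDK classes; the batch size reflects the BatchWriteItem limit of 25 put or delete requests per call.

```scala
// Stand-in types; the AWS SDK has its own WriteRequest class.
final case class WriteRequest(item: Map[String, String])
final case class WriteOperation(tableName: String, request: WriteRequest)

object BatchPlanner {
  // DynamoDB's BatchWriteItem accepts at most 25 put or delete requests per call.
  val MaxBatchSize = 25

  // Splits operations into batches and groups each batch by table name,
  // mirroring the Map<String, List<WriteRequest>> shape of requestItems.
  def plan(ops: List[WriteOperation]): List[Map[String, List[WriteRequest]]] =
    ops.grouped(MaxBatchSize).toList.map { batch =>
      batch.groupBy(_.tableName).map { case (table, tableOps) =>
        table -> tableOps.map(_.request)
      }
    }
}
```

Each element of the result is independent of the others, so the batches could later be handed to different workers for distributed writing.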

creating, deleting tables

This can be done with tabula.

bio4j / dynamograph

switching to tabula #27
