Closed p013570 closed 6 years ago
Update Properties Guide:
Copyright 2016-2017 Crown Copyright
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
This page has been generated from code. To make any changes please update the walkthrough docs in the doc module, run it and replace the content of this page with the output.
This properties documentation discusses some advanced property types that work nicely in Gaffer.
The example can be run in a similar way to the user and developer examples.
You can download the doc-jar-with-dependencies.jar from Maven Central. Select the latest version and download the jar-with-dependencies.jar file. Alternatively, you can compile the code yourself by running "mvn clean install -Pquick". The doc-jar-with-dependencies.jar file will then be located at doc/target/doc-jar-with-dependencies.jar.
# Replace <DoublesUnion> with your example name.
java -cp doc-jar-with-dependencies.jar uk.gov.gchq.gaffer.doc.properties.dev.walkthrough.DoublesUnion
Properties class: java.lang.String
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: java.lang.Long
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: java.lang.Integer
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: java.lang.Double
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: java.lang.Float
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: [Ljava.lang.Byte;
Predicates:
Aggregators:
To Bytes Serialisers:
Properties class: java.lang.Boolean
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: java.util.Date
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: uk.gov.gchq.gaffer.types.TypeValue
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: uk.gov.gchq.gaffer.types.TypeSubTypeValue
Predicates:
Aggregators:
To Bytes Serialisers:
Properties class: uk.gov.gchq.gaffer.types.FreqMap
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: java.util.HashMap
Predicates:
Aggregators:
To Bytes Serialisers:
Properties class: java.util.TreeSet
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: com.clearspring.analytics.stream.cardinality.HyperLogLogPlus
Predicates:
Aggregators:
To Bytes Serialisers:
Other Serialisers:
Properties class: org.roaringbitmap.RoaringBitmap
Predicates:
Aggregators:
To Bytes Serialisers:
The code for this example is DoublesUnion.
This example demonstrates how the DoublesUnion sketch from the Data Sketches library can be used to maintain estimates of the quantiles of a distribution of doubles. Suppose that every time an edge is observed, there is a double value associated with it, for example a value between 0 and 1 giving the score of the edge. Instead of storing a property that contains all the doubles observed, we can store a DoublesUnion which will allow us to estimate the median double, the 99th percentile, etc.
Properties class: com.yahoo.sketches.quantiles.DoublesUnion
Predicates:
Aggregators:
To Bytes Serialisers:
This is our new elements schema. The edge has a property called 'doublesUnion'. This will store the DoublesUnion object.
{
"edges": {
"red": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false",
"properties": {
"doublesUnion": "doubles.union"
}
}
}
}
We have added a new type - 'doubles.union'. This is a com.yahoo.sketches.quantiles.DoublesUnion object. We also added in the serialiser and aggregator for the DoublesUnion object. Gaffer will automatically aggregate these sketches, using the provided aggregator, so they will keep up to date as new edges are added to the graph.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"doubles.union": {
"class": "com.yahoo.sketches.quantiles.DoublesUnion",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.quantiles.binaryoperator.DoublesUnionAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.quantiles.serialisation.DoublesUnionSerialiser"
}
},
"false": {
"class": "java.lang.Boolean",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
}
]
}
}
}
Edge[source=A,destination=B,directed=false,group=red,properties=Properties[doublesUnion=<com.yahoo.sketches.quantiles.DoublesUnionImpl>
### Quantiles DoublesUnionImpl
maxK : 128
### Quantiles HeapUpdateDoublesSketch SUMMARY:
Empty : false
Direct, Capacity bytes : false,
Estimation Mode : true
K : 128
N : 1,000
Levels (Needed, Total, Valid): 2, 2, 2
Level Bit Pattern : 11
BaseBufferCount : 232
Combined Buffer Capacity : 512
Retained Items : 488
Compact Storage Bytes : 3,936
Updatable Storage Bytes : 4,128
Normalized Rank Error : 1.725%
Min Value : -3.148
Max Value : 3.112
### END SKETCH SUMMARY
]]
This is not very illuminating, as it just shows the default toString() method on the sketch. To get value from it we need to call methods on the DoublesUnion object. We can get an estimate for the 25th, 50th and 75th percentiles on edge A-B using the following code:
final GetElements query = new GetElements.Builder()
        .input(new EdgeSeed("A", "B", DirectedType.UNDIRECTED))
        .build();
final CloseableIterable<? extends Element> edges = graph.execute(query, user);
final Element edge = edges.iterator().next();
final com.yahoo.sketches.quantiles.DoublesUnion doublesUnion = (com.yahoo.sketches.quantiles.DoublesUnion) edge.getProperty("doublesUnion");
final double[] quantiles = doublesUnion.getResult().getQuantiles(new double[]{0.25D, 0.5D, 0.75D});
final String quantilesEstimate = "Edge A-B with percentiles of double property - 25th percentile: " + quantiles[0]
        + ", 50th percentile: " + quantiles[1]
        + ", 75th percentile: " + quantiles[2];
The results are as follows. This means that 25% of all the doubles on edge A-B had a value less than -0.66, 50% had a value less than -0.01 and 75% had a value less than 0.64 (the estimation is not deterministic, so there may be small differences between the values below and those just quoted).
Edge A-B with percentiles of double property - 25th percentile: -0.6630847714290219, 50th percentile: -0.0071624422787210824, 75th percentile: 0.6341803995604817
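To make the meaning of these percentiles concrete, here is a minimal sketch that computes exact quantiles from a small in-memory array using only the JDK. It does not use Gaffer or the Data Sketches library; the class ExactQuantiles, its method and the nearest-rank rule it applies are assumptions chosen for illustration. The sketch stored on the edge approximates this kind of answer without retaining every value.

```java
import java.util.Arrays;

/** Hypothetical helper: exact quantiles of a small sample, for comparison with sketch estimates. */
final class ExactQuantiles {

    /**
     * Returns the value found at each requested fraction (0.0 to 1.0) of the sorted data.
     * Uses a simple nearest-rank rule: index = floor(fraction * n), capped at n - 1.
     */
    static double[] quantiles(final double[] data, final double[] fractions) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        final double[] sorted = data.clone();
        Arrays.sort(sorted);
        final double[] result = new double[fractions.length];
        for (int i = 0; i < fractions.length; i++) {
            final int index = Math.min(sorted.length - 1, (int) Math.floor(fractions[i] * sorted.length));
            result[i] = sorted[index];
        }
        return result;
    }

    public static void main(final String[] args) {
        final double[] values = {0.9, 0.1, 0.5, 0.3, 0.7};
        final double[] q = quantiles(values, new double[]{0.25, 0.5, 0.75});
        System.out.println("25th, 50th, 75th percentiles: " + Arrays.toString(q));
    }
}
```

The exact approach needs every value to be kept and sorted, which is the cost that the quantiles sketch avoids.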
We can also get the cumulative distribution function (CDF) of the doubles:
final GetElements query2 = new GetElements.Builder()
        .input(new EdgeSeed("A", "B", DirectedType.UNDIRECTED))
        .build();
final CloseableIterable<? extends Element> edges2 = graph.execute(query2, user);
final Element edge2 = edges2.iterator().next();
final DoublesSketch doublesSketch2 = ((com.yahoo.sketches.quantiles.DoublesUnion) edge2.getProperty("doublesUnion")).getResult();
final double[] cdf = doublesSketch2.getCDF(new double[]{0.0D, 1.0D, 2.0D});
final String cdfEstimate = "Edge A-B with CDF values at 0: " + cdf[0]
        + ", at 1: " + cdf[1]
        + ", at 2: " + cdf[2];
The results are:
Edge A-B with CDF values at 0: 0.506, at 1: 0.839, at 2: 0.983
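As a point of comparison for the CDF values above, the following JDK-only sketch computes an exact empirical CDF: for each split point, the fraction of values strictly below it. The class EmpiricalCdf is hypothetical and is not part of Gaffer or the Data Sketches library; the sketch on the edge gives an approximation of the same quantity.

```java
/** Hypothetical helper: exact empirical CDF of a small sample. */
final class EmpiricalCdf {

    /** Returns, for each split point, the fraction of values in data that are strictly less than it. */
    static double[] cdf(final double[] data, final double[] splitPoints) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        final double[] result = new double[splitPoints.length];
        for (int i = 0; i < splitPoints.length; i++) {
            int below = 0;
            for (final double value : data) {
                if (value < splitPoints[i]) {
                    below++;
                }
            }
            result[i] = (double) below / data.length;
        }
        return result;
    }

    public static void main(final String[] args) {
        final double[] values = {-1.5, -0.5, 0.5, 1.5};
        final double[] cdf = cdf(values, new double[]{0.0, 1.0, 2.0});
        System.out.println("CDF at 0: " + cdf[0] + ", at 1: " + cdf[1] + ", at 2: " + cdf[2]);
    }
}
```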
The code for this example is LongsSketch.
This example demonstrates how the LongsSketch sketch from the Data Sketches library can be used to maintain estimates of the frequencies of longs stored on vertices and edges. For example, suppose that every time an edge is observed there is a long value associated with it which specifies the size of the interaction. Storing all the different longs on the edge could be expensive in storage. Instead we can use a LongsSketch, which will give us approximate counts of the number of times a particular long was observed.
Properties class: com.yahoo.sketches.frequencies.LongsSketch
Predicates:
Aggregators:
To Bytes Serialisers:
This is our new elements schema. The edge has a property called 'longsSketch'. This will store the LongsSketch object.
{
"edges": {
"red": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false",
"properties": {
"longsSketch": "longs.sketch"
}
}
}
}
We have added a new type - 'longs.sketch'. This is a com.yahoo.sketches.frequencies.LongsSketch object. We also added in the serialiser and aggregator for the LongsSketch object. Gaffer will automatically aggregate these sketches, using the provided aggregator, so they will keep up to date as new edges are added to the graph.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"longs.sketch": {
"class": "com.yahoo.sketches.frequencies.LongsSketch",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.frequencies.binaryoperator.LongsSketchAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.frequencies.serialisation.LongsSketchSerialiser"
}
},
"false": {
"class": "java.lang.Boolean",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
}
]
}
}
}
Only one edge is in the graph. This was added 1000 times, and each time it had the 'longsSketch' property containing a randomly generated long between 0 and 9 (inclusive). The sketch does not retain all the distinct occurrences of these long values, but allows one to estimate the number of occurrences of the different values. Here is the Edge:
Edge[source=A,destination=B,directed=false,group=red,properties=Properties[longsSketch=<com.yahoo.sketches.frequencies.LongsSketch>FrequentLongsSketch:
Stream Length : 1000
Max Error Offset : 0
ReversePurgeLongHashMap:
Index: States Values Keys
0: 1 112 0
3: 1 96 6
5: 1 98 9
6: 2 92 4
7: 3 103 5
8: 2 91 2
9: 3 98 8
12: 1 106 1
13: 1 99 7
14: 1 105 3
]]
This is not very illuminating, as it just shows the default toString() method on the sketch. To get value from it we need to call methods on the LongsSketch object. Let's get estimates of the frequencies of the values 1 and 9 using the following code:
final GetElements query = new GetElements.Builder()
        .input(new EdgeSeed("A", "B", DirectedType.UNDIRECTED))
        .build();
final CloseableIterable<? extends Element> edges = graph.execute(query, user);
final Element edge = edges.iterator().next();
final com.yahoo.sketches.frequencies.LongsSketch longsSketch = (com.yahoo.sketches.frequencies.LongsSketch) edge.getProperty("longsSketch");
final String estimates = "Edge A-B: 1L seen approximately " + longsSketch.getEstimate(1L)
        + " times, 9L seen approximately " + longsSketch.getEstimate(9L) + " times.";
The results are as follows. As the edge was added 1000 times, each time with a long randomly sampled from 0 to 9, each value occurs approximately 100 times.
Edge A-B: 1L seen approximately 106 times, 9L seen approximately 98 times.
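For comparison, exact frequency counting can be sketched with a plain map, using only the JDK. The class ExactLongCounts below is hypothetical and is not how LongsSketch works internally; it shows the update, estimate and merge operations in their exact form, which is what the sketch approximates in bounded space. The merge method plays the role that the aggregator plays for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: exact counts of long values, kept in a map. */
final class ExactLongCounts {
    private final Map<Long, Long> counts = new HashMap<>();

    /** Records one more occurrence of the given value. */
    void update(final long value) {
        counts.merge(value, 1L, Long::sum);
    }

    /** Returns how many times the value has been recorded (0 if never seen). */
    long getCount(final long value) {
        return counts.getOrDefault(value, 0L);
    }

    /** Adds the counts from another counter into this one, analogous to aggregating two sketches. */
    void merge(final ExactLongCounts other) {
        other.counts.forEach((key, count) -> counts.merge(key, count, Long::sum));
    }

    public static void main(final String[] args) {
        final ExactLongCounts first = new ExactLongCounts();
        first.update(1L);
        first.update(1L);
        first.update(9L);
        final ExactLongCounts second = new ExactLongCounts();
        second.update(9L);
        first.merge(second);
        System.out.println("1L seen " + first.getCount(1L) + " times, 9L seen " + first.getCount(9L) + " times.");
    }
}
```

With only ten distinct values, as in this example, the exact map is small; the storage concern arises when the number of distinct longs is large.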
The code for this example is UnionSketch.
This example demonstrates how the Union sketch from the Data Sketches library can be used to maintain estimates of the cardinalities of sets. The Union sketch is similar to a HyperLogLogPlusPlus, but it can also be used to create the intersections of sets. We give an example of how this can be used to monitor the changes to the number of edges in the graph over time.
Properties class: com.yahoo.sketches.theta.Union
Predicates:
Aggregators:
To Bytes Serialisers:
This is our new elements schema. The edge has properties called 'startDate' and 'endDate'. These will be set to the midnight before the time of the occurrence of the edge and to the midnight after it. The entity group 'size' has the same two properties, together with a 'size' property which will be a Union. This property will be aggregated over the 'groupBy' properties of startDate and endDate.
{
"entities": {
"size": {
"vertex": "vertex.string",
"properties": {
"startDate": "date.earliest",
"endDate": "date.latest",
"size": "union"
},
"groupBy": [
"startDate",
"endDate"
]
}
},
"edges": {
"red": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false",
"properties": {
"startDate": "date.earliest",
"endDate": "date.latest",
"count": "long.count"
},
"groupBy": [
"startDate",
"endDate"
]
}
}
}
We have added a new type - 'union'. This is a com.yahoo.sketches.theta.Union object. We also added in the serialiser and aggregator for the Union object. Gaffer will automatically aggregate these sketches, using the provided aggregator, so they will keep up to date as new edges are added to the graph.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"date.earliest": {
"class": "java.util.Date",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
],
"aggregateFunction": {
"class": "uk.gov.gchq.koryphe.impl.binaryoperator.Min"
}
},
"date.latest": {
"class": "java.util.Date",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
],
"aggregateFunction": {
"class": "uk.gov.gchq.koryphe.impl.binaryoperator.Max"
}
},
"long.count": {
"class": "java.lang.Long",
"aggregateFunction": {
"class": "uk.gov.gchq.koryphe.impl.binaryoperator.Sum"
}
},
"union": {
"class": "com.yahoo.sketches.theta.Union",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.theta.binaryoperator.UnionAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.theta.serialisation.UnionSerialiser"
}
},
"false": {
"class": "java.lang.Boolean",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
}
]
}
}
}
1000 different edges were added to the graph for the day 09/01/2017 (i.e. the startDate was the midnight at the start of the 9th, and the endDate was the midnight at the end of the 9th). For each edge, an Entity was created, with a vertex called "graph". This contained a Union object to which a string consisting of the source and destination was added. 500 edges were added to the graph for the day 10/01/2017. Of these, 250 were the same as edges that had been added in the previous day, but 250 were new. Again, for each edge, an Entity was created for the vertex called "graph".
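The startDate and endDate values described above can be derived from an event time by truncating to the day. The following is a minimal JDK-only sketch; the class DayBounds is hypothetical, and it assumes midnight is taken in UTC, which the guide does not state.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

/** Hypothetical helper: the midnights (UTC, by assumption) either side of an event time. */
final class DayBounds {

    /** Midnight at the start of the day containing the given time. */
    static Date startOfDay(final Instant time) {
        return Date.from(time.truncatedTo(ChronoUnit.DAYS));
    }

    /** Midnight at the end of the day containing the given time. */
    static Date endOfDay(final Instant time) {
        return Date.from(time.truncatedTo(ChronoUnit.DAYS).plus(1, ChronoUnit.DAYS));
    }

    public static void main(final String[] args) {
        final Instant seen = Instant.parse("2017-01-09T13:45:00Z");
        System.out.println("startDate: " + startOfDay(seen).toInstant());
        System.out.println("endDate:   " + endOfDay(seen).toInstant());
    }
}
```

Because every edge seen on the same day gets identical startDate and endDate values, all of them fall into the same group and are aggregated together.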
Here is the Entity for the different days:
Entity[vertex=graph,group=size,properties=Properties[size=<com.yahoo.sketches.theta.UnionImpl>com.yahoo.sketches.theta.UnionImpl@1d75e7af,endDate=<java.util.Date>Tue Jan 10 00:00:00 GMT 2017,startDate=<java.util.Date>Mon Jan 09 00:00:00 GMT 2017]]
Entity[vertex=graph,group=size,properties=Properties[size=<com.yahoo.sketches.theta.UnionImpl>com.yahoo.sketches.theta.UnionImpl@34b27915,endDate=<java.util.Date>Wed Jan 11 00:00:00 GMT 2017,startDate=<java.util.Date>Tue Jan 10 00:00:00 GMT 2017]]
This is not very illuminating, as it just shows the default toString() method on the sketch. To get value from it we need to call a method on the Union object:
final GetAllElements getAllEntities2 = new GetAllElements.Builder()
        .view(new View.Builder()
                .entity("size")
                .build())
        .build();
final CloseableIterable<? extends Element> allEntities2 = graph.execute(getAllEntities2, user);
final CloseableIterator<? extends Element> it = allEntities2.iterator();
final Element entityDay1 = it.next();
final CompactSketch sketchDay1 = ((Union) entityDay1.getProperty("size")).getResult();
final Element entityDay2 = it.next();
final CompactSketch sketchDay2 = ((Union) entityDay2.getProperty("size")).getResult();
final double estimateDay1 = sketchDay1.getEstimate();
final double estimateDay2 = sketchDay2.getEstimate();
The result is:
1000.0
500.0
Now we can get an estimate for the number of edges in common across the two days:
final Intersection intersection = Sketches.setOperationBuilder().buildIntersection();
intersection.update(sketchDay1);
intersection.update(sketchDay2);
final double intersectionSizeEstimate = intersection.getResult().getEstimate();
The result is:
250.0
We now get an estimate for the total number of edges across the two days, by simply aggregating over all the properties:
final GetAllElements getAllEntities = new GetAllElements.Builder()
        .view(new View.Builder()
                .entity("size", new ViewElementDefinition.Builder()
                        .groupBy() // set the group by properties to 'none'
                        .build())
                .build())
        .build();
final CloseableIterable<? extends Element> allEntities = graph.execute(getAllEntities, user);
final Element entity = allEntities.iterator().next();
final double unionSizeEstimate = ((Union) entity.getProperty("size")).getResult().getEstimate();
The result is:
1250.0
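The four figures above (1000, 500, 250 and 1250) are consistent with each other: 1000 + 500 - 250 = 1250. The following JDK-only sketch reproduces them with exact sets. The class ExactEdgeSets and the edge naming scheme are hypothetical; exact sets stand in for the Union and Intersection sketches, which give estimates of the same sizes without storing every edge.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: exact set sizes mirroring the two-day example. */
final class ExactEdgeSets {

    /** Builds a set of made-up edge identifiers numbered fromInclusive to toExclusive - 1. */
    static Set<String> edges(final int fromInclusive, final int toExclusive) {
        final Set<String> set = new HashSet<>();
        for (int i = fromInclusive; i < toExclusive; i++) {
            set.add("S" + i + "-D" + i);
        }
        return set;
    }

    /** Number of elements present in both sets. */
    static int intersectionSize(final Set<String> a, final Set<String> b) {
        final Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    /** Number of elements present in at least one of the sets. */
    static int unionSize(final Set<String> a, final Set<String> b) {
        final Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return all.size();
    }

    public static void main(final String[] args) {
        final Set<String> day1 = edges(0, 1000);    // 1000 edges on the first day
        final Set<String> day2 = edges(750, 1250);  // 500 edges, 250 of them repeated from day 1
        System.out.println("Day 1: " + day1.size() + ", day 2: " + day2.size());
        System.out.println("In common: " + intersectionSize(day1, day2));
        System.out.println("In total: " + unionSize(day1, day2));
    }
}
```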
The code for this example is ReservoirItemsUnion.
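This example demonstrates how the ReservoirItemsUnion sketch from the Data Sketches library can be used to maintain a random sample of the strings seen on an element.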
Properties class: com.yahoo.sketches.sampling.ReservoirItemsUnion
Predicates:
Aggregators:
To Bytes Serialisers:
This is our new elements schema. The edge has a property called 'stringsSample'. This will store the ReservoirItemsUnion object.
{
"entities": {
"blueEntity": {
"vertex": "vertex.string",
"properties": {
"neighboursSample": "reservoir.strings.union"
}
}
},
"edges": {
"red": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false",
"properties": {
"stringsSample": "reservoir.strings.union"
}
},
"blue": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false"
}
}
}
We have added a new type - 'reservoir.strings.union'. This is a com.yahoo.sketches.sampling.ReservoirItemsUnion object. We also added in the serialiser and aggregator for the ReservoirItemsUnion object. Gaffer will automatically aggregate these sketches, using the provided aggregator, so they will keep up to date as new edges are added to the graph.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"reservoir.strings.union": {
"class": "com.yahoo.sketches.sampling.ReservoirItemsUnion",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.sampling.binaryoperator.ReservoirItemsUnionAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.sketches.datasketches.sampling.serialisation.ReservoirStringsUnionSerialiser"
}
},
"false": {
"class": "java.lang.Boolean",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
}
]
}
}
}
An edge A-B of group "red" was added to the graph 1000 times. Each time it had the stringsSample property containing a randomly generated string. Here is the edge:
Edge[source=A,destination=B,directed=false,group=red,properties=Properties[stringsSample=<com.yahoo.sketches.sampling.ReservoirItemsUnion>
### ReservoirItemsUnion SUMMARY:
Max k: 20
Gadget summary:
### ReservoirItemsSketch SUMMARY:
k : 20
n : 1000
Current size : 20
Resize factor: X8
### END SKETCH SUMMARY
### END UNION SUMMARY
]]
This is not very illuminating, as it just shows the default toString() method on the sketch. To get value from it we need to call a method on the ReservoirItemsUnion object:
final GetElements query = new GetElements.Builder()
        .input(new EdgeSeed("A", "B", DirectedType.UNDIRECTED))
        .build();
final CloseableIterable<? extends Element> edges = graph.execute(query, user);
final Element edge = edges.iterator().next();
final ReservoirItemsSketch<String> stringsSketch = ((com.yahoo.sketches.sampling.ReservoirItemsUnion) edge.getProperty("stringsSample"))
        .getResult();
final String[] samples = stringsSketch.getSamples();
final StringBuilder sb = new StringBuilder("10 samples: ");
for (int i = 0; i < 10 && i < samples.length; i++) {
    if (i > 0) {
        sb.append(", ");
    }
    sb.append(samples[i]);
}
The results contain a random sample of the strings added to the edge:
10 samples: BIBFBBIDCJ, JIACFBDHAH, DJJDEDAFDH, HEGGBJDBHG, FGJJDFEBAG, IHFIGAJHJI, BJICHHAFFE, JAIJDCFDHD, BJHBGHBGHH, ACHCDCJFGE
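The idea of keeping a fixed-size random sample of a stream can be illustrated with classic reservoir sampling (often called Algorithm R). The sketch below uses only the JDK; the class SimpleReservoir is hypothetical and is not the implementation used by the Data Sketches library. It keeps at most k items, and each item seen so far has the same chance of being in the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: reservoir sampling (Algorithm R) over a stream of items. */
final class SimpleReservoir<T> {
    private final int k;
    private final Random random;
    private final List<T> samples = new ArrayList<>();
    private int seen; // an int, so this illustration handles fewer than 2^31 items

    SimpleReservoir(final int k, final long seed) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.random = new Random(seed);
    }

    /** Offers one item to the reservoir. */
    void update(final T item) {
        seen++;
        if (samples.size() < k) {
            // The first k items are always kept.
            samples.add(item);
        } else {
            // Pick a slot in [0, seen); the new item is kept only if the slot is inside the reservoir.
            final int slot = random.nextInt(seen);
            if (slot < k) {
                samples.set(slot, item);
            }
        }
    }

    /** Returns a copy of the current sample. */
    List<T> getSamples() {
        return new ArrayList<>(samples);
    }

    /** Returns the number of items offered so far. */
    int getSeen() {
        return seen;
    }

    public static void main(final String[] args) {
        final SimpleReservoir<String> reservoir = new SimpleReservoir<>(20, 42L);
        for (int i = 0; i < 1000; i++) {
            reservoir.update("item" + i);
        }
        System.out.println("Kept " + reservoir.getSamples().size() + " of " + reservoir.getSeen() + " items");
    }
}
```

This mirrors the summary shown earlier, where k is 20, n is 1000 and the current size is 20.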
500 edges of group "blue" were also added to the graph (edges X-Y0, X-Y1, ..., X-Y499). For each of these edges, an Entity was created for both the source and destination. Each Entity contained a 'neighboursSample' property that contains the vertex at the other end of the edge. We now get the Entity for the vertex X and display the sample of its neighbours:
final GetElements query2 = new GetElements.Builder()
        .input(new EntitySeed("X"))
        .build();
final CloseableIterable<? extends Element> entities = graph.execute(query2, user);
final Element entity = entities.iterator().next();
final ReservoirItemsSketch<String> neighboursSketch = ((com.yahoo.sketches.sampling.ReservoirItemsUnion) entity.getProperty("neighboursSample"))
        .getResult();
final String[] neighboursSample = neighboursSketch.getSamples();
sb.setLength(0);
sb.append("10 samples: ");
for (int i = 0; i < 10 && i < neighboursSample.length; i++) {
    if (i > 0) {
        sb.append(", ");
    }
    sb.append(neighboursSample[i]);
}
The results are:
10 samples: Y462, Y2, Y319, Y194, Y142, Y457, Y449, Y470, Y467, Y444
The code for this example is TimestampSet.
This example demonstrates how the TimestampSet property can be used to maintain a set of the timestamps at which an element was seen active. In this example we record the timestamps to minute level accuracy, i.e. the seconds are ignored.
Properties class: uk.gov.gchq.gaffer.time.RBMBackedTimestampSet
Predicates:
Aggregators:
To Bytes Serialisers:
This is our new elements schema. The edge has a property called 'timestampSet'. This will store the TimestampSet object, which is actually a 'RBMBackedTimestampSet'.
{
"edges": {
"red": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false",
"properties": {
"timestampSet": "timestamp.set"
}
}
}
}
We have added a new type - 'timestamp.set'. This is a uk.gov.gchq.gaffer.time.RBMBackedTimestampSet object. We also added in the serialiser and aggregator for the RBMBackedTimestampSet object. Gaffer will automatically aggregate these sets together to maintain a set of all the times the element was active.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"timestamp.set": {
"class": "uk.gov.gchq.gaffer.time.RBMBackedTimestampSet",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.time.binaryoperator.RBMBackedTimestampSetAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.time.serialisation.RBMBackedTimestampSetSerialiser"
}
},
"false": {
"class": "java.lang.Boolean",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
}
]
}
}
}
Only one edge is in the graph. This was added 25 times, and each time it had the 'timestampSet' property containing a randomly generated timestamp from 2017. Here is the Edge:
Edge[source=A,destination=B,directed=false,group=red,properties=Properties[timestampSet=<uk.gov.gchq.gaffer.time.RBMBackedTimestampSet>RBMBackedTimestampSet[timeBucket=MINUTE,timestamps=2017-01-08T07:29:00Z,2017-01-18T10:41:00Z,2017-01-19T01:36:00Z,2017-01-31T16:16:00Z,2017-02-02T08:06:00Z,2017-02-12T14:21:00Z,2017-02-15T22:01:00Z,2017-03-06T09:03:00Z,2017-03-21T18:09:00Z,2017-05-08T15:34:00Z,2017-05-10T19:39:00Z,2017-05-16T10:44:00Z,2017-05-23T10:02:00Z,2017-05-28T01:52:00Z,2017-06-24T23:50:00Z,2017-07-27T09:34:00Z,2017-08-05T02:11:00Z,2017-09-07T07:35:00Z,2017-10-01T12:52:00Z,2017-10-23T22:02:00Z,2017-10-27T04:12:00Z,2017-11-01T02:45:00Z,2017-12-11T16:38:00Z,2017-12-22T14:40:00Z,2017-12-24T08:00:00Z]]]
You can see the list of timestamps on the edge. We can also get just the earliest, latest and total number of timestamps using methods on the TimestampSet object to get the following results:
Edge A-B was first seen at 2017-01-08T07:29:00Z, last seen at 2017-12-24T08:00:00Z, and there were 25 timestamps it was active.
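The behaviour described in this example (minute-level buckets, with earliest, latest and count available) can be imitated with a sorted set of truncated instants. The sketch below uses only the JDK; the class MinuteTimestampSet and its method names are hypothetical and are not Gaffer's RBMBackedTimestampSet, which is backed by a RoaringBitmap rather than a TreeSet.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.TreeSet;

/** Hypothetical helper: a set of timestamps stored at minute accuracy. */
final class MinuteTimestampSet {
    private final TreeSet<Instant> timestamps = new TreeSet<>();

    /** Adds a timestamp, dropping the seconds and any smaller units. */
    void add(final Instant timestamp) {
        timestamps.add(timestamp.truncatedTo(ChronoUnit.MINUTES));
    }

    /** Adds every timestamp from another set, analogous to aggregating two properties. */
    void addAll(final MinuteTimestampSet other) {
        timestamps.addAll(other.timestamps);
    }

    /** Earliest stored timestamp; throws NoSuchElementException if the set is empty. */
    Instant getEarliest() {
        return timestamps.first();
    }

    /** Latest stored timestamp; throws NoSuchElementException if the set is empty. */
    Instant getLatest() {
        return timestamps.last();
    }

    int size() {
        return timestamps.size();
    }

    public static void main(final String[] args) {
        final MinuteTimestampSet set = new MinuteTimestampSet();
        set.add(Instant.parse("2017-01-08T07:29:45Z"));
        set.add(Instant.parse("2017-01-08T07:29:10Z")); // same minute as the previous one
        set.add(Instant.parse("2017-12-24T08:00:30Z"));
        System.out.println("First seen at " + set.getEarliest() + ", last seen at " + set.getLatest()
                + ", " + set.size() + " distinct minutes");
    }
}
```

Two timestamps that fall within the same minute collapse into a single entry, which is what ignoring the seconds means in practice.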
The code for this example is BoundedTimestampSet.
This example demonstrates how the BoundedTimestampSet property can be used to maintain a set of the timestamps at which an element was seen active. If this set becomes larger than a size specified by the user then a uniform random sample of the timestamps is maintained. In this example we record the timestamps to minute level accuracy, i.e. the seconds are ignored, and specify that at most 25 timestamps should be retained.
Properties class: uk.gov.gchq.gaffer.time.BoundedTimestampSet
Predicates:
Aggregators:
To Bytes Serialisers:
This is our new schema. The edge has a property called 'boundedTimestampSet'. This will store the BoundedTimestampSet object.
{
"edges": {
"red": {
"source": "vertex.string",
"destination": "vertex.string",
"directed": "false",
"properties": {
"boundedTimestampSet": "bounded.timestamp.set"
}
}
}
}
We have added a new type - 'bounded.timestamp.set'. This is a uk.gov.gchq.gaffer.time.BoundedTimestampSet object. We have added in the serialiser and aggregator for the BoundedTimestampSet object. Gaffer will automatically aggregate these sets together to maintain a set of all the times the element was active. Once the size of the set becomes larger than 25 then a uniform random sample of size at most 25 of the timestamps is maintained.
{
"types": {
"vertex.string": {
"class": "java.lang.String",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.Exists"
}
]
},
"bounded.timestamp.set": {
"class": "uk.gov.gchq.gaffer.time.BoundedTimestampSet",
"aggregateFunction": {
"class": "uk.gov.gchq.gaffer.time.binaryoperator.BoundedTimestampSetAggregator"
},
"serialiser": {
"class": "uk.gov.gchq.gaffer.time.serialisation.BoundedTimestampSetSerialiser"
}
},
"false": {
"class": "java.lang.Boolean",
"validateFunctions": [
{
"class": "uk.gov.gchq.koryphe.impl.predicate.IsFalse"
}
]
}
}
}
There are two edges in the graph. Edge A-B was added 3 times, and each time it had the 'boundedTimestampSet' property containing a randomly generated timestamp from 2017. Edge A-C was added 1000 times, and each time it also had the 'boundedTimestampSet' property containing a randomly generated timestamp from 2017. Here are the edges:
Edge[source=A,destination=B,directed=false,group=red,properties=Properties[boundedTimestampSet=<uk.gov.gchq.gaffer.time.BoundedTimestampSet>BoundedTimestampSet[timeBucket=MINUTE,state=NOT_FULL,maxSize=25,timestamps=2017-02-12T14:21:00Z,2017-03-21T18:09:00Z,2017-12-24T08:00:00Z]]]
Edge[source=A,destination=C,directed=false,group=red,properties=Properties[boundedTimestampSet=<uk.gov.gchq.gaffer.time.BoundedTimestampSet>BoundedTimestampSet[timeBucket=MINUTE,state=SAMPLE,maxSize=25,timestamps=2017-03-12T05:27:00Z,2017-03-12T19:14:00Z,2017-03-20T06:52:00Z,2017-04-06T13:29:00Z,2017-04-20T15:20:00Z,2017-04-22T18:37:00Z,2017-04-28T23:45:00Z,2017-05-02T03:42:00Z,2017-05-25T04:20:00Z,2017-05-25T19:45:00Z,2017-06-22T17:04:00Z,2017-06-27T06:10:00Z,2017-07-20T20:25:00Z,2017-07-26T10:39:00Z,2017-08-01T18:58:00Z,2017-08-28T08:08:00Z,2017-09-01T01:42:00Z,2017-10-14T12:54:00Z,2017-11-13T03:42:00Z,2017-11-30T23:18:00Z,2017-12-01T08:23:00Z,2017-12-09T17:50:00Z,2017-12-10T17:37:00Z,2017-12-13T12:03:00Z,2017-12-26T12:14:00Z]]]
You can see that edge A-B has the full list of timestamps on the edge, but edge A-C has a sample of the timestamps.
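The switch from a full set to a sample can be sketched as follows, using only the JDK. The class BoundedMinuteSet is hypothetical and is not Gaffer's BoundedTimestampSet; it assumes an exact sorted set is kept until the maximum size is exceeded, after which the contents are moved into a reservoir sample. The state names NOT_FULL and SAMPLE are taken from the output above.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

/** Hypothetical, JDK-only illustration of a bounded timestamp set. */
final class BoundedMinuteSet {
    enum State { NOT_FULL, SAMPLE }

    private final int maxSize;
    private final Random random;
    private final TreeSet<Instant> exact = new TreeSet<>();
    private final List<Instant> sample = new ArrayList<>();
    private State state = State.NOT_FULL;
    private int offered; // items offered to the reservoir (an int, so fewer than 2^31 items)

    BoundedMinuteSet(final int maxSize, final long seed) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.random = new Random(seed);
    }

    void add(final Instant timestamp) {
        final Instant bucket = timestamp.truncatedTo(ChronoUnit.MINUTES);
        if (state == State.NOT_FULL) {
            exact.add(bucket);
            if (exact.size() > maxSize) {
                // Too many distinct timestamps: move everything into a reservoir sample.
                state = State.SAMPLE;
                for (final Instant stored : exact) {
                    offer(stored);
                }
                exact.clear();
            }
        } else {
            // Simplification: duplicates are no longer detected once sampling has started.
            offer(bucket);
        }
    }

    /** Reservoir sampling step: keep the first maxSize items, then replace a random slot. */
    private void offer(final Instant bucket) {
        offered++;
        if (sample.size() < maxSize) {
            sample.add(bucket);
        } else {
            final int slot = random.nextInt(offered);
            if (slot < maxSize) {
                sample.set(slot, bucket);
            }
        }
    }

    State getState() {
        return state;
    }

    /** Returns the stored timestamps in ascending order. */
    List<Instant> getTimestamps() {
        final List<Instant> result = new ArrayList<>();
        if (state == State.NOT_FULL) {
            result.addAll(exact);
        } else {
            result.addAll(sample);
        }
        Collections.sort(result);
        return result;
    }
}
```

Adding 3 timestamps to an instance with a maximum size of 25 leaves it in the NOT_FULL state with all 3 retained, while adding 1000 distinct timestamps leaves it in the SAMPLE state with 25 retained, matching the two edges shown above.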
Merged into develop.
We have serialisers and aggregators for several different java classes. These should be listed in the Properties Guide alongside some simple examples of how to use them.