I have a RepyV1 program running in a Seattle vessel that monitors which nodes announce keys in the advertise service and contacts these nodes to query how many vessels they have available. I have a few questions about the dataset that can be answered with a bit of data mining:
How many different IP addresses did we see overall?
How many of them were private ones?
How often did we see which node?
Which nodes disappeared over time?
What names are associated with these nodes? (Hint: use "dig -x" on the list of IP addresses)
What categories do these names fall into? (Hint: compare with known PlanetLab nodes, extract top-level domains, look for well-known ISPs/mobile operators, etc.)
Be sure to add questions you come up with yourself, too!
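As a starting point for the first few questions, here is a minimal Python sketch. It assumes you have already extracted the IP addresses from the logfile into a plain list with one entry per sighting; the function names `summarize_ips` and `top_level_domain` are made up for this illustration. Note that Python's `is_private` is broader than just the RFC 1918 ranges (it also flags loopback and link-local addresses, for instance), so check whether that matches what you mean by "private".

```python
import ipaddress
from collections import Counter


def summarize_ips(observations):
    """Summarize an iterable of IP address strings, one entry per sighting."""
    # How often did we see which node?
    sightings = Counter(observations)
    # Which of the distinct addresses are private (per Python's definition)?
    private = sorted(
        ip for ip in sightings if ipaddress.ip_address(ip).is_private
    )
    return {
        "unique": len(sightings),  # number of distinct addresses
        "private": private,
        "sightings": sightings,
    }


def top_level_domain(hostname):
    """Return the last label of a hostname, e.g. as printed by "dig -x"."""
    # Reverse lookups usually end in a trailing dot; drop it first.
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()
```

Feeding `top_level_domain` the names returned by "dig -x" gives a first rough categorization; matching against a list of known PlanetLab hosts would be a separate step.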
You might want to create graphical representations of your results. I suggest using something scriptable such as gnuplot or GNU R rather than Excel.
The lines of the logfile are formatted as follows. (For performance reasons, I did not check the "PlanetLab status".)
. Node statistics for two different advertised keys:
``nodestats'', advertise key name, timestamp, advertise type-count pairs,
``overall'' count of unique nodes across all advertise types
. Overall statistics on advertised keys, including vessel counts:
``SUMMARY'', advertise key name, timestamp, advertise type-count pairs,
``overall'' unique nodes count, ``contacted'' number of nodes contacted
for the purpose of counting vessels (including nodes that didn't currently
advertise), ``vessels'' total number of vessels
. Detailed vessel availability data:
``vessels'', advertise key name, timestamp, and then tuples of
(IP, nodeman port, round-trip latency, vessel count, PlanetLab status)
for each node ever found advertising. Within tuples, fields are colon
separated, tuples themselves are separated by commas.
(Some nodes might no longer advertise but still be contactable,
or the other way around. Let's see.)
. Details on the nodes that advertise:
``nodedetails'', advertise key name, timestamp, and then pairs of
IP address ``:'' nodemanager port, successive pairs being comma-separated.
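The tuple formats described above can be split with a few lines of Python. This is only a sketch under some assumptions: it takes the tuple portion of a line as input (how the leading line type, key name, and timestamp are delimited is not covered here), it assumes IPv4 addresses (an IPv6 address would clash with the colon separator), and the function names and dictionary keys are invented for this example. Latency and vessel count are kept as raw strings, since how the log encodes unreachable nodes is not specified.

```python
def parse_nodedetails(pairs_text):
    """Parse comma-separated "IP:port" pairs into (ip, port) tuples."""
    nodes = []
    for pair in pairs_text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma or empty input
        ip, port = pair.rsplit(":", 1)
        nodes.append((ip, int(port)))
    return nodes


def parse_vessel_tuples(tuples_text):
    """Parse comma-separated tuples with colon-separated fields:
    IP, nodeman port, round-trip latency, vessel count, PlanetLab status."""
    records = []
    for item in tuples_text.split(","):
        fields = item.strip().split(":")
        if len(fields) < 4:
            continue  # skip empty or incomplete tuples
        ip, port, latency, vessel_count = fields[:4]
        records.append({
            "ip": ip,
            "port": int(port),
            "latency": latency,            # raw string, encoding not assumed
            "vessel_count": vessel_count,  # raw string, encoding not assumed
            # The status field may be missing, so treat it as optional.
            "status": fields[4] if len(fields) > 4 else None,
        })
    return records
```

Comparing the IPs from successive ``nodedetails'' lines (as sets) is one way to see which nodes disappeared over time, and comparing them with the IPs in the ``vessels'' lines shows which nodes are contactable without advertising.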