Closed yakra closed 5 years ago
On BiggaTomato, this takes only 4.7s (3%) longer than the code in #201.
MEX-QROO-region-traveled.tmg has 5 travelers:
TMG 2.0 traveled
3 2
MEX180D@MEX307 21.017955 -86.854477
MEX180D@YUC/QROO 20.878541 -87.608242
MEX180D@MEX180 21.097814 -86.975434
2 0 MEX180D 30
2 1 MEX180D F1 20.97242 -87.208314 20.950859 -87.293158 20.897907 -87.441688
epzik8 griffith ngrier sneezy ua747sp
MEX-CAM-region-traveled.tmg has 1 traveler:
TMG 2.0 traveled
3 2
MEX180D@MEX180_W 19.509962 -90.701966
MEX180D@MEX180_E 19.797536 -90.527494
MEX180D@CAM60 19.636525 -90.671883
2 1 MEX180D 1 19.705627 -90.627036
0 2 MEX180D 1
ua747sp
GNQ-region-traveled.tmg has no travelers:
TMG 2.0 traveled
6 4
N5Med/N5@GAB/GNQ 1.002709 10.779648
N5Med/N5Kom@GNQ/GAB 1.002709 10.762997
N5@Efo 1.009875 10.902643
N5/N5Kom@GAB/GNQ 1.002709 10.725231
N5@Med 1.017255 10.797157
N5Sam/N5@GNQ/GAB 1.00288 10.907192
2 5 N5
3 1 N5Kom
4 2 N5 1.025665 10.859213
0 4 N5
In this last example, note the absence of travelers listed at the end, and the extra space between the label and the shaping points, where the clinchedby_code would otherwise go. Something to watch out for when writing a routine that reads these files. @jteresco, thoughts?
Does it make sense to record a 0 there when there are no travelers, just to simplify reading?
Bear with me; thinking this thru out loud...
4 2 N5 1.025665 10.859213
.split()
ing the line yields 6 fields, the 4th of which has zero length. SFSG...strtok
, as that will result in 5 fields. We'll have to find a different solution.0
at first made me bristle a bit, as that to me suggests 1-4 travelers.
But is it really so bad? We're already dealing with cases where there are fewer travelers than the array can store (1/4 in MEX-CAM; 5/8 in MEX-QROO).TMG 2.0 traveled
6 4 0
To just give a straight answer to your question:
Does it make sense to record a 0 there when there are no travelers, just to simplify reading?
Yes.
This would simplify reading, and as such, makes sense. It's effectively just another normal case of extra bits at the end of the array that aren't used by any traveler.
I almost went & implemented the single-0
hex code for zero-traveler graphs, but had one thought...
Normally, a single 0
would mean one to four travelers. The single line at the end of the file lists who they are, and thus gives us the actual number of travelers:
ABW-region-traveled.tmg:
bejacob dharwood froggie kjslaughter
AZE-region-traveled.tmg:
ua747sp
Right now, I'm not writing a terminating line feed after the list of traveler names. Meaning, in cases where there are no travelers, there's one single line feed after the list of edges. Thus many text editors would interpret the final edge as being the final line. Seems a Good Thing would be to add in that one final line feed; that way, all traveled graphs would still contain the "line" listing all the travelers, even if it's a blank one listing nobody. @jteresco, thoughts as to implementation, for HDX and/or whatever else?
I'd like to keep at least the blank line there. There's also minimal cost and a small potential benefit for those parsing the file to start the list of travelers with a traveler count. I think there's also a small potential benefit for that number of users and maybe even the user list to come before the edges that contain that info. The main downside to me is that it is a more significant deviation from the previous tmg file formats.
I'd like to keep at least the blank line there.
Agreed. Let's definitely do that. (Here, I'm interpreting the word "there" to mean "in the file", though not necessarily any specific place therein yet.)
There's also minimal cost and a small potential benefit for those parsing the file to start the list of travelers with a traveler count.
Not strictly necessary as I see it; when we encounter a string of n hex characters, we know we need an array of 4*n travelers. But I agree; minimal cost and a small potential benefit. Not knowing the number of travelers in advance, we'd have to do things a bit differently when parsing the first line, complicates things a bit. Having the number of travelers would simplify.
I think there's also a small potential benefit for that number of users and maybe even the user list to come before the edges that contain that info. The main downside to me is that it is a more significant deviation from the previous tmg file formats.
How about recording the number of travelers in the second header line as noted above, after the number of vertices & edges? (Minor drawback here is slightly different files from what you & your students already have. Existing code to read traveled TMGs may need to be changed.) The user list itself could stay at the end. IMO this makes parsing the file more convenient, while minimizing the deviation from the previous tmg file formats. The lines for vertices and edges would still be in the same place relative to the beginning of the file.
Shall we still keep the single-0
hex code for zero-traveler graphs, to simplify reading the edge lines?
Otherwise, we'd have to account for two spaces between the edge label and shaping points (or a trailing space on lines with no shapers), which could be an annoyance.
The 3 steps I see doing to get this going:
~Definitely do~ DONE:
Most likely do:
0
hex code for each edge in zero-traveler graphs.
Arguably a more clear way to say "The hex code here actually stores no data; just ignore it" than having two whitespace characters between the edge name & shaping points.To do or not to do?:
Another idea that just occurred to me: Bypass writing a traveled graph altogether in cases where the graph has no travelers. I'm not sure how much I like this, though...
It's probably best for consistency & simplicity's sake to include all three formats for every graph, even in cases where there are no travelers.
single 0
hex codes and # of travelers in header line implemented.
Python: https://github.com/yakra/DataProcessing/pull/84
C++: 652efe3135ff6e65520a05c371d1a4dcce339c90
A side topic to #196:
@jteresco noted that many of our graphs will have lots of segments with very few travelers, meaning many
0
digits, and floated the idea of saving space in traveled graph files. https://github.com/TravelMapping/DataProcessing/issues/196#issuecomment-472569800I considered run-length encoding of
0
s, ultimately dropping the idea on simplicity grounds. https://github.com/TravelMapping/DataProcessing/issues/196#issuecomment-473155091An idea that will help out in subgraphs is to only record travelers who've clinched at least one segment contained in the graph. For example:
clinchedby_code
, instead of 60.clinchedby_code
, instead of 60.Advantages: Requires no changes to the
TMG 2.0 traveled
"specification", keeping the hexclinchedby_code
encoded simply & intuitively.Limitations: The more travelers we have in a dataset, the smaller the savings, all the way up to tm-master-traveled.tmg, with no savings at all (assuming each traveler has at least one segment traveled).
Disadvantages: Using the full traveler list for all graphs, we can potentially save time (how much?) by generating
clinchedby_code
s during graph setup, and looking them up, instead of regenerating, for all subgraphs. With subgraphs using different traveler lists and thus differentclinchedby_code
s for each segment, this is no longer an option.Implementation: We'd have to reassign traveler numbers for each graph.
TravelerList.traveler_num
.TravelerList
->traveler_num
hash table for each graph, orTravelerList::traveler_num
as an array ofnumthreads
elements, and reassign for each graph.