Open michihdeu opened 4 years ago
Most of the DL errors are hidden wps. They could be used by experts but not by normal users. And we should make life as easy as possible for normal users.
Having trouble following you; what do you mean by "normal users"? When I hear the phrase, I think, regular users of the site; travelers. But this post is about streamlining the datacheck page, which is used by contributors -- people who are by definition the "experts" who can find & use hidden WPs. Otherwise, I don't see how any of this would benefit regular travelers, not site contributors. That quibble aside, though...
I've reported the relevant open DL issues today, and there are really just 3 wps.
Huh? When I filter out "X plus numeral" cases I see 15:
[yakra@noreaster /home/www/tm/logs]$ cat datacheck.log | grep DUPLICATE_LABEL | grep -v '^.*;[Xx][0-9].*;;;DUPLICATE_LABEL;$'
cod.n002;kan;;;DUPLICATE_LABEL;
cod.n004;ben;;;DUPLICATE_LABEL;
cod.n006gem;bod;;;DUPLICATE_LABEL;
cod.n027;s436;;;DUPLICATE_LABEL;
cod.tah8;ben;;;DUPLICATE_LABEL;
egy.m075;h21;;;DUPLICATE_LABEL;
egy.tah1;h01_e;;;DUPLICATE_LABEL;
gab.tah10;kan;;;DUPLICATE_LABEL;
irq.m005;h8;;;DUPLICATE_LABEL;
mar.tah1;n8;;;DUPLICATE_LABEL;
nga.e001;a1;;;DUPLICATE_LABEL;
nga.e001;a1;;;DUPLICATE_LABEL;
nga.e001;a1;;;DUPLICATE_LABEL;
nga.e001;a1;;;DUPLICATE_LABEL;
nga.e001;a1;;;DUPLICATE_LABEL;
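For anyone who would rather not do this with grep, here is a rough Python sketch of the same kind of filter. It assumes each datacheck.log line has six semicolon-separated fields (root;label1;label2;label3;code;info), which is only inferred from the output above; the function name and the `abc.rte1` sample line are made up for illustration. Unlike the grep pattern, it only tests the first label field.

```python
import re

# Assumed layout, inferred from the log output above:
# root;label1;label2;label3;code;info
X_NUMERAL = re.compile(r"^[Xx][0-9]")

def named_duplicate_labels(lines):
    """Return DUPLICATE_LABEL entries whose first label is not 'X plus numeral'."""
    kept = []
    for line in lines:
        entry = line.rstrip("\n")
        fields = entry.split(";")
        # Skip anything that is not a six-field DUPLICATE_LABEL entry.
        if len(fields) != 6 or fields[4] != "DUPLICATE_LABEL":
            continue
        # Skip labels such as X123456 or x1.
        if X_NUMERAL.match(fields[1]):
            continue
        kept.append(entry)
    return kept

sample = [
    "cod.n002;kan;;;DUPLICATE_LABEL;",
    "abc.rte1;X123456;;;DUPLICATE_LABEL;",  # hypothetical line, for illustration only
    "or.or039kla;OR39;;;LABEL_SELFREF;",
]
print(named_duplicate_labels(sample))
```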
The problem is that there is a huge number of data errors, and to see the most important ones, we use the red font color. But the mass of hidden wps is also in red and I don't wanna trigger our hwy data managers for these minor issues....
Well, do we really use red to denote the most important ones?
While datacheck.php does say "Errors shown in red should be fixed as soon as possible", in fact everything is red, with just 3 exceptions:
https://github.com/TravelMapping/Web/blob/93b0b86d5d4cf770db719adeb398f70a039a8655/devel/datacheck.php#L60-L63
Thus I consider it more of a "the least important items are in black" scenario.
These are cases that are likely to not have a fix available, and ultimately just get crossed off as FP as the system moves to active status.
Suggestion: Can we show DL errors starting with "X plus numeral" in black instead of red to make the real red errors more visible?
I don't necessarily think this is a bad idea, I just want to make sure something like this would be well thought through. What about all the other red errors - Items that should be fixed, but don't cause .list parsing errors or errors in people's stats? Such as label errors?
I think what the issue boils down to is that for some of us, such as @michihdeu & myself, addressing datacheck errors is a high priority. We strive to keep our regions & routes free of them, and take care of them quickly after they show up. For others, addressing datacheck errors is simply not a priority. And at the end of the day, no matter how much prodding we provide or how much we try to streamline the process, it won't change that. The majority of the DUPLICATE_LABEL cases date to 2016... :(
The only real relation to this issue is...
That one's a proposal for a siteupdate speed increase when Processing traveler list files. The only real relation to this issue is via a reference to # 275. :P ...Or did you mean to reference https://github.com/TravelMapping/DataProcessing/issues/272#issuecomment-566727336, for its discussion on how duplicate labels get mangled by the .list line parser?
Maybe it's BS and just the wrong direction to deal with this issue but... feel free to find a better way and... feel free to close this issue...
I'm not against your proposal per se... though skeptical. I think what this really comes down to is that some managers just don't care too much about cleaning out their datacheck items. Which we've complained about before. :)
Barring a surefire way to change that, the rest of us can meanwhile improve the signal-to-noise ratio for ourselves by using various tools at our disposal, for example
Having trouble following you; what do you mean by "normal users"?
Travelers who do not use wpt editor (or HDX) where you see +X wps.
I wanna say that +X wps are only "known" by hwy data managers (or "wpt editor users"). They are aware of +X labels and they could use them in their list files. Normal users not.
You mentioned on the forum that it is sometimes necessary to use X492394 labels in list files. I don't agree. If there is a point request for route A1 but we have no name for it, we could just name it A1_A, the next A1_B etc. But if it is a point in use (the point, not the name), it should be visible in HB. We might also introduce something like Y123456 to distinguish hidden and unnamed labels but I don't think that we really need or want it.
How many (and which) hidden wps are currently in use by travelers? I could make suggestions for visible wp labels.
I've reported the relevant open DL issues today, and there are really just 3 wps. Huh? When I filter out "X plus numeral" cases I see 15:
Sorry, I meant 3 wps in active and preview systems. Do you really care about data errors for devel systems? There is such a mess.... not worth looking at and not worth caring about, since it's in devel by definition.
Thus I consider it more of a "the least important items are in black" scenario.
wording... DL errors for hidden wps are "least important items" to me because they are totally irrelevant - with the exception mentioned above where they are really used in list files and backend usage for marking other data errors FP.
What about all the other red errors - Items that should be fixed, but don't cause .list parsing errors or errors in people's stats? Such as label errors?
I think that errors which are relevant to normal users (travelers who use HB only but not wpt editor nor HDX) should be red. Everything which can cause a broken list file entry, falsify stats or complicate navigating through the routes like NMPs. When label names change, they are relevant but since we have alt labels - yes, they are also less important.
I think what the issue boils down to is that for some of us, such as @michihdeu & myself, addressing datacheck errors is a high priority. We strive to keep our regions & routes free of them, and take care of them quickly after they show up. For others, addressing datacheck errors is simply not a priority.
Exactly! Because I don't wanna bother normal users.
I guess that due to the mass of data errors, some hwy data managers just think it's unimportant because there are also so many other errors....
When I open data check I'd like to see an empty list for active systems. If there are one or two errors, I could remember how long they are there and trigger the hwy data manager on the forum and ask for a fix. How many years are the OR errors there?
It would be ridiculous to open a thread "DL error +X01" please rename it to +X99 or whatever.... and getting a long discussion that it's just a concurrent segment and it's called +X01 on the other routes.... no....
And at the end of the day, no matter how much prodding we provide or how much we try to streamline the process, it won't change that.
If there are only very few red errors for active systems, we can keep an eye on them and report them on the forum. That's the difference. Maybe we could educate them to check it by themselves (but I doubt it).
Again, in the end (when the exception with usage in list files would be eliminated), hidden wps are just relevant for marking other data errors FP.
I just mentioned https://github.com/TravelMapping/DataProcessing/issues/275 and https://github.com/TravelMapping/DataProcessing/issues/278 to get the reference there because there was some similar discussion (but not that similar, never mind)
I'll try to stay on topic & avoid replying to the earlier bits. :)
I think that errors which are relevant to normal users (travelers who use HB only but not wpt editor nor HDX) should be red. Everything which can cause a broken list file entry, falsify stats or complicate navigating through the routes like NMPs. When label names change, they are relevant but since we have alt labels - yes, they are also less important.
With the current red/black dividing line at VISIBLE_DISTANCE, LONG_SEGMENT, and SHARP_ANGLE, it seems the criterion for black is roughly "A lot of these are likely to just be marked FP once the system goes active." This does seem a more useful way to sort & show info prioritizing errors.
To break it all down by error type (thinking of "navigating" primarily as using the "Intersecting/Concurrent Routes" feature):
error code | proposed new color | broken list file entry | falsify stats | complicate navigating | comments |
---|---|---|---|---|---|
BAD_ANGLE | Red | no | yes | yes | A subset of DUPLICATE_COORDS. |
BUS_WITH_I | Black | no | no | no | |
DUPLICATE_COORDS | Red | no | yes | yes | Can falsify/complicate in true positive cases. |
DUPLICATE_LABEL | Red if visible, black if hidden | yes | yes | no | Hidden points are arguably less important: rare potential for use, by power users only. |
HIDDEN_JUNCTION | Red | no | yes | yes | Broken concurrencies can falsify stats. Even FPs can potentially complicate navigating. |
HIDDEN_TERMINUS | Red | sort of | yes | yes | Prevents getting a proper list entry for fully clinched route. |
INVALID_FINAL_CHAR | Black | no | no | no | |
INVALID_FIRST_CHAR | Black | no | no | no | |
LABEL_INVALID_CHAR | Red | sort of | no | no | "Breaks" lists inasmuch as we'd wanna discourage non-ascii characters. I guess make it red & encourage fixing these ASAP before anything gets used in a .list? |
LABEL_LOOKS_HIDDEN | Black | no | no | no | |
LABEL_PARENS | Black | no | no | no | |
LABEL_SELFREF | Black | no | no | no | |
LABEL_SLASHES | Black | no | no | no | |
LABEL_UNDERSCORES | Black | no | no | no | |
LACKS_GENERIC | Black | no | no | no | |
LONG_SEGMENT | Black | no | no | no | |
LONG_UNDERSCORE | Black | no | no | no | |
MALFORMED_LAT, MALFORMED_LON, MALFORMED_URL | Red | sort of | yes | yes | Can break lists if a waypoint is OK in an earlier version of the file, gets used in a list, and then is edited to have a malformed URL in a later version of the file. |
NONTERMINAL_UNDERSCORE | Black | no | no | no | |
OUT_OF_BOUNDS | Red | no | yes | yes | |
SHARP_ANGLE | Red | no | yes | no | This is the one currently black error type that would become red. |
US_BANNER | Black | no | no | no | |
VISIBLE_DISTANCE | Black | no | no | no | |
VISIBLE_HIDDEN_COLOC | Black | no | no | sort of | "One-way navigation" is possible, though arguably the desired effect when FP. |
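To make the table concrete, here is a minimal Python sketch of the proposed mapping. It is not how datacheck.php works (that is PHP, and it has no hidden/visible information today); `proposed_color` and its `is_hidden` argument are hypothetical names, and defaulting unlisted codes to red is my assumption, chosen only because nearly everything is red on the current page.

```python
# Codes proposed as black in the table above.
BLACK_CODES = {
    "BUS_WITH_I", "INVALID_FINAL_CHAR", "INVALID_FIRST_CHAR",
    "LABEL_LOOKS_HIDDEN", "LABEL_PARENS", "LABEL_SELFREF", "LABEL_SLASHES",
    "LABEL_UNDERSCORES", "LACKS_GENERIC", "LONG_SEGMENT", "LONG_UNDERSCORE",
    "NONTERMINAL_UNDERSCORE", "US_BANNER", "VISIBLE_DISTANCE",
    "VISIBLE_HIDDEN_COLOC",
}

def proposed_color(code, is_hidden=False):
    """Return 'red' or 'black' for an error code, per the proposal above.

    is_hidden is a hypothetical flag: it has to come from somewhere else,
    and it only matters for DUPLICATE_LABEL.
    """
    if code == "DUPLICATE_LABEL":
        return "black" if is_hidden else "red"
    # Assumption: any code not listed as black stays red.
    return "black" if code in BLACK_CODES else "red"

print(proposed_color("SHARP_ANGLE"))
print(proposed_color("DUPLICATE_LABEL", is_hidden=True))
```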
How many years are the OR errors there?
What do you mean by "OR errors"?
To break it all down by error type
Your proposal should be fine. What will change (I think I've missed some very specific error types currently not on datacheck.php):
Red -> Black: DUPLICATE_LABEL (partially), INVALID_FINAL_CHAR, INVALID_FIRST_CHAR, LABEL_LOOKS_HIDDEN, LABEL_SELFREF, LABEL_SLASHES, LONG_UNDERSCORE and VISIBLE_HIDDEN_COLOC
Black -> Red: SHARP_ANGLE.
We could apply the very same rules to WPT editor. We could stick with indicating the red errors in red and indicate the black errors in a different color (e.g. orange). If so, I agree with SA being red. If not, I'm not sure........
How many years are the OR errors there?
What do you mean by "OR errors"?
Oh, duh. :) I won't log back in to noreaster and run that shell script again just now, but the OR errors here date to between 2017-02-04 & 2019-03-11.
2019-03-11 or.or018;+x1(OR233);;;HIDDEN_JUNCTION;3
a1799ebaa91 (Jim Teresco 2019-03-11 13:39:27 -0400 48) +x1(OR233) http://www.openstreetmap.org/?lat=45.233891&lon=-123.064964
2017-02-04 or.or019;+x8(OR208);;;HIDDEN_JUNCTION;3
13073174191 (Jim Teresco 2017-02-04 16:07:47 -0500 38) +x8(OR208) http://www.openstreetmap.org/?lat=44.809944&lon=-119.907531
2017-02-04 or.or022;+x1(OR99EBus);;;HIDDEN_JUNCTION;3
13073174191 (Jim Teresco 2017-02-04 16:07:47 -0500 47) +x1(OR99EBus) http://www.openstreetmap.org/?lat=44.940182&lon=-123.042412
2018-11-26 or.or039kla;OR39;;;LABEL_SELFREF;
e5407b1af73 hwy_data/OR/usaor/or.or039kla.wpt (Jim Teresco 2018-11-26 21:38:55 -0500 8) OR39 http://www.openstreetmap.org/?lat=42.206508&lon=-121.736744
2019-03-11 or.or099;+x39;I-5(188A);+x33(I-5);SHARP_ANGLE;148.81
a1799ebaa91 (Jim Teresco 2019-03-11 13:39:27 -0400 218) +x39 http://www.openstreetmap.org/?lat=43.997607&lon=-123.009818
2019-03-11 or.or207;+X751840;+X592203;+X989432;SHARP_ANGLE;174.50
a1799ebaa91 (Jim Teresco 2019-03-11 13:39:27 -0400 30) +X751840 http://www.openstreetmap.org/?lat=44.948763&lon=-119.702268
Black -> Black: LONG_SEGMENT, VISIBLE_DISTANCE
Red -> Black: BUS_WITH_I, INVALID_FINAL_CHAR, INVALID_FIRST_CHAR, LABEL_LOOKS_HIDDEN, LABEL_PARENS, LABEL_SELFREF, LABEL_SLASHES, LABEL_UNDERSCORES, LACKS_GENERIC, LONG_UNDERSCORE, NONTERMINAL_UNDERSCORE, US_BANNER, VISIBLE_HIDDEN_COLOC
Black -> Red: SHARP_ANGLE.
Red -> Red: BAD_ANGLE, DUPLICATE_COORDS, HIDDEN_JUNCTION, HIDDEN_TERMINUS, LABEL_INVALID_CHAR, MALFORMED_LAT, MALFORMED_LON, MALFORMED_URL, OUT_OF_BOUNDS
Partial / Conditional: DUPLICATE_LABEL red if visible, black if hidden
Hmmm... SHARP_ANGLE and BAD_ANGLE... yes, should be treated the same way (red). Do we still need to distinguish them when they are both of the same category? I don't get what it means.
But the categories (or the changes) are all fine to me 👍
BAD_ANGLE is when two successive points have duplicate coords, and thus the angle can't be calculated, because division by zero.
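A small Python sketch of why duplicate coords make the angle undefined. This is a flat-plane simplification with made-up names, not siteupdate's actual calculation: the angle comes from a dot product divided by the two segment lengths, so a zero-length segment means dividing by zero.

```python
import math

def turn_angle(p0, p1, p2):
    """Angle in degrees between segments p0->p1 and p1->p2.

    Points are (x, y) pairs in a flat plane. Returns None when either
    segment has zero length, i.e. two successive points coincide.
    """
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    norm_a = math.hypot(ax, ay)
    norm_b = math.hypot(bx, by)
    if norm_a == 0 or norm_b == 0:
        return None  # the division below would be by zero
    cos_t = (ax * bx + ay * by) / (norm_a * norm_b)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding drift
    return math.degrees(math.acos(cos_t))

print(turn_angle((0, 0), (1, 0), (2, 0)))  # straight ahead
print(turn_angle((0, 0), (1, 0), (1, 0)))  # duplicate coords
```

In this sketch the first call gives an angle of 0 degrees and the second gives `None`, which is the situation a separate BAD_ANGLE code would flag instead of reporting a number.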
ok, got it. Thx!
I think it's just this IF instruction:
https://github.com/TravelMapping/Web/blob/master/devel_new/datacheck.php#L60
I'll have a try....
Partial / Conditional: DUPLICATE_LABEL red if visible, black if hidden
I don't know how to implement this since the `+` is already removed beforehand and `X` is a valid character. Is there any flag (in DB?) that a wp is hidden?
In addition, I'm not familiar with the programming language, i.e. how the table is filled and how to deal with variables.
The new `DISCONNECTED_ROUTE` data check is missing from the list.
Since the 6-field multi-region user list file entries are affected, I think it is a critical error and should be output in red. No additional change to `datacheck.php` required.
Partial / Conditional: DUPLICATE_LABEL red if visible, black if hidden
I don't know how to implement this since the `+` is already removed beforehand and `X` is a valid character. Is there any flag (in DB?) that a wp is hidden?
Yikes! How did I not think of this earlier?...
There is no flag in the DB to indicate a hidden point.
We have to ignore leading `+`s when checking for duplicates of course, because .list processing ignores them. Not a big deal in & of itself; I could easily have siteupdate retain the `+` after making the comparison.
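Roughly what "compare without the plus, but keep it for reporting" could look like, as a Python sketch. This is not siteupdate's code; `find_duplicate_labels` is a hypothetical helper, and comparing labels exactly as written apart from the leading plus signs is an assumption.

```python
def find_duplicate_labels(labels):
    """Return labels, as written in the file, that collide with an earlier
    label once leading '+' characters are ignored."""
    seen = set()
    duplicates = []
    for label in labels:
        key = label.lstrip("+")  # comparison ignores leading plus signs
        if key in seen:
            duplicates.append(label)  # report the label with its '+' intact
        else:
            seen.add(key)
    return duplicates

print(find_duplicate_labels(["+X01", "A1", "X01", "+X02"]))
```

Here `X01` is reported because it matches `+X01` once the plus is dropped, and whichever form appears in the file is what ends up in the report.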
...but AltLabels get a bit tricky...
Since the goal here is to determine whether a point is visible/hidden overall...
+
or not.+
. SFSG...+
at the beginning -- whether the label itself does or not. And AltLabels can lack them;
`+PlusPriLbl NoPlusAlt http://www.openstreetmap.org/?lat=42.719450&lon=-73.752063` is a valid .wpt line. Adding a `+` onto the beginning would be easy, but is it the right thing to do? The part of me that likes precision bristles at this; it's a bit klugey. I can see someone seeing `+NoPlusAlt` listed on datacheck.php, searching for that string in the .wpt file & not finding it, and getting confused or shrugging and moving on.
Re LABEL_INVALID_CHAR cases, I wrote:
What would be the more useful format for flagging these?...
Listing the primary label under `Waypoints`, with the relevant label under `Info`?
`me.us001;NH/ME;;;LABEL_INVALID_CHAR;+Foo#Bar`
Or, listing the relevant label under `Waypoints`, with nothing under `Info`?
`me.us001;+Foo#Bar;;;LABEL_INVALID_CHAR;`
@jteresco responded:
I prefer the offending alt label in the second field. Keep it simple. ... Option 2, the one that doesn't have the primary label included at all.
...so I went with that option.
Using an option like the 1st one for DUPLICATE_LABEL cases can distinguish visible from hidden points, while avoiding the pitfall of listing a string on the datacheck page that can't be found in the .wpt file.
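A quick Python sketch of that idea, assuming the same six-field datacheck line as above. Both function names are hypothetical, and nothing like this exists in siteupdate as far as this thread shows: the primary label goes under Waypoints, the duplicated label goes under Info, and visibility is read off the primary label.

```python
def duplicate_label_entry(root, primary_label, duplicate_label):
    """Build a datacheck line with the primary label in the Waypoints
    field and the offending label, exactly as written, in the Info field."""
    return f"{root};{primary_label};;;DUPLICATE_LABEL;{duplicate_label}"

def is_hidden_entry(entry):
    """A point counts as hidden when its primary label starts with '+'."""
    fields = entry.split(";")
    return fields[1].startswith("+")

# Labels taken from the .wpt example earlier in this comment.
entry = duplicate_label_entry("me.us001", "+PlusPriLbl", "NoPlusAlt")
print(entry)
print(is_hidden_entry(entry))
```

Both strings in the resulting line can be found verbatim in the .wpt file, which avoids the `+NoPlusAlt` confusion described above.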
A side benefit is being able to look up point names (AltLabels are not in the DB) for HB links on datacheck.php.
I think this all might be too much detail though, too much of a chase for perfection. Users still can, and do, use hidden points in their travels, so hidden points should still be fixed. IMO it's wrongheaded to deprioritize them.
@michihdeu wrote:
I wanna say that +X wps are only "known" by hwy data managers (or "wpt editor users"). They are aware of +X labels and they could use them in their list files. Normal users not.
A few counterexamples to this argument: bogdymol.list#L1023 johninkingwood.list#L2545 rlee.list#L1682
http://travelmapping.net/devel/datacheck.php?show=DUPLICATE_LABEL
I think that DUPLICATE LABEL errors are the most critical data errors because users cannot use these labels or their stats are falsified.
Most of the DL errors are hidden wps. They could be used by experts but not by normal users. And we should make life as easy as possible for normal users.
I've reported the relevant open DL issues today, and there are really just 3 wps.
The problem is that there is a huge number of data errors, and to see the most important ones, we use the red font color. But the mass of hidden wps is also in red and I don't wanna trigger our hwy data managers for these minor issues....
Suggestion: Can we show DL errors starting with "X plus numeral" in black instead of red to make the real red errors more visible?
https://github.com/TravelMapping/DataProcessing/issues/275 https://github.com/TravelMapping/DataProcessing/issues/278
Maybe it's BS and just the wrong direction to deal with this issue but... feel free to find a better way and... feel free to close this issue...