TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

Data check for systemupdates.csv #77

Open michihdeu opened 6 years ago

michihdeu commented 6 years ago

The file should be checked for invalid entries. For instance, that the "systemName" column contains an existing path.
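A minimal sketch of what such a check might look like, in Python. It assumes the file is semicolon-delimited with lines of the form date;region;systemName;description;statusChange, and that the set of currently valid system codes has already been loaded from somewhere (e.g. systems.csv). The function name `find_invalid_system_names` and the hard-coded set of valid codes are hypothetical, for illustration only.

```python
import csv
import io

def find_invalid_system_names(updates_text, valid_codes):
    """Return (line_number, systemName) pairs whose code is not in valid_codes.

    Assumes semicolon-delimited lines:
    date;region;systemName;description;statusChange
    """
    invalid = []
    reader = csv.reader(io.StringIO(updates_text), delimiter=";")
    for line_number, fields in enumerate(reader, start=1):
        if len(fields) < 3:
            continue  # too short to hold a systemName; not handled in this sketch
        system_name = fields[2]
        if system_name not in valid_codes:
            invalid.append((line_number, system_name))
    return invalid

# Two sample lines; the set of valid codes is a hypothetical stand-in.
sample = (
    "2018-01-05;Montenegro;mnem;Montenegro Magistralni Put;re-entered\n"
    "2016-01-15;Netherlands;nldn0;Netherlands Niet-Autosnelwegen 1-99;preview\n"
)
invalid_entries = find_invalid_system_names(sample, {"mnem", "nldr"})
print(invalid_entries)
```

A real check would also need to decide what to do with a header line, if the file has one, and with deliberately retained historical codes.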

yakra commented 6 years ago

Would this issue be better suited to TravelMapping/Web instead? I think handling it in the PHP may be the way to go...

There are many entries with systemNames that are no longer valid. Most of these are for systems merged into larger active systems, e.g. engb, irlr4, usaky8, etc. It makes sense to keep these around for historical reference purposes. Note that for a merged system, only the raw text is provided in the System Code column, without a link to the HB.

It makes sense IMO to detect entries where a system with a statusChange to preview later has a statusChange to merged, and similarly disable the HB link.

Or simpler yet, the HB link could simply be disabled for any systemName that is no longer valid. Simpler to code, and this can catch cases other than just preview->merged.
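To compare the two rules, here is a rough sketch in Python rather than the site's PHP. The entries are hypothetical (date, systemName, statusChange) tuples, the usaky8 dates are made up for illustration, and `codes_ever_merged` and `should_link` are invented names.

```python
def codes_ever_merged(entries):
    """Collect system codes that have any statusChange of 'merged'."""
    return {code for (_date, code, status) in entries if status == "merged"}

def should_link(code, valid_codes):
    """Simpler rule: link to the HB only if the code is still valid."""
    return code in valid_codes

# Hypothetical entries: (date, systemName, statusChange).
entries = [
    ("2015-06-01", "usaky8", "preview"),
    ("2016-03-01", "usaky8", "merged"),
    ("2016-12-14", "nldr", "active"),
    ("2016-01-15", "nldn0", "preview"),
]
valid_codes = {"nldr"}  # assumed to come from the current list of systems

merged = codes_ever_merged(entries)
# Rule A: disable the link only for codes that were later merged.
rule_a = [code not in merged for (_date, code, _status) in entries]
# Rule B: disable the link for any code that is no longer valid.
rule_b = [should_link(code, valid_codes) for (_date, code, _status) in entries]
print(rule_a)
print(rule_b)
```

In this made-up data, rule A would still link the nldn0 entry because it was never marked merged, while rule B disables it.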

michihdeu commented 6 years ago

It's here because I wanna be able to check it before the site update is executed.

I generally think that all dead links should be eliminated. The entire entries (or at minimum the systemName) could be removed. I don't think that a complex check is required or reasonable.

jteresco commented 6 years ago

I'm definitely in favor of keeping the entries for historical interest, and agree that we should avoid invalid links. I'm also thinking it is better done on the PHP side to avoid bad links, but that the site update process (including the soon-to-exist highway data check version) should clearly report malformed entries.

yakra commented 6 years ago

Drilling down a bit farther, looking for invalid system codes besides merged cases...

1. cannb

asiahr and asiahp are also listed with a statusChange of split, but cannb is the only split system with a code that's no longer valid. It should be apparent to anyone reading updates.php that it was split into cannba, cannbc and cannbl.

2. mnem2

I assume this was superseded by 2018-01-05;Montenegro;mnem;Montenegro Magistralni Put;re-entered

3. mkdm

Macedonia Magistrale are later listed with the mkda system code. Either the code was changed at some point, or mkdm was a typo or error.

4. serba

This looks like a typo for srba.

5. nldn0

Netherlands Niet-Autosnelwegen 1-99. It's not clear from looking at systemupdates.csv, or the commit history of systems.csv, what became of this system. I also see nldn1 thru nldn9 here. Nothing in systemupdates.csv says they ever hit preview; I've not yet looked forward in the commit history of systems.csv to investigate their disappearance.

6. hgkrt

typo for hkgrt

7. gbna1

gbna1 is listed as extended and merged on adjacent lines, both for 2015-12-30. This along with gbna6, gbna8 & gbna9 merged on the same date. The top entry, for Great Britain A Roads, should probably be for vanilla gbna instead?
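Cases like serba and hgkrt suggest a data check could also hint at likely typos. A rough sketch using Python's standard difflib module; the set of valid codes below is a hypothetical stand-in for the real list of systems, `suggest_codes` is an invented name, and the 0.75 cutoff is an arbitrary choice.

```python
import difflib

def suggest_codes(invalid_code, valid_codes, cutoff=0.75):
    """Return valid codes that closely resemble invalid_code, best match first."""
    return difflib.get_close_matches(
        invalid_code, sorted(valid_codes), n=3, cutoff=cutoff
    )

# Hypothetical stand-in for the current set of system codes.
valid_codes = {"srba", "hkgrt", "mnem", "mkda", "nldr", "gbna"}

for code in ("serba", "hgkrt"):
    print(code, "->", suggest_codes(code, valid_codes))
```

A suggestion like this is only a hint; a renamed or merged system can resemble a valid code without being a typo.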

michihdeu commented 6 years ago

> 5. nldn0

It was changed to nldr

yakra commented 6 years ago

> It was changed to nldr

On 2016-12-14?

michihdeu commented 6 years ago

yep! It was initially planned to have all N routes in one system but the 3-digit routes are maintained by the provinces. The other routes are maintained by the state.

@yakra I guess you wanna clean the file?

yakra commented 6 years ago

> @yakra I guess you wanna clean the file?

A backdated note such as 2016-12-14;Netherlands;nldr;Netherlands Niet-Autosnelwegen 1-99;renamed could satisfy the historically curious. :)

As for the rest, *shrug* -- ping @si404?

michihdeu commented 6 years ago

The actual entries are:

2016-12-14;Netherlands;nldr;Netherlands Rijkswegen;active
2016-01-15;Netherlands;nldn0;Netherlands Niet-Autosnelwegen 1-99;preview

Option 1 (yakra's idea, note: it was actually 4 days earlier; and w/o link in first entry):

2016-12-14;Netherlands;nldr;Netherlands Rijkswegen;active
2016-12-10;Netherlands;nldr;Netherlands Niet-Autosnelwegen 1-99;renamed
2016-01-15;Netherlands;;Netherlands Niet-Autosnelwegen 1-99;preview

Option 2 (only changing the link):

2016-12-14;Netherlands;nldr;Netherlands Rijkswegen;active
2016-01-15;Netherlands;nldr;Netherlands Niet-Autosnelwegen 1-99;preview

Option 3 (option 1 but using new system name):

2016-12-14;Netherlands;nldr;Netherlands Rijkswegen;active
2016-12-10;Netherlands;nldr;Netherlands Rijkswegen;re-entered
2016-01-15;Netherlands;;Netherlands Niet-Autosnelwegen 1-99;preview

Option 4 (option 3 but with the new system link in first entry):

2016-12-14;Netherlands;nldr;Netherlands Rijkswegen;active
2016-12-10;Netherlands;nldr;Netherlands Rijkswegen;re-entered
2016-01-15;Netherlands;nldr;Netherlands Niet-Autosnelwegen 1-99;preview

I don't have a favorite. Any of options 2, 3 or 4 is fine. It should be similar to the other changes required (especially w/ or w/o new system link).

yakra commented 6 years ago

> A backdated note such as 2016-12-14;Netherlands;nldr;Netherlands Niet-Autosnelwegen 1-99;renamed could satisfy the historically curious. :)

I mistyped that. :( I meant to say: 2016-12-14;Netherlands;nldn0;Netherlands Niet-Autosnelwegen 1-99;renamed

Does that change your options?

michihdeu commented 6 years ago

nldn0 is a dead link. That's exactly what I don't like!

jteresco commented 6 years ago

I'd deal with dead links in the Web code, so I'd prefer to keep them in the CSV, again for the vague historical interest.

yakra commented 5 years ago

See also: https://github.com/TravelMapping/Web/issues/110

Perhaps the most fool-proof & future-proof solution is to query the DB to see whether a system with a given code exists.
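A sketch of that lookup, using Python's built-in sqlite3 as a stand-in for the site's actual database and PHP code. The table name `systems` and column name `systemName` are assumptions here, as is the helper name `system_exists`.

```python
import sqlite3

def system_exists(conn, code):
    """Return True if a system with the given code is in the systems table."""
    row = conn.execute(
        "SELECT 1 FROM systems WHERE systemName = ? LIMIT 1", (code,)
    ).fetchone()
    return row is not None

# In-memory stand-in database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE systems (systemName TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO systems VALUES (?)", [("nldr",), ("cannba",)])

exists_nldr = system_exists(conn, "nldr")
exists_nldn0 = system_exists(conn, "nldn0")
print(exists_nldr, exists_nldn0)
```

The updates page could then render the HB link only when the lookup succeeds and fall back to plain text otherwise.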

After @michihdeu wrote

> System codes of "extended" and "split" systems remain existing though and the link should work.

I added the line

2017-08-21;(Canada) New Brunswick;cannb;New Brunswick Provincial Highways;split

...so now we have both valid and invalid system codes in the systemName column. (Not sure what split means in the Asia cases; I can't figure out what was happening there based on other entries on the same date. ping @si404?)

As for re-entered systems, what does that mean and how/why is it relevant to the updates page?

si404 commented 5 years ago

split in the Asian cases was me splitting off routes into ones that were done (with different levels of 'done' - active, review: a subset to try and make reviewing what's being asked to review relatively small, rather than 100s of 1000s of km - a bit like how eure was activated one region at a time, preview: other routes done), and ones that were just bare-bones placeholder files (now just Sumatra and the Philippines IIRC).

re-entered is basically a total redo of the whole system. I guess the Spain and German entries date from when we split the regions (IIRC, I put an updates.csv entry for Spain, but @michihdeu did this for Germany and then we did what the other one did as well, having seen it). We would have done it for Bosnia had they gone ahead with renumbering everything, for example.

yakra commented 5 years ago

> split in the Asian cases was me splitting off routes into ones that were done (with different levels of 'done' - active, review:

Yeah, ISTR this. Looking thru systemupdates.csv some more, I see asiah activated in various countries at different dates, meaning it didn't all happen at once. I assume we had chunks of routes, for each country, split off of asiahp & asiahr, similar to usansf->usaxx*, cansph->cannl etc., or a bit differently, the future usasf->usanyp move.

*What's the difference between asiahp & asiahr? asiahr also has Association of Southeast Asian Nations in the mix. OK, more southeastern countries here; makes sense. Though it throws me off seeing Chinese regions in both systems. Assuming (for review) & (for preview) mean basically the same thing.

> and ones that were just bare-bones placeholder files (now just Sumatra and the Philippines IIRC).

Yep, I see those still here. IDN, MMR, PHL, VNM.

> re-entered is basically a total redo of the whole system. I guess the Spain and German entries date from when we split the regions

The dates match the updates.csv entries:

2017-10-06;Germany;All routes;;All routes split into 16 regions
2017-10-06;Spain;All routes;;All routes split into 19 regions


It's difficult sometimes to sum up what's going on with these systems in a single word.

michihdeu commented 5 years ago

r = for review
p = for preview

systems.csv

yakra commented 5 years ago

Why 2 systems though? I'd think that a Preview system would by definition be ready for peer Review.

Or am I off base in assuming such a definition? Is it that some routes are far enough along in development to be in Preview, but not far enough along to be ready for Review yet?

si404 commented 5 years ago

It was impossible enough to get some review without the system being 100,000km. Staggered review was meant to reduce the barrier for potential reviewers.

yakra commented 5 years ago

Gotcha. A GBNA / USAKY situation.

michihdeu commented 5 years ago

I think that we should focus on the regions with the most travelers first - North America and Europe. Once their systems are active (except for scenic/tourist routes), we can expand to APA, Africa and South America and work on activating them. China especially will be a huge challenge.

jteresco commented 1 year ago

This one's sat here for a while. Any need to do anything here?

michihdeu commented 1 year ago

I think that it's still a valid issue.