AtlasOfLivingAustralia / ALA4R

Access data and resources hosted by the Atlas of Living Australia (ALA)
https://atlasoflivingaustralia.github.io/ALA4R/
42 stars 8 forks source link

wkt polygon validation gives false negatives when spaces follow commas #18

Closed shawnlaffan closed 8 years ago

shawnlaffan commented 8 years ago

The check_wkt::is_valid_wkt_polygon function needs to strip whitespace around commas.

The following wkts are identical, and as far as I can see valid, but the first one results in a warning of "WKT string appears to be invalid".

This is caused by the string match used in is_valid_wkt_polygon, which compares "154 -43.74" with " 154 -43.74".

wkt_text = "POLYGON((154 -43.74, 154 -9, 112.9 -9, 112.9 -43.74, 154 -43.74))"

wkt_text = "POLYGON((154 -43.74,154 -9,112.9 -9,112.9 -43.74,154 -43.74))"

I assume it is also affected by spaces before the comma.

The simplest solution would probably be to give strsplit a pattern like "\s,\s_".

Regards, Shawn.

raymondben commented 8 years ago

Thanks Shawn - actually there's a newish R package from @sckott for handling WKT strings, including a validator - https://github.com/ropensci/wellknown. I'll look at replacing ALA4R's home-rolled dodgy check_wkt with that.

shawnlaffan commented 8 years ago

No worries Ben. That saves me listing another issue regarding ring ordering checks.

Cheers, Shawn.

raymondben commented 8 years ago

Hmm, the wellknown package is an improvement, but not a total solution. It also thinks that whitespaces are invalid in some cases (e.g. before the comma). I'm not sure if they are valid or not, but it''s kind of a moot point because the ALA servers quite happily accept those WKT strings. So I've also modified the ALA4R functions that take WKT strings, so that WKT strings are NOT checked prior to submitting the request to the ALA. If the request fails (and if the server provides no diagnostic information) then the WKT string is checked and if it fails, a warning is issued that it might have been the problem. The occurrences function always behaved this way, but specieslist and sites_by_species did not. I think that's about the best we can do for the time being ... let me know if it doesn't seem a suitable solution (or if the ring ordering checks are causing problems).

(ALA4R v1.25 contains these changes)

shawnlaffan commented 8 years ago

Thanks for that. I'll sing out if problems persist.

sckott commented 8 years ago

@raymondben can you share some example wkt that you refer to that was invalid, and what fxn were you using in wellknown

raymondben commented 8 years ago

@sckott - sure, see https://github.com/ropensci/wellknown/issues/9

sckott commented 8 years ago

thanks

raymondben commented 8 years ago

ropensci/wellknown now handles the whitespaces, so our check_wkt::is_valid_wkt_polygon also handles them. Further wkt parsing inconsistencies - please notify!