everypolitician / compare_with_wikidata

Library for diffing Wikidata and CSVs
MIT License

Fix expansion of Wikidata IDs #94

Closed lucychambers closed 7 years ago

lucychambers commented 7 years ago

The change to prompts was meant to stop the expansion of Wikidata IDs into links from being so greedy.

Unfortunately, it is now insufficiently greedy:

https://www.wikidata.org/w/index.php?title=User:Oravrattas/prompts/Seat_Count/comparison&oldid=552803617

Previously, Q3510833->Q1572486 would have had each of those IDs expanded.

https://www.wikidata.org/w/index.php?title=User:Oravrattas/prompts/Scottish_Parliament/comparison&oldid=552804078 is an example of it impacting the distinct CSV/SPARQL versions

dracos commented 7 years ago

Changing `value = v.to_s.sub('http://www.wikidata.org/entity/', '').sub(/^Q(\d+)$/, '{{Q|\\1}}')` to `value = v.to_s.sub('http://www.wikidata.org/entity/', '').gsub(/\bQ(\d+)\b/, '{{Q|\\1}}')` (similar to how it was before, but with added `\b`) would match any Qnnn as long as it wasn't embedded in a longer run of word characters (as was the issue with https://github.com/everypolitician/compare_with_wikidata/issues/92). It would still "break" a pathological string such as "The ship's ID was NCC.Q42.123", but presumably that is far less likely.
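To make the difference concrete, here is a minimal sketch comparing the two substitution chains quoted above. The method names `expand_anchored` and `expand_word_bounded`, the `ENTITY_PREFIX` constant, and the `ABCQ42XYZ` input are hypothetical and used only for illustration; only the `sub`/`gsub` calls themselves come from the comment.

```ruby
# Hypothetical wrappers around the two expressions from the comment above.
ENTITY_PREFIX = 'http://www.wikidata.org/entity/'

# Current behaviour: expands only when the whole value is a single Qnnn.
def expand_anchored(v)
  v.to_s.sub(ENTITY_PREFIX, '').sub(/^Q(\d+)$/, '{{Q|\\1}}')
end

# Proposed behaviour: expands every Qnnn delimited by word boundaries.
def expand_word_bounded(v)
  v.to_s.sub(ENTITY_PREFIX, '').gsub(/\bQ(\d+)\b/, '{{Q|\\1}}')
end

# A value holding two IDs is left alone by the anchored version...
puts expand_anchored('Q3510833->Q1572486')      # Q3510833->Q1572486
# ...but both IDs are expanded by the word-bounded version.
puts expand_word_bounded('Q3510833->Q1572486')  # {{Q|3510833}}->{{Q|1572486}}

# An ID embedded in a longer word (made-up input) is not touched.
puts expand_word_bounded('ABCQ42XYZ')           # ABCQ42XYZ

# The pathological case from the comment is still rewritten,
# because "." counts as a word boundary.
puts expand_word_bounded("The ship's ID was NCC.Q42.123")
# The ship's ID was NCC.{{Q|42}}.123
```

The word-boundary version trades a small risk of false positives, such as the dotted string above, for correct handling of values that contain more than one ID.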