ekansa / gap2

Geographic Annotation Platform

zero-reference places (priority: hestia critical) #26

Open atomrab opened 10 years ago

atomrab commented 10 years ago

In cleaning up place data, I have started to tackle the places listed as having "zero references" at the bottom of the frequency histogram. Babylon is a good example: it is connected to the correct Pleiades URI (http://pleiades.stoa.org/places/893951), but its lat/lon coordinates appear as 0/0. There are lots of tokens in the text for this word (see http://gap2.alexandriaarchive.org/report/token-issues/37053), so I suspect that the zero-reference problem is connected with the lack of coordinates. References to Babylon are also missing from the timeline, which I assume is also related to the map's inability to display 0/0 coordinates.

The JSON representation has representative coordinates, so these exist -- they are just not being pulled through somehow. I can't update the site with the Pleiades API update button, because for some reason that doesn't appear in the report-issues page for Babylon. I tried putting in the URI again, but this didn't help either.
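As a rough sketch of how the representative coordinates could be read from a place's JSON without falling back to 0/0: the field name `reprPoint` and its longitude-first order are assumptions about the Pleiades JSON, the helper name is hypothetical, and the sample values are illustrative only.

```python
from typing import Optional, Tuple


def representative_lat_lon(place: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a place record, or None if it has no point.

    Assumes the record carries a "reprPoint" field ordered [lon, lat];
    that field name and ordering are assumptions, not confirmed here.
    """
    point = place.get("reprPoint")
    if not point or len(point) != 2:
        # Report "unlocated" explicitly instead of substituting 0/0.
        return None
    lon, lat = point
    return (lat, lon)


# Illustrative records, not real API output.
located = {"id": "893951", "title": "Babylon", "reprPoint": [44.5, 32.5]}
unlocated = {"id": "example", "title": "Unlocated place", "reprPoint": None}

located_result = representative_lat_lon(located)
unlocated_result = representative_lat_lon(unlocated)
print(located_result, unlocated_result)
```

Returning `None` for a missing point keeps "unlocated" distinguishable from a real location at latitude 0, longitude 0.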

If we can solve this, we might be able to solve the other 20 zero-reference sites, some of them fairly important (e.g. Thrace, http://gap2.alexandriaarchive.org/report/token-issues/32250). I'm happy to do any grunt work involved here, but I'm stuck -- the next step has to be programmatic.

katefbyrne commented 10 years ago

Chipping in again; again possibly irrelevantly...:-)

The data I sent contains 42 entries for Babylon-893951, all with lat/long coordinates 32.5/44.5.

Kate


Kate Byrne School of Informatics, University of Edinburgh http://homepages.inf.ed.ac.uk/kbyrne3/ location: http://geohash.org/gcvwr2rkb5hd twitter: @katefbyrne


atomrab commented 10 years ago

I still don't know why Babylon and Thrace came in as unlocated (the Pleiades+ script?), but I've fixed those two without difficulty.

Now that I've finished hand-URI'ing all of the "Unknown Place Label" sites in this body of text, however, I am sure that I was right: the zero-reference places are those that have a Pleiades URI that comes without coordinates (all of the Barrington Atlas "unlocated" places). Apparently using lat 0/lon 0 is not enough to allow the histogram to display them.

This is an issue we should try to deal with if the goal of Herodotus, at least, in GapVis is the association of toponyms with Pleiades URIs. There are lots of Pleiades URIs for places that existed in antiquity but now cannot be located. We'll want to figure out how to provide reference counts for these places in both the histogram and the place page, not least so that the user can jump to those tokens easily (plus having 20-odd sites listed as having zero references at the bottom of the book summary page is inelegant).

I don't know who's equipped to deal with this question, now or eventually (it's not a big deal for my current class). This is presumably an issue with the javascript that runs the interface. If GapVis is going to start pulling in more works, I see two desiderata: one, a way to tally unlocated place references (with URIs); and two -- since one sometimes finds associations between ancient and modern places that are not (yet) reflected in Pleiades, e.g. in the Landmark Herodotus -- a way to use other URIs, like GeoNames, on a custom basis, to assert a text-specific claim for a modern identification that Pleiades/the Barrington was too cautious to make. Right now you have to choose between the Pleiades URI or nothing at all -- but the Landmark Herodotus makes some modern-place associations that could be useful for this online text.

atomrab commented 10 years ago

When I closed out #18, it also occurred to me that even unlocated places in the Barrington Atlas were associated with map grids, on the most general level. Exampaios, for example, is "unlocated" and has no coordinates in the JSON representation, but it's still from Barrington Atlas 23, which ought to have coordinates somewhere. I may try to get Pleiades to expose these.

atomrab commented 10 years ago

I checked with Tom Elliott to see what's up here, and he said that Pleiades ought to be exposing at least the coordinates of the Barrington Atlas map page. The JSON for Exampaios clearly is not, so he thinks this is a problem on the Pleiades side -- but he is unlikely to have time to fix it in the immediate future. So we may be stuck with this for a while.

atomrab commented 10 years ago

Update: I'm looking at the front-page histogram in the instances of HestiaVis up on both Eric's and Enrico's servers, and it seems to me that the list of places with zero references (at the bottom of the place list) has grown substantially since I last checked it. A large number of these places have both coordinates and specific references in the text, with properly-annotated tokens (e.g. Cephallania: http://enridaga.github.io/gapvis/gap2/#book/1/place/199 and http://enridaga.github.io/gapvis/gap2/#book/1/read/1440/199). What's happening here? My hand-edits to places have clearly been taken into account (I checked some of them, like Thermodon), but are no longer reflected in the histogram.

atomrab commented 10 years ago

And another update: puzzlingly, the summary screen for Firefox (for both Enrico's and Eric's versions) looks different from the one I see in Chrome.

Here's Firefox showing the no-reference places at the bottom of the list: [image]

And here's Chrome showing the same bottom of the list: [image]

Note that a) there are a lot fewer no-reference places in the Chrome version and b) there are numerical reference counts along the right-hand margin. In both cases, I'm pointing to http://gap2.alexandriaarchive.org/gapvis/index.html#book/1. Any thoughts here?

enridaga commented 10 years ago

About the problems with references, may I have one (or more) example of place ID that:

atomrab commented 10 years ago

I put this in an email, but just so that we have a record here as well: Exampaeus has 0 references in the histogram, has a highlighted reference in the reading view (http://gap2.alexandriaarchive.org/gapvis/index.html#book/1/read/610/713) and has a place page where it also has a zero-reference value but is linked to the proper Pleiades page (http://gap2.alexandriaarchive.org/gapvis/index.html#book/1/place/713).

Again, I think this is because the histogram display is connected somehow to tokens that have non-zero coordinates.

enridaga commented 10 years ago

The data I have are as follows:

  1. no page includes place id 713 in its list of references, so it has 0 references, correctly
  2. for example, the references of page 610 do not contain place id 713: {"id":"610","places":["25","608","25","25","25","29","25"]}
  3. place 713 is in the list of places and so in the list of the histogram, correctly, with 0 references.
  4. The text of page 610 includes the following fragment: <span data-token-id="113092" class="place hi" data-place-id="713">Exampaeus, and then it is displayed with the link to the place id, correctly.

This looks independent of the coordinates and has only to do with the references being inconsistent or wrong in the data.
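The counting described in points 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the GapVis code: only the page 610 record is taken from the data quoted above, while the list structures, the `place_ids` subset, and the function name are assumptions.

```python
from collections import Counter

# Page record quoted above; wrapping it in a list is an assumption.
pages = [
    {"id": "610", "places": ["25", "608", "25", "25", "25", "29", "25"]},
]

# Hypothetical subset of the book's place list, including place 713.
place_ids = ["25", "29", "608", "713"]


def count_references(pages, place_ids):
    """Tally how often each place id occurs in the pages' reference lists."""
    counts = Counter()
    for page in pages:
        counts.update(page.get("places", []))
    # A place that is never listed on any page is reported with 0 references.
    return {pid: counts.get(pid, 0) for pid in place_ids}


reference_counts = count_references(pages, place_ids)
print(reference_counts)
```

Under these assumptions, place 713 ends up with a count of 0 regardless of its coordinates, because the tally only looks at the per-page `places` lists.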


atomrab commented 10 years ago

Now I'm confused. Place ID 713 does appear on page 610 in the user interface: it's highlighted, appears as a pop-up window, and has a token associated with a Pleiades URI. So why would it not appear in the code? What is wrong in the data, if the token is correctly identified as a place and associated with a legitimate URI? And why is it that all of the zero-reference places also happen to have coordinates [0,0]? As far as I can tell, only the zero-reference places have this coordinate pair (all of them are Barrington Atlas "Unlocated" entries).

Also -- and is this perhaps connected? -- 0,0 places do not appear in the timeline. Is it the case that the timeline can only show places with coordinates, and inclusion in the timeline then dictates the appearance of references in the page code? Therefore, if a place can't be represented in the timeline, it doesn't count as a reference?

When you say the references are inconsistent or incorrect in the data, do you mean that the code is incorrect (that is, that there are isolated fragments)? The URI for this token is correct.
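One way to pin down this kind of mismatch would be to compare the place ids annotated in a page's text with the ids in that page's reference list. The sketch below assumes the page text is available as an HTML string and that tokens carry a `data-place-id` attribute, as in the fragment quoted earlier in the thread; the function name is hypothetical, and the closing `</span>` is added only to make the sample well-formed.

```python
import re

# Matches the place id carried by an annotated token in the page markup.
PLACE_ID_ATTR = re.compile(r'data-place-id="(\d+)"')


def missing_references(page_html, page_places):
    """Return place ids annotated in the text but absent from the reference list."""
    annotated = set(PLACE_ID_ATTR.findall(page_html))
    return sorted(annotated - set(page_places), key=int)


# Token markup and reference list for page 610, as quoted in the thread.
page_html = (
    '<span data-token-id="113092" class="place hi" '
    'data-place-id="713">Exampaeus</span>'
)
page_places = ["25", "608", "25", "25", "25", "29", "25"]

missing = missing_references(page_html, page_places)
print(missing)
```

Any id this reports is a token that the reading view can highlight but that a count based on the reference lists would never see.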

enridaga commented 10 years ago

I don't know. It depends on how the file http://gap2.alexandriaarchive.org/books/1.json is produced. If you search that JSON for a place id, e.g. the string "713" (with the quotes), you will find two occurrences: one is a page id and the other is the place entry with its coordinates (0,0). You will also see that every listed page id comes with a set of references as a list of place ids. 713 appears in none of them. The UI code uses this data, and nothing else, to produce the statistics represented in the histogram.
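The search described here might look roughly like the sketch below. The layout of the book JSON is an assumption (the top-level `pages` and `places` keys and the `ll` coordinate field are hypothetical), and the sample record for page 713 is invented purely to show the "page id" match.

```python
# Hypothetical excerpt of a book JSON; key names are assumptions.
book = {
    "pages": [
        {"id": "610", "places": ["25", "608", "25", "25", "25", "29", "25"]},
        {"id": "713", "places": ["25"]},  # illustrative page sharing the id
    ],
    "places": [
        {"id": "713", "ll": [0, 0]},
    ],
}


def find_occurrences(book, target_id):
    """Separate the different roles in which an id string shows up."""
    return {
        # Pages whose own id equals the target.
        "page_ids": [p["id"] for p in book["pages"] if p["id"] == target_id],
        # Entries in the place list with that id.
        "place_entries": [pl for pl in book["places"] if pl["id"] == target_id],
        # Pages that list the target among their references.
        "referenced_on": [
            p["id"] for p in book["pages"] if target_id in p["places"]
        ],
    }


occurrences = find_occurrences(book, "713")
print(occurrences)
```

With this sample, the id is found once as a page and once as a place, while `referenced_on` stays empty, which is the situation described for place 713.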
