amrisi / amr-guidelines


Link text for url-entities #161

Closed timjogorman closed 8 years ago

timjogorman commented 8 years ago

Hi all! In looking at the IAA diffs, I noticed that we're all handling "link text" (URLs that are replaced with text) differently, and I wanted to get us on the same page. It would be good to get examples for each of these issues into the guidelines, so that we are more consistent about it.

Question 1:

How do we treat links with article titles as the link text?

PolitiFact | The Obameter: Create 5 million "green" jobs

Lowe's pulls advertising from TLC's 'All-American Muslim' - CNN.com

Missing Guns Raise Eyebrows over U.S. Arms Dealings Abroad

We've done these as flat named articles:

(a / article
     :name (n2 / name :op1 "Missing" :op2 "Guns" :op3 "Raise" :op4 "Eyebrows" :op5 "over" :op6 "U.S." :op7 "Arms" :op8 "Dealings" :op9 "Abroad"))

as normal AMRs with url-entity tag:

(r / raise-01
     :ARG0 (g / gun
          :ARG1-of (m / miss-01))
     :ARG1 (e / eyebrow)
     :ARG1-of (c / cause-01
          :ARG0 (d / deal-01
               :ARG2 (a / arm)
               :mod (c2 / country
                    :name (n / name :op1 "United" :op2 "States"))
               :location (a2 / abroad)))
     :mod (u / url-entity
          :value "http://news.yahoo.com/s/oneworld/20060519/wl_oneworld/45361331031148010119"))

as a normal AMR with no url-entity:

(r / raise-01
     :ARG0 (g / gun
          :ARG1-of (m / miss-01))
     :ARG1 (e / eyebrow)
     :topic (d / deal-01
          :ARG0 (c / country
               :wiki "United_States"
               :name (n / name :op1 "U.S."))
          :ARG2 (a / arm)
          :location (a2 / abroad)))

Proposed:

The second option -- as a normal AMR with a url-entity tag, and with ":mod" used to link to the url-entity -- seems like the best to me, since we seem to have traditionally just parsed headlines as normal text. Does that sound like a good treatment?
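To see which of the three treatments a given annotation follows, a rough check along these lines might help when going through the IAA diffs. This is only a sketch: the function names and the regular expression are made up for illustration, it works on the raw PENMAN string rather than on a parsed graph, and the sample is an abbreviated version of the second option above.

```python
import re

# Matches a role followed by an opening url-entity node, e.g.
# ":mod (u / url-entity". The pattern is an assumption about how the
# annotations are formatted, not part of any AMR tool.
URL_ENTITY_ROLE = re.compile(r"(:[A-Za-z0-9-]+)\s*\(\s*\S+\s*/\s*url-entity\b")


def url_entity_roles(amr):
    """Return the roles under which url-entity nodes are attached."""
    return URL_ENTITY_ROLE.findall(amr)


def is_balanced(amr):
    """Check that parentheses outside quoted strings are balanced."""
    depth = 0
    in_string = False
    for ch in amr:
        if ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return False
    return depth == 0 and not in_string


# Abbreviated version of the "normal AMR with url-entity" option.
sample = '''(r / raise-01
     :ARG0 (g / gun
          :ARG1-of (m / miss-01))
     :ARG1 (e / eyebrow)
     :mod (u / url-entity
          :value "http://news.yahoo.com/s/oneworld/20060519/wl_oneworld/45361331031148010119"))'''

print(url_entity_roles(sample))  # [':mod']
print(is_balanced(sample))       # True
```

An empty list would point to the flat-article or no-url-entity treatments, and a role other than :mod would flag an annotation that attaches the url-entity differently.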

Question 2:

How do we deal with link text when it's a description of the address -- particularly when it contains information like the website or host publication?

I'm assuming that for link text like "here" or "in the link", we could replace it with the url-entity directly. The question is about cases like:

This case can be read at this Findlaw page

I haven't found anything yet, but I came across this interesting information in Wikipedia:

Proposed:

I'd imagine that we'd want to keep the NEs mentioned in "the Findlaw page", "from Wikipedia", "CNN" etc. I'd personally want this kind of mention to look like:

(p2 / page
      :poss (c2 / company :name (n / name :op1 "Findlaw"))
      :mod (t2 / this)
      :location (u / url-entity :value "http://caselaw.findlaw.com/us-supreme-court/307/174.html"))

(i2 / information
      :location (u / url-entity :value "http://en.wikipedia.org/wiki/Catholic_sex_abuse_cases")
      :ARG2-of (i3 / interest-01))

Any opinions on those?

nschneid commented 8 years ago

I'd imagine that we'd want to keep the NEs mentioned in "the Findlaw page", "from Wikipedia", "CNN" etc. I'd personally want this kind of mention to look like:

(p2 / page
          :poss (c2 / company :name (n / name :op1 "Findlaw"))
          :mod (t2 / this)
          :location (u / url-entity :value "http://caselaw.findlaw.com/us-supreme-court/307/174.html"))

I think I'd prefer :source instead of :poss for the name of the site/content provider.

uhermjakob commented 8 years ago

Thanks, Tim, good point. When I checked the SemEval AMRs earlier this week, the need for expanding the guidelines in this respect became clear as well. And I think that your proposals generally reflect what most annotators have been doing.

At a more detailed level, regarding question 1, I have seen cases such as

(p / publication :wiki "CNN" :name (n / name :op1 "CNN")
      :ARG1-of (l / link-01
            :ARG2 (u / url-entity :value "http://www.cnn.com")))

so, basically, link-01 instead of :mod.

Regarding question 2, I wonder whether we might not want to drop deictic terms such as "this" and "here". Example: The report can be found <a href="...">here</a>.

(p / possible-01
      :ARG1 (f / find-01
            :ARG1 (r / report)
            :location (u / url-entity :value "https://www.amnestyusa.org/sites/default/files/air12-report-english.pdf")))
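
For sorting link text into the two cases discussed here (purely deictic text that can be replaced by the url-entity, versus descriptive text whose named entities should be kept), something like the following could serve as a first pass over the source HTML. It is a sketch under assumptions: the class and function names are invented for illustration, the set of deictic phrases only contains the examples mentioned in this thread, and the example.org URL is a placeholder.

```python
from html.parser import HTMLParser

# Assumed set of deictic link texts; the thread only names these as examples.
DEICTIC_LINK_TEXT = {"here", "this", "in the link"}


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> element that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def classify_links(html):
    """Return (href, text, is_deictic) for every link in the snippet."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [(href, text, text.lower() in DEICTIC_LINK_TEXT)
            for href, text in parser.links]


# Placeholder URL; the href in the example sentence above is elided.
snippet = 'The report can be found <a href="https://example.org/report.pdf">here</a>.'
print(classify_links(snippet))
# [('https://example.org/report.pdf', 'here', True)]
```

Links flagged as deictic would be candidates for attaching the url-entity directly, while text such as "this Findlaw page" would come back as non-deictic and keep its named entity.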

-- Ulf

uhermjakob commented 8 years ago

Decision at AMR phone meeting on Dec. 7, 2015: