goodmami / penman

PENMAN notation (e.g. AMR) in Python
https://penman.readthedocs.io/
MIT License

Add AMR concept inventory #58

Open goodmami opened 4 years ago

goodmami commented 4 years ago

This depends on #57, although #57 will be informed by this issue.

The AMR concept inventory is, as I understand it, a fork of the Propbank frames. It is packaged with the LDC release, which I don't have access to, and including (a derived form of) those files in this repository would also raise licensing questions.

There's a plain-text version, which I think is equivalent, at https://amr.isi.edu/doc/propbank-amr-frames-arg-descr.txt

One thing I could do here is make a reader or converter for the frame files so someone with access to the LDC release could create the appropriate files for Penman to use. I just need to confirm the format of those files, then.

goodmami commented 1 year ago

@flipz357 I see that you have included the above-linked file in Smatch++ here:

https://github.com/flipz357/smatchpp/blob/main/smatchpp/resource/propbank-amr-frames-arg-descr.txt

The main reason I haven't included it here is that I don't see any licensing information alongside the file. Do you know what the terms for redistribution are, or who I would ask about that?

flipz357 commented 1 year ago

Yeah... Smatch++ uses it for some (optional, experimental) "semantic" AMR standardization and fine-grained scoring. The idea is that, e.g., to cut out "causal" sub-graphs from AMR, we want to retrieve not only :cause/cause-01 graph parts, but also any arg_x that may be labeled as a "cause" in Propbank. The same goes for other aspects like location, time, etc.

Maybe I wrongly presumed it is under a free public license (what license do the AMR guidelines even have? They don't say anything). I also don't know who would know more about this. I'll think about it.

goodmami commented 1 year ago

@timjogorman, since your name is on both the AMR 3.0 LDC release and on the Propbank organization, I'm hoping you might be able to help us with the above question, paraphrased here for convenience:

I suspect it is just an export of https://github.com/propbank/propbank-frames, in which case it may fall under the latter's CC-BY-SA-4.0 license? Also, it's not in the resource lists of AMR's Download page, which gives the impression that it's an internal file not meant for redistribution.

flipz357 commented 1 year ago

Don't know if it's important, but while it's not explicitly listed on the AMR download page, it's explicitly linked from the AMR guidelines.

goodmami commented 1 year ago

"it's explicitly linked from the AMR guidelines"

Ah, nice, that's reassuring. I thought I'd found it through a more obscure link in the AMR Dictionary or similar.

timjogorman commented 1 year ago

For licensing: I don't have an official answer for this, but I'd assume it falls under the CC-BY-SA-4.0 license.

More generally: in the latest version of the Propbank frames, they have added a "usage" field to the frame files, marking which projects each frame was available for. So one should be able to pull the information that's in that plain-text file from the latest frames, filtering for their usage in the AMR release. I apologize that we never got out an API for Propbank that would make things like that easier; doing so should be much easier than it is.

goodmami commented 1 year ago

@timjogorman thanks for answering, even if unofficially :) It sounds like the safer thing to do in terms of licensing is to point to the Propbank Frames source and write some code to extract the relevant info. Thanks for pointing out the "usage" field, which should help in determining what is available to AMR.
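
To make the extraction idea concrete, here is a minimal sketch of filtering rolesets by a "usage" marker. The XML layout is an assumption for illustration only: the element and attribute names (`roleset`, `role`, `usage`, `resource`, `inuse`, `n`, `descr`) and the sample rolesets are made up here and would need to be checked against the actual Propbank frame files. The function name `amr_rolesets` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element names, attribute names, and rolesets
# are assumptions for illustration, not copied from a real frame file.
SAMPLE_FRAME_XML = """\
<frameset>
  <predicate lemma="example">
    <roleset id="example.01" name="made-up sense one">
      <roles>
        <role n="0" descr="made-up agent"/>
        <role n="1" descr="made-up theme"/>
      </roles>
      <usagenotes>
        <usage resource="PropBank" inuse="+"/>
        <usage resource="AMR" inuse="+"/>
      </usagenotes>
    </roleset>
    <roleset id="example.02" name="made-up sense two">
      <roles>
        <role n="0" descr="made-up role"/>
      </roles>
      <usagenotes>
        <usage resource="AMR" inuse="-"/>
      </usagenotes>
    </roleset>
  </predicate>
</frameset>
"""


def amr_rolesets(xml_text, resource="AMR"):
    """Map roleset id -> {"ARGn": description} for rolesets in use."""
    root = ET.fromstring(xml_text)
    frames = {}
    for roleset in root.iter("roleset"):
        # Keep a roleset only if some usage entry marks it as in use
        # for the requested resource (assumed "+" convention).
        in_use = any(
            usage.get("resource") == resource and usage.get("inuse") == "+"
            for usage in roleset.iter("usage")
        )
        if not in_use:
            continue
        frames[roleset.get("id")] = {
            f"ARG{role.get('n')}": role.get("descr")
            for role in roleset.iter("role")
        }
    return frames


print(amr_rolesets(SAMPLE_FRAME_XML))
```

With the sample above, only the roleset marked as in use for AMR is kept. Running the same function over each real frame file and merging the results would give one dictionary for the whole inventory, assuming the real files follow a layout close to this one.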

flipz357 commented 1 year ago

Within Smatch++ I implemented a first version of a solution that loads frame files on demand. If you want, you can use/adapt the two functions that I have now implemented:

checking availability, downloading if needed, and returning a Python dict with frames

downloading
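
For readers who want the general shape of the on-demand approach, a minimal sketch follows. It is not the Smatch++ code: the names `ensure_local_copy` and `load_frames` are hypothetical, the URL and cache location are left to the caller, and `parse` stands in for whatever turns the downloaded text into a dict of frames.

```python
import os
import urllib.request


def ensure_local_copy(url, dest, fetch=None):
    """Return dest, downloading url to it first if it is not there yet."""
    if os.path.exists(dest):
        return dest
    if fetch is None:
        # Default downloader from the standard library.
        fetch = urllib.request.urlretrieve
    # Create the cache directory if needed before writing into it.
    os.makedirs(os.path.dirname(os.path.abspath(dest)), exist_ok=True)
    fetch(url, dest)
    return dest


def load_frames(url, dest, parse, fetch=None):
    """Return parse(text) for the cached file, fetching it on first use."""
    path = ensure_local_copy(url, dest, fetch=fetch)
    with open(path, encoding="utf-8") as f:
        return parse(f.read())
```

Keeping the download step separate from parsing means nothing derived from the frames needs to be shipped with the package, which fits the licensing concern discussed above. The `fetch` parameter exists mainly so the download can be replaced in tests.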