clamsproject / app-role-filler-binder

Apache License 2.0

output MMIF format #1

Open keighrim opened 5 months ago

keighrim commented 5 months ago

This thread is to discuss the output representation of R-F bindings in MMIF syntax and vocabulary.

wricketts commented 5 months ago

@keighrim - I had some floating questions about RFB and general MMIF structure

  1. When calling mmif[<annotation_id>] on a mmif object, I noticed that it only works if it's the annotation's .long_id. Using regular .id gives a KeyError. Is this intentional?

  2. Maybe I'm answering my own question, but I also noticed that the .id is not globally unique, only unique within a view. As in, v_1 and v_2 can both have a TextDocument td_1 in them. Is there an implicit assumption that this td_1 should correspond to the same document across views? If so, are there any built-in enforcements/guards for that assumption, or are CLAMS app developers supposed to write logic complying with that assumption?

  3. The RFB is implemented to return an empty csv if no roles/fillers are identified in the input (due to noise), or if the parser fails. @haydenmccormick brought up the suggestion that we add a runtime parameter to control whether or not the app should generate an annotation if the CSV content is empty. I thought this sounded reasonable, but had a concern related to the above 2 Q's.

    If, for example, docTR's td_1 was too noisy and the user opts to have RFB omit empty CSVs, then it's possible that RFB's td_1 could correspond to docTR's td_2 (or a higher number), which is not super intuitive. Ultimately, the number mismatch won't prevent us from tracing the relation because we have alignments, but it could be less "user-friendly" to not have a global 1-to-1 mapping between id and document.

keighrim commented 5 months ago

Regarding q1, I have opened a new issue to make id unambiguous: https://github.com/clamsproject/mmif/issues/228 The problem is that if we start to force long_id everywhere, that will break any future apps reading past MMIFs (or past apps that generate past MMIFs).

You are right that annotation ids without a view-id prefix are implicitly "scoped" to the view they reside in. In other words, an annotation id can be re-used to refer to different objects as long as the "scope" is different. Thus, having v1:td2 aligned to v2:td1 is totally fine and we don't care.
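
To make the scoping rule concrete, here is a minimal sketch using plain dicts. It is a toy model only, not the mmif-python API; the `views` layout and the `resolve` helper are hypothetical names invented for illustration.

```python
# Toy model of view-scoped ids: plain dicts, NOT the mmif-python API.
# The same short id ("td1") exists in two views and names different objects.
views = {
    "v1": {"td1": "docTR text 1", "td2": "docTR text 2"},
    "v2": {"td1": "RFB CSV derived from v1:td2"},
}


def resolve(ref, current_view=None):
    """Resolve a long id ('view:id') or a short id scoped to current_view.

    Hypothetical helper: a short id with no view scope is treated as
    ambiguous and raises KeyError.
    """
    if ":" in ref:
        view_id, ann_id = ref.split(":", 1)
    elif current_view is not None:
        view_id, ann_id = current_view, ref
    else:
        raise KeyError(f"short id without a view scope: {ref}")
    return views[view_id][ann_id]


print(resolve("v1:td1"))                   # long id, unambiguous
print(resolve("td1", current_view="v2"))   # short id, scoped to v2
```

In this sketch, `td1` only becomes meaningful once a view is known, which is why a long id (or an explicit scope) is needed for lookups across the whole document.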

All that put together, I don't think it's a good idea to produce an "empty" text document when the RFB parsing fails: it doesn't add any information, while adding space and time complexity to handling the MMIF outputs (storage-wise and json.load-wise).
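
As a rough sketch of what skipping empty output could look like: the function names, the `(role, filler)` input shape, and the `aligned_to` field below are all assumptions made for illustration, not the actual RFB code or MMIF vocabulary.

```python
import csv
import io


def bindings_to_csv(bindings):
    """Serialize (role, filler) pairs to CSV text; empty input gives ''."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for role, filler in bindings:
        writer.writerow([role, filler])
    return buf.getvalue()


def make_text_documents(bindings_by_source):
    """Build one output document per source that has bindings.

    Sources whose CSV would be empty are skipped, so output ids are
    numbered independently of the source ids. The relation is carried
    by the (hypothetical) 'aligned_to' field, not by the id number.
    """
    documents = []
    for source_id, bindings in bindings_by_source.items():
        csv_text = bindings_to_csv(bindings)
        if not csv_text:
            continue  # nothing parsed: emit no document at all
        documents.append({
            "id": f"td{len(documents) + 1}",
            "text": csv_text,
            "aligned_to": source_id,
        })
    return documents


docs = make_text_documents({
    "v1:td1": [],                           # too noisy, nothing extracted
    "v1:td2": [("Director", "Jane Doe")],
})
print(docs)
```

Here the only output document gets id `td1` while pointing back at `v1:td2`, which is the id mismatch discussed above; the alignment still makes the relation traceable.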