Open keighrim opened 5 months ago
@keighrim - I had some floating questions about RFB and general MMIF structure
When calling `mmif[<annotation_id>]` on a mmif object, I noticed that it only works if it's the annotation's `.long_id`. Using the regular `.id` gives a KeyError. Is this intentional?
Maybe I'm answering my own question, but I also noticed that the `.id` number is not unique globally, only within a view. As in, `v_1` and `v_2` can both have a TextDocument `td_1` in them. Is there an implicit assumption that this `td_1` should correspond to the same document across views? If so, are there any built-in enforcements/guards for that assumption, or are CLAMS app developers supposed to write logic that complies with it?
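To make the scoping question concrete, here is a minimal sketch that models a MMIF-like structure with plain dicts. It deliberately does not use the mmif-python SDK; the names `mmif_like`, `build_index`, and `find_by_short_id` are hypothetical and only illustrate why a view-scoped id alone cannot identify one annotation, while a `<view_id>:<annotation_id>` key can.

```python
# Hypothetical, simplified stand-in for a MMIF file (plain dicts, NOT the mmif-python SDK).
mmif_like = {
    "views": [
        {"id": "v_1", "annotations": [{"id": "td_1", "text": "from view 1"}]},
        {"id": "v_2", "annotations": [{"id": "td_1", "text": "from view 2"}]},
    ]
}


def build_index(doc):
    """Index every annotation by '<view_id>:<annotation_id>'."""
    index = {}
    for view in doc["views"]:
        for ann in view["annotations"]:
            index[f'{view["id"]}:{ann["id"]}'] = ann
    return index


def find_by_short_id(doc, short_id):
    """Return the long ids of all annotations whose view-scoped id matches."""
    return [
        long_id
        for long_id in build_index(doc)
        if long_id.split(":", 1)[1] == short_id
    ]


index = build_index(mmif_like)
print(index["v_1:td_1"]["text"])            # the long id picks exactly one annotation
print(find_by_short_id(mmif_like, "td_1"))  # the short id matches one annotation per view
```

In this toy model the short id `td_1` matches an annotation in each view, so a lookup keyed only on it would have to guess which one is meant.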
The RFB is implemented to return an empty CSV if no roles/fillers are identified in the input (due to noise), or if the parser fails. @haydenmccormick suggested adding a runtime parameter to control whether the app should generate an annotation when the CSV content is empty. I thought this sounded reasonable, but had a concern related to the two questions above.
If, for example, docTR's `td_1` was too noisy and the user opts to have RFB omit empty CSVs, then it's possible that RFB's `td_1` could correspond to docTR's `td_2` (or a higher number), which is not super intuitive. Ultimately, the number mismatch won't prevent us from tracing the relation because we have alignments, but it could be less "user-friendly" to not have a global 1-to-1 mapping between id and document.
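The id shift described here can be sketched as follows. This is not the actual RFB app: `run_rfb_like` and its `emit_empty` parameter are hypothetical names for the proposed runtime switch, the CSV contents are made up, and the returned pairs stand in for what the alignments would record.

```python
def run_rfb_like(source_ids, parsed_csvs, emit_empty):
    """Hypothetical sketch of the proposed switch (not the real RFB code).

    source_ids:  long ids of the upstream text documents, in order.
    parsed_csvs: one CSV string per source document ("" if nothing was parsed).
    emit_empty:  hypothetical runtime parameter; when False, empty CSVs are skipped.
    Returns (new_view_scoped_id, source_long_id) pairs, mimicking alignments.
    """
    alignments = []
    counter = 0
    for source_id, csv_text in zip(source_ids, parsed_csvs):
        if not csv_text and not emit_empty:
            continue  # no output document, so the counter does not advance
        counter += 1
        alignments.append((f"td_{counter}", source_id))
    return alignments


# Made-up input: the first upstream document was too noisy to parse.
sources = ["v_1:td_1", "v_1:td_2", "v_1:td_3"]
csvs = ["", "role,filler\nDirector,Jane Doe\n", "role,filler\nProducer,John Roe\n"]

print(run_rfb_like(sources, csvs, emit_empty=True))
print(run_rfb_like(sources, csvs, emit_empty=False))
```

With `emit_empty=False` the new `td_1` pairs with the upstream `v_1:td_2`, so the numbers no longer line up, but each pair still records which source document an output came from.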
Regarding Q1, I have started a new issue to make `id` unambiguous: https://github.com/clamsproject/mmif/issues/228 The problem is that if we start to force `long_id` everywhere, that will break compatibility between future apps and past MMIFs (or the past apps that generate them).
You are right that annotation `id`s without a view-id prefix are implicitly "scoped" to the view they reside in. That said, an annotation id can be reused to refer to different objects as long as the "scope" is different. Thus, having `v1:td2` aligned to `v2:td1` is totally fine and we don't care.
All that put together, I don't think it's a good idea to produce an "empty" text document when the RFB parsing fails: it doesn't add any information, while adding space and time complexity to handling the MMIF outputs (storage-wise and `json.load`-wise).
This thread is to discuss the output representation of R-F bindings in MMIF syntax and vocab.