Open adarshp opened 2 years ago
A couple of additional ideas for compactifying:
- arguments and attachments fields should not be published if they are empty.
- the separate start_offset and end_offset fields should be combined into one field, like so:
"offsets": [0, 31]
where the first number is the start offset and the second number is the end offset.
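A minimal sketch of those two ideas, in Python for illustration only. The function name compact_extraction and the "label" field are hypothetical. The sketch also assumes each extraction is a flat JSON object that carries arguments, attachments, start_offset and end_offset as top-level fields, which may not match the actual message format:

```python
def compact_extraction(extraction: dict) -> dict:
    """Return a compacted copy of one extraction (assumed message shape)."""
    compact = dict(extraction)  # shallow copy, the input is left untouched

    # Idea 1: do not publish "arguments" or "attachments" when they are empty.
    for field in ("arguments", "attachments"):
        if field in compact and not compact[field]:
            del compact[field]

    # Idea 2: merge the two offset fields into one [start, end] pair.
    if "start_offset" in compact and "end_offset" in compact:
        start = compact.pop("start_offset")
        end = compact.pop("end_offset")
        compact["offsets"] = [start, end]

    return compact


# Hypothetical extraction; only the field names discussed above come from the thread.
example = {
    "label": "CriticalVictim",
    "arguments": {},
    "attachments": [],
    "start_offset": 0,
    "end_offset": 31,
}
compacted = compact_extraction(example)
print(compacted)
```

Consumers that currently read start_offset and end_offset would need to switch to offsets[0] and offsets[1], and would have to treat a missing arguments or attachments field as empty.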
Good ideas, I can put those in the Dialog Agent easily enough.
Thanks @jastier - let's not worry about it until after the study 3 code freeze on April 20. I would also like to run the proposed changes by the testbed WG before we implement them.
👍
I believe that the excessive verbosity of the JSON outputs due to the redundant serialization of the arguments of complex events is hurting the usability of our system. We should compactify our output by removing this redundancy.
Simple/low-level extractions act as the arguments for more complex extractions. For example, in the screenshot below, CriticalVictim and Deictic act as the exists and location arguments for KnowledgeSharing events.
The corresponding data.extractions field is quite verbose:
You can see in the above example that the CriticalVictim mention is serialized twice - once by itself, and once within the KnowledgeSharing event.
Can we instead have data.extractions be an object instead of an array? The object would have integer keys, and the values would be extractions. The integer keys can then be used in the serialization of the complex events and serve as pointers to the simple events. The current method of serialization obfuscates the fact that the CriticalVictim mention in the argument of the KnowledgeSharing event above is the same as the standalone CriticalVictim mention.
It is likely too late to do this for Study 3, but we should seriously consider this for Study 4.
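To make the proposal concrete, here is a rough Python sketch of the keyed layout. Everything in it is an assumption made for illustration: the function name index_extractions, the "label" and "text" fields, the sample text, and the idea that arguments maps a role name to a list of nested mentions. It also assumes that a nested mention is serialized identically to its standalone copy, which is how the sketch recognizes the two as the same mention:

```python
import json


def index_extractions(extractions: list) -> dict:
    """Turn a list of extractions into {integer key: extraction} with pointer arguments."""
    table = {}  # integer key -> extraction whose arguments hold keys, not copies
    seen = {}   # canonical JSON of a mention -> the key it was stored under

    def intern(extraction: dict) -> int:
        # Identical serializations are treated as the same mention (an assumption).
        fingerprint = json.dumps(extraction, sort_keys=True)
        if fingerprint in seen:
            return seen[fingerprint]
        key = len(table)
        seen[fingerprint] = key
        entry = {k: v for k, v in extraction.items() if k != "arguments"}
        table[key] = entry  # reserve the key before handling nested mentions
        arguments = extraction.get("arguments") or {}
        if arguments:
            entry["arguments"] = {
                role: [intern(mention) for mention in mentions]
                for role, mentions in arguments.items()
            }
        return key

    for extraction in extractions:
        intern(extraction)
    return table


# Hypothetical mentions; only the labels and role names come from the issue text.
victim = {"label": "CriticalVictim", "text": "critical victim", "arguments": {}}
deictic = {"label": "Deictic", "text": "here", "arguments": {}}
event = {
    "label": "KnowledgeSharing",
    "text": "there is a critical victim here",
    "arguments": {"exists": [victim], "location": [deictic]},
}
indexed = index_extractions([victim, deictic, event])
print(json.dumps(indexed, indent=2))
```

In this sketch the KnowledgeSharing entry ends up with "arguments": {"exists": [0], "location": [1]}, so each simple mention is stored once. One caveat: JSON object keys are always strings, so the integer keys are written out as "0", "1", "2" and consumers would need to convert them or compare them as strings.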