Closed adarshp closed 6 years ago
This is necessary because the syntactic parser expects parentheses to be represented this way. But I think this can (and should) be reverted in the canonical name.
Agreed, I can make that change, thanks for the issue!
(In reply to Mihai Surdeanu's comment above, sent Wed, Mar 28, 2018 at 7:45 PM.)
Thanks, volunteer.
Excellent, thanks Becky!
hey all -- so I think that perhaps the only place they are occurring in the JSON-LD is in the text field, not canonical name, which kinda makes sense looking at the code. When I grep our example file, I can't find any examples of either with canonicalName -- @adarshp if you saw one, can you please send me an MWE so I can replicate it and write it up as a test? thanks! As I understand it, @MihaiSurdeanu, we're not concerned about reverting parens in the doc text output, correct?
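For anyone who wants to repeat that check without grep, here is a minimal Python sketch that walks a parsed JSON document and counts, per key, the string values that contain a Penn Treebank-style bracket escape. It assumes nothing about the Eidos JSON-LD layout beyond it being valid JSON; the `sample` document, including its `extractions` key, is made up for illustration.

```python
import json
import re

# Penn Treebank-style bracket escapes (-LRB-, -RRB-, and the square/curly variants).
PTB_BRACKET = re.compile(r"-(?:LRB|RRB|LSB|RSB|LCB|RCB)-")


def fields_with_escapes(node, counts=None):
    """Count, per JSON key, the string values that contain a bracket escape."""
    if counts is None:
        counts = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, str):
                if PTB_BRACKET.search(value):
                    counts[key] = counts.get(key, 0) + 1
            else:
                fields_with_escapes(value, counts)
    elif isinstance(node, list):
        for item in node:
            fields_with_escapes(item, counts)
    return counts


# Hypothetical document: only the "text" field carries the escapes.
sample = json.loads("""
{
  "extractions": [
    {
      "@type": "Entity",
      "text": "Region -LRB- states of Upper Nile -RRB-",
      "canonicalName": "region state upper nile"
    }
  ]
}
""")

print(fields_with_escapes(sample))
```

If `canonicalName` never shows up in the result for a real output file, that matches the observation above that the escapes only reach the text field.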
You're right, they don't occur in the canonicalName. However, we were extracting the "text" field from the "DirectedRelation" entries as part of the provenance (if MITRE/analysts wish to inspect the original sentence). The other puzzling thing I saw was that the word 'conflict' was not being grounded: see the output of Eidos run on 10_FAO_a-i5505e.txt (one of the 52 docs from MITRE):
"@type" : "Entity",
"@id" : "_:Entity_3447",
"labels" : [ "NounPhrase", "Entity" ],
"text" : "Conflict",
"rule" : "simple-np",
"canonicalName" : "Conflict",
"grounding" : [ {
"@type" : "Grounding",
"ontologyConcept" : "/entities/human/nation",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/entities/natural/crop",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/events/human/human_migration",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/entities/human/fertilizer",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/entities/human/livelihood",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/entities/natural/soil/soil_contents",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/temporal/months",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/entities/human/financial/economic/revenue",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/entities/measurement/weight",
"value" : 0.0
}, {
"@type" : "Grounding",
"ontologyConcept" : "/events/human/famine",
"value" : 0.0
} ],
"provenance" : [ {
"@type" : "Provenance",
"document" : {
"@id" : "_:Document_3573"
},
"sentence" : {
"@id" : "_:Sentence_3835"
},
"positions" : {
"@type" : "Interval",
"start" : 4,
"end" : 4
}
} ]
I thought that this might be due to the parentheses escaping - but I could be wrong - could you check your output on 10_FAO_a-i5505e.txt (it's in the Google Drive folder I think)?
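To see how widespread the all-zero groundings are in a given output file, a rough Python sketch like the following could help. It only relies on the `grounding`, `value` and `text` keys visible in the fragment above; everything else about the file layout is an assumption, and the second entity in `sample` (with its 0.42 score) is invented for contrast.

```python
def ungrounded(node):
    """Yield the "text" of every object whose grounding scores are all zero."""
    if isinstance(node, dict):
        grounding = node.get("grounding")
        if isinstance(grounding, list) and grounding:
            scores = [g.get("value", 0.0) for g in grounding if isinstance(g, dict)]
            if scores and not any(scores):
                yield node.get("text")
        # Keep walking: entities may be nested anywhere in the document.
        for value in node.values():
            yield from ungrounded(value)
    elif isinstance(node, list):
        for item in node:
            yield from ungrounded(item)


# Abbreviated copy of the "Conflict" entity, plus a made-up grounded entity.
sample = [
    {
        "@type": "Entity",
        "text": "Conflict",
        "grounding": [
            {"@type": "Grounding", "ontologyConcept": "/entities/human/nation", "value": 0.0},
            {"@type": "Grounding", "ontologyConcept": "/events/human/famine", "value": 0.0},
        ],
    },
    {
        "@type": "Entity",
        "text": "rainfall",  # hypothetical entity, not from the thread
        "grounding": [
            {"@type": "Grounding", "ontologyConcept": "/entities/natural/crop", "value": 0.42},
        ],
    },
]

print(list(ungrounded(sample)))
```

Comparing the flagged entities against whether their sentences contain escaped parentheses would be one way to test the hypothesis, though this sketch does not do that comparison itself.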
I appreciate people's detective work. Sometimes it's a thankless task.
Fwiw, we should not revert parens in the doc text. Just in canonical names, if they occur there.
ok -- then I am closing the Issue for now, based on the content of the thread. we can reopen if needed. thanks all!
Sounds good - I can unescape them downstream to make the text more human-readable.
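A minimal sketch of what that downstream unescaping might look like in Python. The mapping covers the standard Penn Treebank bracket tokens; the function name is made up, and the sketch deliberately leaves whitespace around the brackets alone, since the exact spacing in the Eidos text field is not shown in this thread.

```python
import re

# Penn Treebank bracket tokens and the characters they stand for.
PTB_TO_CHAR = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}
_PTB_PATTERN = re.compile("|".join(re.escape(token) for token in PTB_TO_CHAR))


def unescape_brackets(text):
    """Replace PTB bracket tokens with the literal characters; spacing is untouched."""
    return _PTB_PATTERN.sub(lambda match: PTB_TO_CHAR[match.group(0)], text)


print(unescape_brackets("Region -LRB- states of Upper Nile -RRB- with"))
```

This would only be applied to the provenance text for display; per the comments above, the text the parser sees should keep the escaped form.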
When parentheses are present in the text, they get escaped as -LRB-, -RRB-, etc. This gets propagated to the sentence text in the JSON-LD output file. I suspect it might also cause some weird issues, such as the entity "Conflict" not being grounded in the sentence:
"Conflict affects mostly the Greater Upper Nile Region (states of Upper Nile, Unity and Jonglei) with Central Equatoria remaining by and large unaffected after the early stages of the conflict."
Minimal working example with sbt console:
I think the issue is related to Universal Dependencies. I managed to find the following issues filed in 2015:
https://github.com/UniversalDependencies/UD_English-EWT/issues/1
https://github.com/UniversalDependencies/docs/issues/148
Any ideas on how to fix this?