Open dselman opened 4 years ago
Should this be flagged as a 1.0 feature?
My queries are as follows:
Is fuzzy matching the best technique so far? What are your views? The Manhattan LSTM architecture has also been used for similar tasks. Semantic Textual Similarity could be our priority, since it focuses on the cross-lingual aspect.
{
"shipper": "Party A",
"receiver": "Party B",
"deliverable": "Widgets",
"businessDays": 10,
"attachment": "Attachment X"
}
If the model has classified the input paragraphs as "acceptance of delivery" with a high probability, and has identified the variables with a high probability, then we can use Cicero draft to convert the JSON values for the variables back into legal text, inserting a CiceroMark Clause node into the DOM in place of the input paragraph nodes.
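The confidence check described above could be sketched roughly as follows. This is only an illustration: the `Detection` record, the `decide` function and both thresholds are hypothetical names and values, not part of Cicero or of any existing model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Detection:
    """Hypothetical model output for one candidate span of paragraphs."""
    template: str                      # e.g. "acceptance-of-delivery"
    class_prob: float                  # classifier confidence for the template
    variable_probs: Dict[str, float]   # per-variable extraction confidence


def decide(d: Detection, auto: float = 0.9, review: float = 0.6) -> str:
    """Return "replace", "review" or "skip" based on the weakest confidence."""
    weakest = min([d.class_prob, *d.variable_probs.values()])
    if weakest >= auto:
        return "replace"   # safe to draft the clause and swap it in
    if weakest >= review:
        return "review"    # ask the user to confirm (manual guidance)
    return "skip"


detection = Detection(
    template="acceptance-of-delivery",
    class_prob=0.97,
    variable_probs={"shipper": 0.95, "receiver": 0.93, "businessDays": 0.99},
)
print(decide(detection))
```

Taking the minimum means a single poorly extracted variable is enough to send the clause to manual review instead of replacing the text automatically.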
npm install -g @accordproject/markdown-cli
markus parse --sample ~/Desktop/aod.md
Output (CommonMark):
{
"$class": "org.accordproject.commonmark.Document",
"xmlns": "http://commonmark.org/xml/1.0",
"nodes": [
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is some text."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is more."
}
]
},
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Acceptance of Delivery."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Party A\" will be deemed to have completed its delivery obligations"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "if in \"Party B\"'s opinion, the \"Widgets\" satisfies the"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "Acceptance Criteria, and \"Party B\" notifies \"Party A\" in writing"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "that it is accepting the \"Widgets\"."
}
]
},
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Inspection and Notice."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Party B\" will have 10 Business Days to inspect and"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "evaluate the \"Widgets\" on the delivery date before notifying"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Party A\" that it is either accepting or rejecting the"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Widgets\"."
}
]
},
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Acceptance Criteria."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "The \"Acceptance Criteria\" are the specifications the \"Widgets\""
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "must meet for the \"Party A\" to comply with its requirements and"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "obligations under this agreement, detailed in \"Attachment X\", attached"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "to this agreement."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is even more."
}
]
}
]
}
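Before any matching or classification can run on a DOM like the one above, the text has to be pulled out of the nested nodes. A minimal sketch in Python, assuming only `Text` and `Softbreak` leaf nodes need handling (other inline node types are simply recursed into); the function names `node_text` and `block_texts` are made up for illustration:

```python
from typing import Any, Dict, List

NS = "org.accordproject.commonmark."


def node_text(node: Dict[str, Any]) -> str:
    """Flatten a CommonMark DOM node to plain text; soft breaks become spaces."""
    cls = node.get("$class", "")
    if cls == NS + "Text":
        return node.get("text", "")
    if cls == NS + "Softbreak":
        return " "
    # Any other node: concatenate the text of its children, if it has any.
    return "".join(node_text(child) for child in node.get("nodes", []))


def block_texts(document: Dict[str, Any]) -> List[str]:
    """Return one string per top-level block (paragraph, heading, ...)."""
    return [node_text(block) for block in document.get("nodes", [])]


doc = {
    "$class": NS + "Document",
    "nodes": [
        {"$class": NS + "Heading", "level": "2", "nodes": [
            {"$class": NS + "Text", "text": "Inspection and Notice."},
        ]},
        {"$class": NS + "Paragraph", "nodes": [
            {"$class": NS + "Text",
             "text": "\"Party B\" will have 10 Business Days to inspect and"},
            {"$class": NS + "Softbreak"},
            {"$class": NS + "Text",
             "text": "evaluate the \"Widgets\" on the delivery date"},
        ]},
    ],
}
print(block_texts(doc))
```

Keeping one string per top-level block preserves the node index, so a later step can map a detected clause back to the nodes it should replace.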
The CiceroMark representation (output). Note the org.accordproject.ciceromark.Clause node.
test.md.txt
markus parse --sample ~/Downloads/ciceromark.md --cicero
{
"$class": "org.accordproject.commonmark.Document",
"xmlns": "http://commonmark.org/xml/1.0",
"nodes": [
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is some text."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is more."
}
]
},
{
"$class": "org.accordproject.ciceromark.Clause",
"clauseid": "6b32cc75-40ed-445f-a3de-1bc37392d232",
"src": "ap://acceptance-of-delivery@0.13.1#b5505785f4de9000be15687601d869cb1719df2f482cd57f3bf4fbd6774127bc",
"nodes": [
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Acceptance of Delivery."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Party A\" will be deemed to have completed its delivery obligations"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "if in \"Party B\"'s opinion, the \"Widgets\" satisfies the"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "Acceptance Criteria, and \"Party B\" notifies \"Party A\" in writing"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "that it is accepting the \"Widgets\"."
}
]
},
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Inspection and Notice."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Party B\" will have 10 Business Days to inspect and"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "evaluate the \"Widgets\" on the delivery date before notifying"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Party A\" that it is either accepting or rejecting the"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "\"Widgets\"."
}
]
},
{
"$class": "org.accordproject.commonmark.Heading",
"level": "2",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "Acceptance Criteria."
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "The \"Acceptance Criteria\" are the specifications the \"Widgets\""
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "must meet for the \"Party A\" to comply with its requirements and"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "obligations under this agreement, detailed in \"Attachment X\", attached"
},
{
"$class": "org.accordproject.commonmark.Softbreak"
},
{
"$class": "org.accordproject.commonmark.Text",
"text": "to this agreement."
}
]
}
]
},
{
"$class": "org.accordproject.commonmark.Paragraph",
"nodes": [
{
"$class": "org.accordproject.commonmark.Text",
"text": "This is even more."
}
]
}
]
}
One of the goals should be to explore some different options (literature review?) and measure accuracy.
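To make "measure accuracy" concrete, one possible approach is to hand-label which paragraphs belong to a clause and compare each technique's detections against those labels. The sketch below is an assumption about how that could be set up; the paragraph ids and the `precision_recall_f1` helper are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(predicted: Iterable[str],
                        actual: Iterable[str]) -> Tuple[float, float, float]:
    """Compare detected paragraph ids against hand-labelled ones."""
    pred, gold = set(predicted), set(actual)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ids: paragraphs a detector flagged vs. paragraphs a reviewer labelled.
flagged = ["p3", "p4", "p7"]
labelled = ["p3", "p4", "p5", "p6"]
p, r, f1 = precision_recall_f1(flagged, labelled)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Running the same labelled set through each candidate (fuzzy matching, a Manhattan LSTM, and so on) would give directly comparable numbers.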
Thanks for the comments. I'm currently looking at different options and benchmarks to see how they perform, i.e. a literature review, as you said. I will need some time to do that. Meanwhile, feel free to suggest any papers you come across that you think could be of use. This is a very interesting problem. It may help us create a novel algorithm, and perhaps a paper. As I mentioned previously, I'm also working on a paper of my own that is somewhat related to this. This is going to be fun.
Question: Is there any restriction on the tech stack I can use?
@sunn-e I doubt there would be. I think you should raise it with the community, maybe on Slack, if it's not something we currently use. Still, I doubt it would be an issue...
It has to be open source (I think!).
I would add: multi-platform. I don't think we should go for something that is, e.g., Windows-only.
Will the organisation provide computing resources for training models related to document classification and NER?
Good question! @dselman @adriaan-pelzer any thoughts?
Yes, within reason, we can provision VMs on AWS to help with training.
Users currently have large volumes of existing legal contracts, containing a wide variety of legal clauses. For example, the contracts may contain 90-day payment terms, acceptance of delivery clauses, etc.
It would be very useful to be able to train a model to detect instances of a specific type of clause (say acceptance of delivery) and then to automatically (or with manual guidance) help the user replace the natural language in their contract with an instance of the acceptance of delivery clause, with the correct values extracted from the input contract.
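As a rough illustration of the value-extraction step, a template's text can be turned into a regular expression with one named group per variable. This is a simplified stand-in, not Cicero's actual parser: the `{{name}}` placeholder syntax is reduced to bare variable names, the template string below is reconstructed from the sample text purely as an assumption, and no type conversion is attempted.

```python
import re
from typing import Dict, Optional

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex with one named group per {{variable}} in the template."""
    pattern, pos, seen = "", 0, set()
    for m in PLACEHOLDER.finditer(template):
        pattern += re.escape(template[pos:m.start()])
        name = m.group(1)
        if name in seen:
            pattern += f"(?P={name})"        # repeated variable: same value again
        else:
            seen.add(name)
            pattern += f"(?P<{name}>.+?)"    # first use: capture lazily
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)


def extract(template: str, text: str) -> Optional[Dict[str, str]]:
    """Return variable values if the text matches the template, else None."""
    normalised = " ".join(text.split())      # collapse line breaks and spaces
    match = template_to_regex(template).fullmatch(normalised)
    if match is None:
        return None
    return {k: v.strip('"') for k, v in match.groupdict().items()}


# Hypothetical template text, reconstructed from the sample clause.
TEMPLATE = ("{{receiver}} will have {{businessDays}} Business Days to inspect and "
            "evaluate the {{deliverable}} on the delivery date before notifying "
            "{{shipper}} that it is either accepting or rejecting the {{deliverable}}.")

TEXT = '''"Party B" will have 10 Business Days to inspect and
evaluate the "Widgets" on the delivery date before notifying
"Party A" that it is either accepting or rejecting the
"Widgets".'''

print(extract(TEMPLATE, TEXT))
```

An exact regex like this only works when the contract wording matches the template word for word; text that has drifted from the template is where fuzzy or learned matching would be needed.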
The template detection logic should operate upon the CommonMark DOM (a JSON representation of Markdown formatted text) - replacing paragraph nodes in the DOM with Clause nodes from the CiceroMark DOM.
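The replacement step itself is a small tree edit. A minimal sketch, assuming the detector has already identified a contiguous range of top-level nodes; `wrap_in_clause` is a hypothetical helper and the `src` value used in the example is a placeholder, not a real template reference.

```python
import uuid
from typing import Any, Dict


def wrap_in_clause(document: Dict[str, Any], start: int, end: int,
                   src: str) -> Dict[str, Any]:
    """Return a shallow copy of the document with nodes[start:end] wrapped
    in a single CiceroMark Clause node."""
    nodes = document["nodes"]
    clause = {
        "$class": "org.accordproject.ciceromark.Clause",
        "clauseid": str(uuid.uuid4()),
        "src": src,
        "nodes": nodes[start:end],
    }
    return {**document, "nodes": nodes[:start] + [clause] + nodes[end:]}


def paragraph(text: str) -> Dict[str, Any]:
    """Build a one-line CommonMark paragraph node for the example."""
    return {
        "$class": "org.accordproject.commonmark.Paragraph",
        "nodes": [{"$class": "org.accordproject.commonmark.Text", "text": text}],
    }


doc = {
    "$class": "org.accordproject.commonmark.Document",
    "xmlns": "http://commonmark.org/xml/1.0",
    "nodes": [paragraph("This is some text."), paragraph("Clause part one."),
              paragraph("Clause part two."), paragraph("This is even more.")],
}

# "ap://example-template@0.0.0#hash" is a placeholder src for illustration only.
result = wrap_in_clause(doc, 1, 3, "ap://example-template@0.0.0#hash")
print([n["$class"].rsplit(".", 1)[-1] for n in result["nodes"]])
```

The original document is left untouched, which makes it easy to show the user a before/after view when manual confirmation is wanted.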
It would be useful to be able to fuzzy match a CommonMark DOM against a set of templates, detecting potential instances of the templates (clauses) in the CommonMark text.
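As a baseline for the fuzzy match, Python's standard-library `difflib.SequenceMatcher` can score a paragraph against sample text for each template. This is just one simple option, not a recommendation; the template samples and the `best_template` function are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not affect the score."""
    return " ".join(text.lower().split())


def best_template(text: str, templates: Dict[str, str]) -> Tuple[str, float]:
    """Return the best-scoring template name and its similarity ratio (0..1)."""
    scores = {
        name: SequenceMatcher(None, normalise(text), normalise(sample)).ratio()
        for name, sample in templates.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical sample texts, one per template.
TEMPLATES = {
    "acceptance-of-delivery": (
        "Party A will be deemed to have completed its delivery obligations "
        "if in Party B's opinion, the Widgets satisfies the Acceptance Criteria"
    ),
    "payment-terms": (
        "The buyer shall pay the invoice amount within 90 days of the invoice date"
    ),
}

candidate = (
    "Acme Corp will be deemed to have completed its delivery obligations\n"
    "if in Beta Ltd's opinion, the Gadgets satisfies the Acceptance Criteria"
)
name, score = best_template(candidate, TEMPLATES)
print(name, round(score, 2))
```

A character-level ratio like this is cheap and needs no training data, but it will not recognise paraphrased clauses, which is where the semantic similarity approaches discussed in this thread would come in.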