GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0
75 stars 29 forks source link

Document view doesn't update if you programatically change the content #132

Closed greenwoodma closed 3 years ago

greenwoodma commented 3 years ago

If a PR updates the content of a document which is already open in the UI then while the actual content is changed the UI doesn't update. This can be very confusing indeed as re-running an app can given very odd behaviour indeed.

greenwoodma commented 3 years ago

So the issue here is that if you call DocumentImpl.setContent(DocumentContent) all that happens is that the internal content variable gets updated. None of the events fire and hence no UI changes. What I'm less sure about is what should actually happen. Should we treat this as essentially an edit of the document over the full length of the previous contents so annotations etc. can be updated? I can envisage situations where replacing the content would leave annotations attached who's offsets fall outside the length of the new content. This seems bad in so many ways, yet at the same time I know setContent is called during some of the markup unpacking when we load documents so I'm not sure if changing this to be an edit would then mess up loading documents.

I'm going to experiment further but does anyone have any thoughts? The Javadocs don't help as they don't really define what should happen. @ianroberts any thoughts as you are as likely as anyone to know what this should do, versus what it does do.

ianroberts commented 3 years ago

Changing setContent to call edit looks like it shouldn't break anything - the document formats always set the content before creating any annotations, so a call to document.edit(0, docLength, newcontent) basically boils down to a call to edit on the original DocumentContent, which replaces the old string with a new one. It might be sensible to put a special case optimisation in DocumentContentImpl.edit so that when you give it offsets that cover the whole of the current content then it can bypass the StringBuffer bit and simply copy the new content's String by reference.

What PRs do we have that call Document.setContent rather than using the edit method anyway?

greenwoodma commented 3 years ago

So technically it's not a PR but the app we use for the OCR service on cloud. When it spots the document is a Base64 encoded image it sends that to the service and then when it gets text back replaces the document content with the ocr text. I spotted this problem when testing that as it appeared to be doing nothing at all.

Having dug around a bit further I think that if I changed it to be a proper edit I might break the UIMA document format. It seems to set the content but then continue to use annotations from original markups, but these would have disappeared if the content was an edit (I think). I'll try and test that and see for sure, but I'm tempted just to do the variable swap as we do currently and then fire the content edited event to update the UI rather than making it an actual edit just to be on the safe side.

greenwoodma commented 3 years ago

I'm going to close this as "wontfix". This is partly as the code I had should really have been using edit anyway, and I don't want to risk changing the behaviour of such a long standing method without being able to fully test all the places it might have been used.