Transkribus / TranskribusSwtGui

Note: the repo has been moved to https://gitlab.com/readcoop/Transkribus/TranskribusSwtGui
GNU General Public License v3.0

Storage.transcript field's transcript metadata and page data may get out of sync #310

Closed kahlep closed 3 years ago

kahlep commented 4 years ago

A user reported that the transcription of the first page of a document was erroneously overwritten with the layout and text of the last page of the document. The GUI log shows no messages about that save, but transcript parent IDs on the last page also point to a transcript on the first page, so data from those two pages are mixed.

  1. a) JaxBPageTranscript::setMd updates the TranscriptMetadata (bringing in a bad parent ID and status) but keeps the same pageData (layout and text). pageData contains another instance of TranscriptMetadata, which is then out of sync.

    b) JaxBPageTranscript::setPageData updates the pageData and sets any existing, but outdated, TranscriptMetadata on it.

  2. Storage::saveTranscript will use the pageId from transcript.getPage().getMd() for the save but the status and parentId from transcript.getMd().

  3. Storage::saveTranscript triggers a reload of the transcript list. Storage::reloadTranscriptsList uses the index of Storage.page within the currently loaded document, which may not match the pageNr in the two TrpTranscriptMetadata fields used in 2. The transcripts of another page are then loaded for the current page after saving (see the "Versions of current page" dialog).

In case of 1a), the wrong parent ID and status will be stored with the new transcript. See docId 396343, pageId 17801383 (page 123).

In case of 1b), the page data is saved to some other page that is not shown in the view. See docId 396343, pageId 15398342 (page 1).
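To make case 1a easier to picture, here is a minimal sketch of how two metadata instances can drift apart. The classes `Md`, `PageData` and `Transcript` and the guard `checkBeforeSave` are hypothetical, heavily reduced stand-ins and not the actual Transkribus classes. The page IDs come from the report above; the parent tsIds and status strings are made up.

```java
/** Simplified stand-ins for illustration; NOT the real Transkribus classes. */
final class TranscriptSyncSketch {

    /** Hypothetical, reduced transcript metadata. */
    static final class Md {
        final int pageId;
        final int parentTsId;
        final String status;

        Md(int pageId, int parentTsId, String status) {
            this.pageId = pageId;
            this.parentTsId = parentTsId;
            this.status = status;
        }
    }

    /** Hypothetical page data: text plus its own metadata reference. */
    static final class PageData {
        final String text;
        final Md md;

        PageData(String text, Md md) {
            this.text = text;
            this.md = md;
        }
    }

    /** Hypothetical transcript that holds metadata and page data separately. */
    static final class Transcript {
        private Md md;
        private final PageData pageData;

        Transcript(Md md, PageData pageData) {
            this.md = md;
            this.pageData = pageData;
        }

        /** Mirrors case 1a: only the outer metadata is replaced. */
        void setMd(Md md) {
            this.md = md;
        }

        Md getMd() {
            return md;
        }

        PageData getPageData() {
            return pageData;
        }

        /** True if both metadata instances refer to the same page. */
        boolean isConsistent() {
            return md != null && pageData != null && pageData.md != null
                    && md.pageId == pageData.md.pageId;
        }
    }

    /** Guard that a save could run first (an assumption, not existing code). */
    static void checkBeforeSave(Transcript t, int currentPageId) {
        if (!t.isConsistent() || t.getMd().pageId != currentPageId) {
            throw new IllegalStateException(
                    "Transcript metadata and page data refer to different pages");
        }
    }

    public static void main(String[] args) {
        // Page IDs are taken from the report; tsIds and status values are made up.
        Md lastPage = new Md(17801383, 1000, "IN_PROGRESS");
        Transcript t = new Transcript(lastPage, new PageData("text of page 123", lastPage));
        System.out.println(t.isConsistent()); // true

        // Metadata of the first page is set while the page data stays untouched.
        t.setMd(new Md(15398342, 2000, "NEW"));
        System.out.println(t.isConsistent()); // false
    }
}
```

A check along these lines would turn a silent mix-up into a visible error at save time. Whether failing the save is acceptable for autosave is a separate design question.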

Still unclear how to reproduce this issue and therefore how often it occurs. Could autosave do that silently!?

See data in prod DB:

select * from pages p
join transcripts t on t.pageid = p.pageid
where p.pageid in (17801383, 15398342)
order by p.pagenr asc, t.tsid desc
kahlep commented 4 years ago

Log of the user's GUI points to usage of document manager.

DocumentManager::totalReload causes the issue if any page other than the first one is loaded. The document is reloaded and the index is set to the first page, but the page itself is not reloaded, which results in the "mixed state" described above.

Load a document, switch to any page other than the first one, open document manager and e.g. try:

Then save a few times on the currently loaded page without touching anything else. Transcript parent IDs then always point to a transcript on the first page. If pages have been moved or added, pageData will be saved to the wrong page (as the page index within the document does not align).

After that, any unsaved changes will be written to disk by autosave using the pageId of the first page and are loaded erroneously on startup.
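The reload behaviour described in this comment can be sketched as follows. This is a reduced model under stated assumptions: `Page`, `Storage`, `totalReloadAsDescribed`, `totalReloadWithPage` and `indexMatchesLoadedPage` are hypothetical names, and the page IDs are invented. It only illustrates why resetting the index without reloading the page leaves the two out of step, plus one possible remedy.

```java
import java.util.Arrays;
import java.util.List;

/** Reduced model of the reload problem; NOT the real Transkribus Storage. */
final class ReloadIndexSketch {

    /** Hypothetical page record: id plus 1-based page number. */
    static final class Page {
        final int pageId;
        final int pageNr;

        Page(int pageId, int pageNr) {
            this.pageId = pageId;
            this.pageNr = pageNr;
        }
    }

    /** Hypothetical storage: a document, a page index and the loaded page. */
    static final class Storage {
        private List<Page> doc;
        private int pageIndex;
        private Page loadedPage;

        Storage(List<Page> doc) {
            this.doc = doc;
        }

        /** Sets the index and loads the matching page together. */
        void loadPage(int index) {
            pageIndex = index;
            loadedPage = doc.get(index);
        }

        /** Mirrors the description: document and index change, the page does not. */
        void totalReloadAsDescribed(List<Page> freshDoc) {
            doc = freshDoc;
            pageIndex = 0;
        }

        /** One possible remedy (an assumption): reload the page with the index. */
        void totalReloadWithPage(List<Page> freshDoc) {
            doc = freshDoc;
            loadPage(0);
        }

        /** True if the page at the current index is the page actually loaded. */
        boolean indexMatchesLoadedPage() {
            return loadedPage != null
                    && doc.get(pageIndex).pageId == loadedPage.pageId;
        }
    }

    public static void main(String[] args) {
        // Invented page IDs for a three-page document.
        List<Page> doc = Arrays.asList(new Page(101, 1), new Page(102, 2), new Page(103, 3));
        Storage s = new Storage(doc);

        s.loadPage(2); // user is on the last page
        s.totalReloadAsDescribed(doc);
        System.out.println(s.indexMatchesLoadedPage()); // false

        s.loadPage(2);
        s.totalReloadWithPage(doc);
        System.out.println(s.indexMatchesLoadedPage()); // true
    }
}
```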

jkloe commented 4 years ago

Testdoc on testsrv: 1253