GateNLP / gate-core

The GATE Embedded core API and GATE Developer application
GNU Lesser General Public License v3.0

Renaming a corpus in a datastore doesn't update the datastore #131

Closed by greenwoodma 3 years ago

greenwoodma commented 3 years ago

If you open a corpus that's in a datastore and rename it, the node in the datastore viewer never changes to show the new name.


We probably need to register a listener somewhere to catch this.

ianroberts commented 3 years ago

This is a deliberate simplification in SerialDataStore.getLrName, which deduces the name from the persistence ID (i.e. the name of the file in the datastore directory). The LR ID is derived from the name that the LR had at the time it was first adopted into the datastore, and it can't be changed later without breaking the link between a persistent corpus and the documents it contains. If we want to show the current name rather than the original one, we'd have to actually load the LR into memory first, and doing that for every document, termbank, ontology or other custom LR type in a large datastore would be too horrible to contemplate.

We could maybe change the serial datastore logic so that it does a writeUTF of the LR name at the top of the file, before the writeObject that saves the LR itself. That way, it'd still have to open every file to build the tree with the right names, but it would only have to read the name from the initial string, it wouldn't have to actually instantiate the whole LR.
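To make the idea concrete, here is a minimal sketch of the "name header" layout using only `java.io`. It is not the SerialDataStore code: the class `NameHeaderSketch` and its methods are hypothetical, and it assumes a plain `ObjectOutputStream` over the file with no other wrapping.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of GATE: stores a display name ahead of the serialized object. */
final class NameHeaderSketch {

  /** Writes the name first (writeUTF), then the object itself (writeObject). */
  static void save(Path file, String name, Serializable resource) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
      out.writeUTF(name);
      out.writeObject(resource);
    }
  }

  /** Reads only the leading name; the object that follows is never deserialized. */
  static String readName(Path file) throws IOException {
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return in.readUTF();
    }
  }

  /** Skips the name header, then deserializes the object. */
  static Object load(Path file) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      in.readUTF(); // discard the name header
      return in.readObject();
    }
  }
}
```

With this layout the viewer could call something like `readName` for each file when building the tree. Two caveats follow from the format: a rename means rewriting the whole file, because the name is a variable-length string at the front, and files written without the header would need some form of migration or version check.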

ianroberts commented 3 years ago

Or maintain some sort of index file mapping IDs to names, but that comes with all its own problems around keeping the index up to date in the presence of potentially multi-threaded updates.
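A rough sketch of what such an index might look like, assuming a `java.util.Properties` file stored next to the LR files. `NameIndexSketch` and its methods are hypothetical, and the `synchronized` methods only guard against concurrent threads sharing one instance in one JVM, which is exactly where the "keeping it up to date" problems start.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical index, not part of GATE: maps persistence IDs to current display names. */
final class NameIndexSketch {
  private final Path indexFile;
  private final Properties names = new Properties();

  /** Loads the existing index if the file is present; otherwise starts empty. */
  NameIndexSketch(Path indexFile) throws IOException {
    this.indexFile = indexFile;
    if (Files.exists(indexFile)) {
      try (InputStream in = Files.newInputStream(indexFile)) {
        names.load(in);
      }
    }
  }

  /** Records the new name and rewrites the index file. */
  synchronized void rename(String persistenceId, String newName) throws IOException {
    names.setProperty(persistenceId, newName);
    try (OutputStream out = Files.newOutputStream(indexFile)) {
      names.store(out, null);
    }
  }

  /** Returns the indexed name, or the fallback (e.g. the name deduced from the ID). */
  synchronized String nameFor(String persistenceId, String fallback) {
    return names.getProperty(persistenceId, fallback);
  }
}
```

The fallback argument lets an LR that has never been renamed keep the name deduced from its ID. The sketch does nothing about two processes opening the same datastore, or about a crash halfway through rewriting the index.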

greenwoodma commented 3 years ago

Hmm, all I can see here is a big can of worms. I might just leave well enough alone, given I doubt it happens often enough to be a big issue.