PhonologicalCorpusTools / SLPAA


Merging corpora #275

Closed kchall closed 1 month ago

kchall commented 6 months ago

Allow corpora to be merged together.

  1. Open an existing corpus.
  2. Go to File / Merge corpora into this corpus.
  3. Select options:
[image: mockup of merge options]
  4. In a standard file selection window, select one or more corpora to merge in.
  5. [If #105 has been implemented, have some kind of reporting of conflicting EntryID-glosses and what to do about it -- needs to be manually corrected before the merge can proceed]
  6. Have new merged corpus saved and open? (not sure how to handle the saving and naming...)
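
The conflict-reporting step could be sketched roughly as below. This is only an illustration: what counts as a "conflicting EntryID-gloss" depends on how #105 ends up being implemented, so the sketch assumes a conflict means the same gloss (ignoring case) appears in more than one of the corpora being merged. The function names and the `(entry_id, gloss)` tuples are hypothetical, not SLPAA's actual data model.

```python
from collections import defaultdict


def find_gloss_conflicts(corpora):
    """Report glosses that occur in more than one corpus.

    `corpora` maps a corpus name to a list of (entry_id, gloss) tuples.
    ASSUMPTION: a "conflict" is the same gloss (case-insensitive) showing up
    in two or more different corpora; the real rule depends on #105.
    """
    seen = defaultdict(list)
    for name, entries in corpora.items():
        for entry_id, gloss in entries:
            seen[gloss.strip().lower()].append((name, entry_id))
    # Keep only glosses that were found in at least two distinct corpora.
    return {
        gloss: hits
        for gloss, hits in seen.items()
        if len({name for name, _ in hits}) > 1
    }


def can_merge(corpora):
    """The merge may proceed only once no conflicts remain."""
    return not find_gloss_conflicts(corpora)
```

The returned dictionary could feed a report listing each conflicting gloss with the corpus and entry it came from, so the user can correct them manually before retrying the merge.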

kvesik commented 5 months ago

FYI, PR #286 adds a function Corpus.increaseminID() which will hopefully be useful when merging.
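
As a rough illustration of why a minimum-ID bump helps when merging: each corpus numbers its entries from its own sequential counter, so IDs from different corpora overlap and incoming entries need to be moved past the IDs already used. The sketch below does not call `Corpus.increaseminID()` (its signature is not shown in this thread); `Sign` and `merge_sign_lists` are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Sign:
    # Hypothetical stand-in for a corpus entry: sequential counter plus gloss.
    entry_id: int
    gloss: str


def merge_sign_lists(sign_lists):
    """Merge several lists of signs, renumbering so that no IDs collide.

    ASSUMPTION: entries are simply renumbered in merge order; the real merge
    may preserve or remap IDs differently.
    """
    merged = []
    min_id = 1  # lowest counter value still unused in the merged corpus
    for signs in sign_lists:
        for sign in sorted(signs, key=lambda s: s.entry_id):
            merged.append(Sign(min_id, sign.gloss))
            min_id += 1  # raise the minimum past the ID just handed out
    return merged
```

With this approach two corpora that both contain an entry with ID 1 end up with distinct IDs in the merged result, at the cost of not preserving the original numbering.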

stannam commented 5 months ago

Thanks, Kaili!

kvesik commented 4 months ago

@stannam note that merge has been bare-bones implemented according to issue #299 (in a very basic form, not exactly how Kathleen specified above) so that we can get sign lists organized pre-LREC. It's not yet merged to main; I'll update here when that happens.

Update 20240408: #299 now merged to main.

kvesik commented 2 months ago

@kchall some clarifying questions:

  1. Confirming that "File / Merge corpora into this corpus" means the user will always be merging/importing into the existing corpus (even if it is empty, I guess)? That is, there is no need to offer a way to merge a bunch of corpora independently of the current one (which is how I implemented it in the hacky version from issue #299)?
  2. Any need to keep the feature from #299 that allows the user to choose the order in which to merge the corpora? Or can this be done randomly/arbitrarily?
  3. Confirming that the new merged corpus should be opened instead of the one that was open before the merge? Or should this be a user decision as part of the merge process?

kchall commented 2 months ago

@kvesik Thanks for checking!

  1. No, actually I really like the option to merge files separately from the corpus that one is currently in. I can see uses for both cases, but if we were only going to keep one, I'd prefer the separate merge like in the #299 version.
  2. I don't think the order really matters.
  3. It should probably be a user decision as part of the merge process, especially if we're allowing separate / independent merges as in (1).

kvesik commented 2 months ago

@kchall how would you like to deal with EntryID formatting conflicts? For example, your initial proposed mockup asks the user about potential EntryID conflicts, which relate to the sequential counter that's used as the foundation for entry IDs. However, we also have a customizable system for what other information is displayed as part of the EntryID. [image: EntryID preferences window]

When the merge is complete, should we by default spawn that preferences window and ask the user to confirm their EntryID preferences? Or just reset it to the default (nothing except the sequential counter)? There could also be a separate global setting for the default behaviour regarding EntryID formatting when merging corpora.

kchall commented 2 months ago

@kvesik Good question. Probably best to ask them to confirm. Would it be possible, if all of the corpora involved in the merge are the same, to pre-populate with those options? If that's tricky, though, it can just be the regular default options shown.

kvesik commented 2 months ago

Note to self after starting to tie myself into knots over this... I forgot that the EntryID format settings are global for the software overall, not for each corpus. So this is a moot point since the user will always see the EntryIDs displayed in the same way anyway. Phew!

  1. [If Additional glosses for each sign #105 has been implemented, have some kind of reporting of conflicting EntryID-glosses and what to do about it -- needs to be manually corrected before the merge can proceed]

kvesik commented 2 months ago

Note for documentation: the user can choose to merge the selected files into the currently-open corpus (overwriting it with the merged version) OR to merge them into an entirely new corpus. Either way, the file selection logic takes care of the currently-open corpus (makes sure it is selected if necessary and that it is not selected twice).

Update: There is no longer any option to merge into the currently-open corpus. All merges create new files.
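
For the documentation, the selection rules described above (no file selected twice, and every merge writing to a new file) might look something like the following. This is a sketch under assumptions rather than the code in SLPAA: the function names are hypothetical and the file names in the usage example are placeholders.

```python
from pathlib import Path


def unique_corpus_paths(selected):
    """Drop duplicate selections, keeping the first occurrence of each file.

    Paths are resolved first so that "a.slpaa" and "./a.slpaa" count as the
    same file.
    """
    seen = set()
    unique = []
    for raw in selected:
        resolved = Path(raw).resolve()
        if resolved not in seen:
            seen.add(resolved)
            unique.append(resolved)
    return unique


def check_output_path(output, inputs):
    """Ensure the merged corpus is written to a new file, not over an input."""
    out = Path(output).resolve()
    if out in inputs:
        raise ValueError(f"merge output would overwrite an input corpus: {out}")
    return out
```

For example, `unique_corpus_paths(["a.slpaa", "./a.slpaa", "b.slpaa"])` yields two paths, and passing `"a.slpaa"` as the output for those inputs raises a `ValueError`.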