digling / edictor

JavaScript program for interactive viewing, manipulating, and editing of wordlists, represented in form of TSV files.
MIT License
Edit Correspondence patterns not functional #223

Open FredericBlum opened 1 week ago

FredericBlum commented 1 week ago

@LinguList I cannot access the correspondence patterns in the current edictor setup. If I select 'Edit Correspondence Patterns', I get an unresponsive window. When I click 'submit', nothing appears in the command line where the server is running. When I open the window, the following line appears:

127.0.0.1 - - [02/Oct/2024 14:53:52] "GET /panels/patterns.html HTTP/1.1" 200 -

Screenshot 2024-10-02 at 14 54 58

If I select 'Compute > Correspondences', the window with the correspondence patterns suddenly opens! However, I am then not able to interact with the patterns (e.g. select different ones) because the computation is stuck loading, as indicated by the spinning wheel.

Screenshot 2024-10-02 at 14 56 03

In short, what am I doing wrong when trying to access the correspondence patterns in my data? Again, tested with migliazzayanomamic.

LinguList commented 1 week ago

Well, there are a few things to be kept in mind here:

  1. Manual editing of correspondence patterns requires a column PATTERNS. Is that in your data?
  2. Be careful when mixing automatic pattern editing (which overwrites) and manual pattern editing (which only modifies parts).

LinguList commented 1 week ago

So you are missing the column "PATTERNS".

LinguList commented 1 week ago

Add it by modifying this line in the config:

          "columns": "DOCULECT|CONCEPT|VALUE|FORM|TOKENS|COGID|COGIDS|ROOTIDS|ALIGNMENT|MORPHEMES|PATTERNS|NOTE"
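A quick way to see which of these configured columns a local TSV export lacks is sketched below. This is only an illustration, not part of edictor: the function name `missing_columns` is hypothetical, and the sketch assumes that the header is the first non-empty line that does not start with `#` and that cells are tab-separated.

```python
# Column list copied from the "columns" entry in the config line.
CONFIG_COLUMNS = (
    "DOCULECT|CONCEPT|VALUE|FORM|TOKENS|COGID|COGIDS|ROOTIDS|"
    "ALIGNMENT|MORPHEMES|PATTERNS|NOTE"
)


def missing_columns(tsv_text, config_columns=CONFIG_COLUMNS):
    """Return the configured columns that are absent from the TSV header."""
    present = set()
    for line in tsv_text.splitlines():
        # Assumption: the header is the first non-empty, non-comment line.
        if line.strip() and not line.startswith("#"):
            present = {name.strip().upper() for name in line.split("\t")}
            break
    return [col for col in config_columns.split("|") if col not in present]


if __name__ == "__main__":
    sample = "ID\tDOCULECT\tCONCEPT\tFORM\tTOKENS\tCOGID\tALIGNMENT\n"
    print(missing_columns(sample))
```

If PATTERNS shows up in the returned list, the file has no column for manual pattern editing yet.
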
LinguList commented 1 week ago

Then, the important question: do you want to compute patterns on partial or full cognates?

LinguList commented 1 week ago

Computing patterns on the remote server is tricky: many data points will be modified, so it may take some time. I would rather not do that now. Instead, I am using your modified config, downloading the data, and opening it from there.

LinguList commented 1 week ago

Luckily, we can now save files locally (they are saved in the folder where you opened edictor). If you restart the server, you also see the file under FILES (good for testing!).

LinguList commented 1 week ago

I open the file now, then I go to CONFIGURATION -> SETTINGS and then I set the colexification mode to FULL COGNATES (it is already there).

LinguList commented 1 week ago

Now all patterns HAVE been computed, as you can see in the column PATTERNS.

LinguList commented 1 week ago

This all works flawlessly; I can edit the patterns now.

LinguList commented 1 week ago

I just pushed the example file to the edictor-remote folder in the main branch of migliazzayanomamic.

LinguList commented 1 week ago

edictor.org has some help on this, but it is not extensive. Our tests, however, which run these cases, have the workflow covered. You must first add PATTERNS by going to add column and inserting PATTERNS, then edit the SETTINGS, and then compute the patterns for the first time. If you don't compute, there are no patterns to manipulate.

FredericBlum commented 1 week ago

Thanks for the walkthrough. One issue remains, however. I can now access the column 'PATTERNS', but when I reload the dataset and select 'Edit Correspondence Patterns', the window remains empty and unresponsive. I have to go through the computation again to access the patterns themselves.

Screenshot from 2024-10-04 12-21-22

LinguList commented 1 week ago

Did you also test with the file itself?

LinguList commented 1 week ago

Screenshot 2024-10-04 at 22 38 14

LinguList commented 1 week ago

Do you see those patterns in the data, @FredericBlum? They are wrong. You must have at least a 0 for each sound in the alignment. I don't know how they were introduced; you have empty patterns as well.
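The "at least a 0 for each sound" condition can be checked roughly as follows. This is a minimal sketch, not edictor's own validation: `pattern_problem` is a hypothetical helper, and it assumes that ALIGNMENT and PATTERNS are both space-separated and that the alignment has no trimmed sites (trimming changes which positions the patterns refer to, so this simple count would not apply there).

```python
def pattern_problem(alignment, patterns):
    """Return a short description of the problem, or None if the row looks fine.

    Assumption: one pattern entry (possibly "0") is expected per
    space-separated alignment segment, with no trimmed sites involved.
    """
    segments = alignment.split()
    entries = patterns.split()
    if not entries:
        return "empty"
    if len(entries) != len(segments):
        return "length mismatch"
    return None


if __name__ == "__main__":
    print(pattern_problem("p i h i", "608 155 145 153"))
    print(pattern_problem("p i h i", ""))
    print(pattern_problem("p i h i", "608 155"))
```

Rows flagged as "empty" or "length mismatch" are the ones worth inspecting by hand first.
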

LinguList commented 1 week ago

The problem, however, is deeper.

LinguList commented 1 week ago

The file that I created works for actually analyzing patterns.

remotemigliazzayanomamic.tsv.txt

LinguList commented 1 week ago

But now check the sqlite version. (filter by tokens = +)

Screenshot 2024-10-04 at 22 51 24

against the text file

Screenshot 2024-10-04 at 22 51 36

LinguList commented 1 week ago

You have many + chars for the cognate sets that should not be there. I think the data was not properly uploaded to the server, or something similar, since you edit a bunch of data at once and it may not all make it to the server (my wild guess for now).

LinguList commented 1 week ago

Note that the patterns are indexed against the ALIGNMENT, but against the ALIGNMENT in its TRIMMED form. So if you re-trim the data, the patterns must be checked.

LinguList commented 1 week ago

This can be done manually, but it is tedious.

LinguList commented 1 week ago

I just used the debugger to check if I can repair the data. But it is not possible. The ways to proceed:

  1. Make a fix in edictor that checks for and ignores all sequences that have the problem, that is, cases where a cognate set has conflicting pattern data (this is partly handled, but obviously not well enough).
  2. For the case at hand, correct the data based on the derived file and use that instead of the online version (you could recreate the sqlite with lexibase).

FredericBlum commented 6 days ago

Ok, but the pattern data was never manipulated manually. Doesn't this mean that there is some problem with the online computation of patterns?

FredericBlum commented 6 days ago

There seem to be a number of distinct but related problems:

So right now, it seems like the only solution is to re-do the correspondence patterns locally, and reupload, as you suggested. And then keep doing manual modifications of the Pattern ID column.

FredericBlum commented 6 days ago

It also seems like deleting data does not really work. I have removed two empty rows again and again, but they keep reappearing.

FredericBlum commented 6 days ago

One example of the problem with the online computation:

'p i h i + p i h k ɨ' - as alignment, nothing trimmed - becomes '608 155 145 153 + 0 0 0 + 0 0' - everything after '+' just gets turned to 0.
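To find other rows like this one, a small diagnostic can split both strings on '+' and compare the chunks. The helpers below (`chunk_lengths`, `zeroed_chunks`) are hypothetical names for illustration and assume space-separated segments with '+' as the only boundary marker.

```python
def chunk_lengths(value):
    """Number of space-separated entries in each '+'-delimited chunk."""
    return [len(chunk.split()) for chunk in value.split("+")]


def zeroed_chunks(patterns):
    """Indices of '+'-delimited chunks whose entries are all '0'."""
    chunks = [chunk.split() for chunk in patterns.split("+")]
    return [
        index
        for index, chunk in enumerate(chunks)
        if chunk and all(entry == "0" for entry in chunk)
    ]


if __name__ == "__main__":
    alignment = "p i h i + p i h k ɨ"
    patterns = "608 155 145 153 + 0 0 0 + 0 0"
    print(chunk_lengths(alignment))  # [4, 5]
    print(chunk_lengths(patterns))   # [4, 3, 2]
    print(zeroed_chunks(patterns))   # [1, 2]
```

For the row quoted above, the alignment has two chunks of 4 and 5 segments, while the pattern string has three chunks of 4, 3 and 2 entries, so the two do not even agree on where the boundaries are.
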

FredericBlum commented 6 days ago

In the online version, I have removed all '+' from the forms and alignments, except for the singletons. The patterns do not have anything in the form of '+ 0 0 0 0' anymore. But the Edit Correspondence Patterns button is still unresponsive when submitting if I do not re-compute the patterns beforehand. Maybe we could also have a look at this in the Oberseminar? I feel like I still didn't understand some parts that I should have.

LinguList commented 5 days ago

The problem with your data (which must be fixed on the code side) is that you do what we do NOT expect: you morpheme-segment your data and trim later, in order to work with COGID instead of COGIDS. This confuses the method and leads to the pluses that you see there.

LinguList commented 5 days ago

By default, EDICTOR assigns each segment the pattern 0, unless it is a plus. We assume that plus is only used when working with COGIDS, but that's not the case for your dataset.

LinguList commented 5 days ago

The way to go here seems to be to go back to a text-file version and experiment there, instead of using the server version. This is easy to handle, and we can verify that deleting all pluses will get rid of these problems. But the general pattern problem persists, and we must find ways to handle it in a workflow.

LinguList commented 5 days ago

Can we for now verify what happens if we delete the pluses in a text file? If the cause is, as I assume, the morpheme segmentation combined with full cognates, we are a step further.
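For such an experiment on a copy of the text file, the pluses could be removed with something like the sketch below. It is an illustration under assumptions, not an edictor feature: `strip_pluses` is a hypothetical helper, the header is assumed to be the first line, cells are assumed to be tab-separated, and only the columns TOKENS and ALIGNMENT are touched. It removes every '+', so it does not keep the singletons apart, and PATTERNS would need to be recomputed afterwards.

```python
def strip_pluses(tsv_text, columns=("TOKENS", "ALIGNMENT")):
    """Return the TSV text with '+' segments removed from the given columns."""
    lines = tsv_text.splitlines()
    # Assumption: the first line is the header row.
    header = lines[0].split("\t")
    targets = [
        index
        for index, name in enumerate(header)
        if name.strip().upper() in columns
    ]
    cleaned = [lines[0]]
    for line in lines[1:]:
        # Leave blank lines and '#' comment lines untouched.
        if not line.strip() or line.startswith("#"):
            cleaned.append(line)
            continue
        cells = line.split("\t")
        for index in targets:
            if index < len(cells):
                segments = [seg for seg in cells[index].split() if seg != "+"]
                cells[index] = " ".join(segments)
        cleaned.append("\t".join(cells))
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = (
        "ID\tTOKENS\tALIGNMENT\tPATTERNS\n"
        "1\tp i h i + p i h k ɨ\tp i h i + p i h k ɨ\t\n"
    )
    print(strip_pluses(sample))
```

Working on a copy keeps the original file available for comparison with the server version.
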

LinguList commented 5 days ago

Ah, I just saw, sorry, I misread what you wrote. So we can verify that when working with a file and not with the server, the editing WORKS. This seems important to me.

LinguList commented 5 days ago

It means, if we convert the file to SQLITE and put that also on the server, I'd expect that updating would work.

LinguList commented 5 days ago

If that's the case, it is most likely that such long updates are not possible (or I may have even MISSED code updates on the server, so they never GET there).

LinguList commented 5 days ago

Assuming you work alone with the data now and want to explore just the correspondences, my recommendation is to switch to file edit-mode. You put the file next to the config, open edictor, and you find it in the tab FILES. Clicking on the right-most save button will store the file and leave a back-up of the previous version. So you have a full account of what you edited.

LinguList commented 5 days ago

When I find time, we can then resume the SQLITE issue and also discuss these details in the OS!

FredericBlum commented 3 days ago

I will get to this next week, I hope. I am surprised that this doesn't work, because it was a standard part of the workflow in the old EDICTOR. E.g. in girardpanotakanan and valzarpanotakana, we have full cognacy and morpheme segmentation, which did not disturb the method. But it's good to know that we should avoid this. Of course, the cleaner implementation would be partial cognacy anyway, but it's unclear if we can invest the time necessary for this.

LinguList commented 2 days ago

It was always standard to COMPUTE, never to ANNOTATE. And now you compute and send to the server; this was also never done.

LinguList commented 2 days ago

And it works -- as far as I have been able to confirm with your data -- if you use the file, not the server.