Open FredericBlum opened 1 week ago
Well, there are a few things to be kept in mind here:
First, you are missing the column "PATTERNS".
Add it by modifying this line in the config:
"columns": "DOCULECT|CONCEPT|VALUE|FORM|TOKENS|COGID|COGIDS|ROOTIDS|ALIGNMENT|MORPHEMES|PATTERNS|NOTE"
Then, the important question: do you want to compute patterns on partial or full cognates?
Computing patterns on the remote server is a bit tricky: many data points get modified, so it may take some time. I would rather not do that now; I just use your modified config, download the data, and open it from there.
Luckily, we can now save files locally (it saves in the folder where you opened edictor). If you restart the server, you also see the file under FILES (good for testing!).
I open the file now, then I go to CONFIGURATION -> SETTINGS and then I set the colexification mode to FULL COGNATES (it is already there).
Now all patterns HAVE been computed, as you can see in the column PATTERNS.
This all works flawlessly; I can edit the patterns now.
I just pushed the example file to the edictor-remote folder in the main branch of migliazzayanomamic.
edictor.org has some help on this, but it is not extensive. Our tests, however, which run these cases, have the workflow covered. You must first add PATTERNS by going to add column and inserting PATTERNS, then edit the SETTINGS, and then compute the patterns a first time: if you don't compute, there are no patterns to manipulate.
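For file-based setups, the column can also be added in the config, as mentioned earlier in the thread. A minimal sketch, assuming the config is a JSON object whose "columns" value is a pipe-separated string; the helper name `ensure_column` and the choice to insert before NOTE are only illustrative:

```python
import json


def ensure_column(config_text, column="PATTERNS", before="NOTE"):
    """Return config JSON text whose "columns" value includes `column`.

    Assumption: "columns" is a pipe-separated string, as in the line quoted
    in this thread. The function is hypothetical, not part of EDICTOR.
    """
    config = json.loads(config_text)
    columns = config["columns"].split("|")
    if column not in columns:
        # Insert before `before` when that column exists, otherwise append.
        index = columns.index(before) if before in columns else len(columns)
        columns.insert(index, column)
    config["columns"] = "|".join(columns)
    return json.dumps(config, indent=2)


# The column list from this thread, without PATTERNS.
example = (
    '{"columns": "DOCULECT|CONCEPT|VALUE|FORM|TOKENS|COGID|COGIDS|'
    'ROOTIDS|ALIGNMENT|MORPHEMES|NOTE"}'
)
print(ensure_column(example))
```

Running it twice leaves the config unchanged, so it should be safe to apply to a config that already lists PATTERNS.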
Thanks for the walkthrough. One issue remains, however. I can now access the column 'PATTERNS', but when I reload the dataset and select 'Edit Correspondence Patterns', the window remains empty and unresponsive. I have to go through the computation again to access the patterns themselves.
Did you also test with the file itself?
Do you see those patterns in the data, @FredericBlum? They are wrong. You must have at least a 0 for each sound in the alignment. Dunno how they were introduced; you have empty patterns as well.
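The rule "at least a 0 for each sound" can be checked outside EDICTOR. A rough sketch, assuming ALIGNMENT and PATTERNS are space-separated strings with `+` as morpheme boundary and that the alignment is untrimmed; the function name is hypothetical, and trimmed alignments are not handled here:

```python
def pattern_mismatches(alignment, patterns):
    """List places where an alignment and its pattern string disagree.

    Both strings are split on "+" into morphemes, and the number of
    sounds is compared with the number of pattern IDs per morpheme.
    An empty list means every sound has a pattern ID.
    """
    align_parts = [part.split() for part in alignment.split("+")]
    pattern_parts = [part.split() for part in patterns.split("+")]
    problems = []
    if len(align_parts) != len(pattern_parts):
        problems.append(
            f"{len(align_parts)} morphemes in alignment, "
            f"{len(pattern_parts)} in patterns"
        )
    # zip stops at the shorter list; the count mismatch is reported above.
    for i, (sounds, ids) in enumerate(zip(align_parts, pattern_parts), start=1):
        if len(sounds) != len(ids):
            problems.append(
                f"morpheme {i}: {len(sounds)} sounds, {len(ids)} pattern IDs"
            )
    return problems


print(pattern_mismatches("p i h i", "0 0 0 0"))  # no problems
print(pattern_mismatches("p i h i", ""))         # empty pattern is reported
```

Running this over every row of the text file should surface both the empty patterns and the rows where the counts do not line up.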
The problem, however, is deeper.
The file that I created works for actually analyzing patterns.
But now check the sqlite version (filter by tokens = +) against the text file.
You have many + chars for the cognate sets that should not be there. I think the data was not properly uploaded to the server, or something similar, since you edit a bunch of data at once and this may not all make it to the server (my wild guess for now).
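On the text-file side, the same filter (tokens containing `+`) can be reproduced with a few lines of Python. This is only a sketch: it assumes a tab-separated wordlist with ID and TOKENS columns, and the two sample rows and doculect names are made up for illustration:

```python
import csv
import io


def rows_with_plus(tsv_text, column="TOKENS"):
    """Return the IDs of all rows whose `column` contains a "+" segment."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [row["ID"] for row in reader if "+" in row[column].split()]


# Illustrative rows only, not taken from the dataset.
sample = (
    "ID\tDOCULECT\tCONCEPT\tTOKENS\n"
    "1\tA\thand\tp i h i\n"
    "2\tB\thand\tp i h i + p i h k ɨ\n"
)
print(rows_with_plus(sample))  # ['2']
```

Comparing this list of IDs with the rows that the server shows under the same filter would tell us which entries differ between the two versions.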
Note that the patterns follow the index of the ALIGNMENT, but of the ALIGNMENT in its TRIMMED form. So if you re-trim the data, the patterns must be checked again.
This can be done manually, but it is tedious.
I just used the debugger to check if I can repair the data. But it is not possible. The ways to proceed:
Ok, but the pattern data was never manipulated manually. Doesn't this mean that there is some problem with the online-computation of patterns?
There seem to be a number of distinct but related problems:
So right now, it seems like the only solution is to re-do the correspondence patterns locally, and reupload, as you suggested. And then keep doing manual modifications of the Pattern ID column.
It also seems like deleting data does not really work. I have removed two empty rows again and again, but they keep reappearing.
One example of the problem with the computation online:
'p i h i + p i h k ɨ' - as alignment, nothing trimmed - becomes '608 155 145 153 + 0 0 0 + 0 0' - everything after '+' just gets turned to 0.
In the online version, I have removed all '+' from the forms and alignments, except for the singletons. The patterns do not have anything in the form of '+ 0 0 0 0' anymore. But the Edit Correspondence Patterns button is still unresponsive when submitting unless I re-compute the patterns first. Maybe we could also have a look at this in the Oberseminar? I feel like I still didn't understand some parts that I should have.
The problem that you have with your data -- but this must be fixed from the code side -- is that you do what we do NOT expect: you morpheme-segment your data and you trim later, to work with COGID instead of COGIDS. This confuses the method and leads to the plusses that you see there.
EDICTOR as a default assigns each segment the pattern 0, unless it is a plus. We assume that plus is only used when working with COGIDS, but that's not the case for your dataset.
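To make the default concrete, here is a sketch of the rule as described above. It is not EDICTOR's actual code, just the described behaviour restated in Python with a hypothetical function name:

```python
def default_patterns(alignment):
    """Assign pattern 0 to every segment, keeping "+" as a boundary marker."""
    return " ".join("+" if seg == "+" else "0" for seg in alignment.split())


print(default_patterns("p i h i + p i h k ɨ"))  # 0 0 0 0 + 0 0 0 0 0
```

With a morpheme-segmented form and full cognates (COGID), the `+` therefore ends up in the pattern string even though no partial cognate sets are involved.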
The way to go here seems to be to get back to a version on a text file and to experiment, instead of the server version. This is easy to handle, and we can verify that deleting all plusses will get rid of these problems. But the general pattern problem persists, and we must find ways to handle it in a workflow.
Can we for now verify what happens if we delete the plusses in a text file? If the cause is -- as I assume -- the morpheme segments combined with full cognates, we are a step further.
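For the text-file experiment, deleting the plusses could be scripted along these lines. A sketch only: the column names TOKENS and ALIGNMENT come from the config in this thread, the helper names are hypothetical, and no exception is made for singletons:

```python
def strip_plusses(segments):
    """Remove "+" boundary markers from a space-separated segment string."""
    return " ".join(seg for seg in segments.split() if seg != "+")


def clean_row(row, columns=("TOKENS", "ALIGNMENT")):
    """Return a copy of a row (dict) with plusses removed from `columns`."""
    return {
        key: strip_plusses(value) if key in columns else value
        for key, value in row.items()
    }


row = {"ID": "2", "TOKENS": "p i h i + p i h k ɨ", "ALIGNMENT": "p i h i + p i h k ɨ"}
print(clean_row(row)["TOKENS"])  # p i h i p i h k ɨ
```

The PATTERNS column is left untouched here, so the patterns would presumably have to be recomputed after the cleanup.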
Ah, I just saw, sorry, I misread what you wrote. So we can verify that when working with a file and not with the server, the editing WORKS. This seems important to me.
It means, if we convert the file to SQLITE and put that also on the server, I'd expect that updating would work.
If that's the case, it is most likely that such long updates are not possible (or I may have even MISSED code updates on the server, so they never GET there).
Assuming you work alone with the data now and want to explore just the correspondences, my recommendation is to switch to file edit-mode. You put the file next to the config, open edictor, and you find it in the tab FILES. Clicking on the right-most save button will store the file and leave a back-up of the previous version. So you have a full account of what you edited.
When I find time, we then resume the SQLITE issue and we discuss also these details in the OS!
I will get to this next week, I hope. I am surprised that this doesn't work, because it was a standard part of the workflow in the old EDICTOR. E.g. in girardpanotakanan and valzarpanotakana, we have full cognacy and morpheme segmentation, which did not disturb the method. But it's good to know that we should avoid this. Of course, the cleaner implementation would be partial cognacy anyway, but it's unclear if we can invest the time necessary for this.
It was always standard to COMPUTE, never to ANNOTATE. And now you compute and send to the server, which was also never done.
And it works -- as far as I have been confirming with your data -- if you use the file, not the server.
@LinguList I fail to access the correspondence patterns in the current edictor setup. If I select 'Edit Correspondence Patterns', I get a non-reactive window. When I click 'submit', nothing appears in my command line running the server. When I open it, the following line appears:
127.0.0.1 - - [02/Oct/2024 14:53:52] "GET /panels/patterns.html HTTP/1.1" 200 -
If I select 'Compute > Correspondences', suddenly the window with the correspondence patterns opens! However, I am then not able to interact with the patterns (e.g. select different ones) because the computation is stuck loading, indicated by the spinning whirl.
In short, what am I doing wrong to access the correspondence patterns in my data? Again, tested with migliazzayanomamic.