Signbank / Global-signbank

An online sign dictionary and sign database management system for research purposes. Developed originally by Steve Cassidy. This repo is a fork for the Dutch version, previously called 'NGT-Signbank'.
http://signbank.cls.ru.nl
BSD 3-Clause "New" or "Revised" License

how to change phonology/morphology on SignBank #1219

Open rosestamp opened 2 months ago

rosestamp commented 2 months ago

new_ISL_lemma_updates (1).csv

I am trying to upload this CSV to update existing entries in our ISL dataset with new morphological and phonological information. When I try "Import CSV Update Existing Glosses", it states: "Attempt to update Lemma translations. Use Import CSV Lemma Update instead." When I do what it suggests and use Lemma Update, it states: "The header row of the csv file looks like this:" Handedness, Strong hand, Weak hand, Strong hand letter, Contact type, Location, Movement direction, Movement Shape, Relation between Articulators, Handshape Change, Repeated movement, Alternating movement, 42719, ISL, DUCK, 2s, B, no, , neutral space, to and fro, straight, next to, Yes, yes, 42687, DEER, W, beak2_open_spread, initial, forehead, forwards, arc, No

I do not understand what is wrong with my file. Please let me know. Thanks!

susanodd commented 2 months ago

@rosestamp the easiest is to remove the columns you do not need to update from the CSV file. Update only needs the Signbank ID and Dataset columns and the specific columns you are updating.
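A minimal sketch of trimming a CSV down to the identifying columns plus the ones being updated, before uploading it. The column names "Signbank ID" and "Dataset" come from the comment above; the helper name `keep_columns` and the "Lemma" column in the sample are hypothetical, and the real export may label its columns differently.

```python
import csv
import io


def keep_columns(csv_text, columns_to_update, id_columns=("Signbank ID", "Dataset")):
    """Return CSV text holding only the ID columns and the columns to update."""
    reader = csv.DictReader(io.StringIO(csv_text))
    present = reader.fieldnames or []
    # Keep the requested order, and skip any column the file does not have.
    wanted = [c for c in list(id_columns) + list(columns_to_update) if c in present]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns outside `wanted` are dropped
    return out.getvalue()


# Hypothetical sample: the "Lemma" column is removed, "Handedness" is kept.
sample = "Signbank ID,Dataset,Lemma,Handedness\n42719,ISL,DUCK,2s\n"
trimmed = keep_columns(sample, ["Handedness"])
```

Working on a trimmed copy leaves the original spreadsheet untouched, so other columns can be uploaded later in a separate update.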

Here is the file with the headers fixed and that lemma column removed.

new_ISL_lemma_updates.1e.csv

However, it still has capitalisation problems on some of the fields. (You will see this if you "import csv update existing gloss" on the updated file.)

May I ask, how did you enter the data? I haven't seen this before without caps. Did the spreadsheet program do this?

[I will see if I can revise the code to accept the choice field values also if the first letter is not a cap. It is easy to fix but a bit annoying.]

susanodd commented 2 months ago

IMPLEMENTATION CODE COMMENTS

I found it!! iexact

I'm wondering why we didn't use this from the start. (Although I don't know whether the field choices actually distinguish, e.g., b from B. The existing field choices will need a test that detects multiple objects found, in case case is relevant in the choices.)

BACKGROUND @Woseseltops @vanlummelhuizen any ideas here? This is the first time this comes up. This is about the case of the field choices. If they don't match the case in the database exactly, the retrieval does not match. It looks like others also have this problem, as somebody has implemented a new kind of Django field for this:

https://github.com/iamoracle/django_case_insensitive_field

The Tags model (prefab) is case sensitive as well: if you create new tags that differ only in case, they are different tags. In this issue, the field choice values do not match those in the database because they don't start with a cap or differ in spacing. This is only a problem with "user" input of the values, since all of the Signbank templates use Model Form choices. It will potentially be a problem with the API as well. The API Gloss Update error code needs to report the mismatched values, both when the case differs and when case is relevant and multiple matches are found.

susanodd commented 2 months ago

Hmmmm. The iexact will need to be on a model translation multilingual name field for the FieldChoice model.
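A rough sketch of the lookup behaviour discussed above, written in plain Python so it runs without a database: match a user-supplied value case-insensitively and insist on exactly one result. In Django the comparison itself would be done with the `iexact` field lookup (for example `filter(name__iexact=value)`, where the field name `name` is an assumption). The function name `find_choice` and the sample choice lists are hypothetical.

```python
def find_choice(value, choices):
    """Case-insensitive lookup that requires exactly one matching choice."""
    wanted = value.strip().casefold()
    matches = [choice for choice in choices if choice.casefold() == wanted]
    if not matches:
        raise LookupError(f"could not find option {value!r}")
    if len(matches) > 1:
        # Case is significant for these choices, so the input is ambiguous.
        raise LookupError(f"option {value!r} matches several choices: {matches}")
    return matches[0]


# Hypothetical choice list; the stored spelling is returned.
match = find_choice("next-to", ["Next-to", "Straight"])
```

Raising on multiple matches covers the case where two stored choices differ only in capitalisation, which a plain case-insensitive query would otherwise resolve arbitrarily.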

susanodd commented 2 months ago

I revised the code to use iexact. There are now far fewer field choices that don't match. But this one needs to be changed:

@rosestamp: For DUCK (42719), could not find option next to for Relation between Articulators (There are about 800 rows with this error.)

There is a field "Next-to" as a choice. It requires the hyphen. (You can use next-to now, without the cap. But the code is not yet live.)

I'll put the revision up asap.

susanodd commented 2 months ago

@rosestamp there are also rows that update the same gloss. There should only be one row per gloss ID. (This is to prevent problems with conflicting updates in different rows.) You can sort the spreadsheet by Signbank ID to detect these.
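As an alternative to sorting the spreadsheet by hand, a short script can list the IDs that occur in more than one row. This is a sketch that assumes the ID column is labelled "Signbank ID"; the function name `duplicate_ids` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_ids(csv_text, id_column="Signbank ID"):
    """Return the sorted IDs that appear in more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column].strip() for row in reader)
    return sorted(gloss_id for gloss_id, n in counts.items() if n > 1)


# Hypothetical sample in which gloss 42719 is updated by two rows.
sample = (
    "Signbank ID,Dataset,Handedness\n"
    "42719,ISL,2s\n"
    "42687,ISL,1\n"
    "42719,ISL,2a\n"
)
dupes = duplicate_ids(sample)
```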

susanodd commented 2 months ago

@rosestamp here's another one:

For CORRECT (42658), could not find option downwards + contralateral for Movement Direction

This needs to be a > instead of a + to match.

You could ask @ocrasborn if your research needs this to be different.

uklomp commented 2 months ago

@susanodd those kinds of changes can come to me now :) There happens to be a difference between the > and the + categories.

@rosestamp I will change downwards + contralateral/ipsilateral (which is a weird category anyway) to downwards + contralateral.

susanodd commented 2 months ago

Great! I have no idea what symbols are syntax or have semantics. Thanks. There are a few more that didn't match any field choices.

susanodd commented 2 months ago

@rosestamp the CSV import is now case insensitive for the choice fields, so your file will give far fewer feedback errors. An error is now reported only if no match is found, as for the examples above.

susanodd commented 2 months ago

@uklomp are there other fields where the syntax of the choice can vary? For example, the Next-to above. If the - is only syntax and some people don't use it, I can code it so it also looks that up. (To match Next-to as well as Next to.) I modified the code so it's case insensitive for the choices now. What about the use of _ in the names? Could that also be used with a space instead?

uklomp commented 2 months ago

next-to and next to would be the same indeed. The ">", "+" and "/" are not interchangeable in most cases. For the rest, I can't think of any examples where it matters. The underscore in names also doesn't seem very important, but which names do you mean? Names of the fields?

susanodd commented 2 months ago

Like in the choices for e.g., Strong Hand:

1_curved
Baby_beak

...

Do researchers use any other notation for the _ ?

uklomp commented 2 months ago

Ah ok, these could be spaces indeed. It's interchangeable.
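One way to treat these spellings as the same choice is to compare values through a normalised key. The sketch below assumes that hyphens, underscores and spaces are all interchangeable separators, as suggested in this thread, while operators such as > and + are left alone because they carry meaning. The function name `choice_key` is hypothetical.

```python
import re


def choice_key(value):
    """Fold case and treat '-', '_' and runs of whitespace as one separator."""
    return re.sub(r"[-_\s]+", " ", value.strip()).casefold()


# "next to" and "Next-to" compare equal; "+" and ">" still differ.
same = choice_key("next to") == choice_key("Next-to")
different = choice_key("downwards + contralateral") != choice_key("downwards > contralateral")
```

Only the key is normalised, so the stored choice keeps its original spelling and the list of choices in the interface stays unique.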

rosestamp commented 2 months ago

Thank you. I updated 'next to' to 'next-to'. I understood that the other changes were made but maybe I missed something? When I enter it now, there are still multiple errors saying that the options for example "fingertips" for "location" are not found. Is there a more general problem I am missing? For the duplicate rows, I removed them. Here is my file and here is the screenshot when I tried to upload it: new_ISL_lemma_updates.1e.csv

Screenshot 2024-04-22 at 12 14 33

susanodd commented 2 months ago

If you look at the page for uploading, there is a scroll bar where it shows a pull-down list of choices for each field. https://signbank.cls.ru.nl/signs/import_csv_update/ If you see that the syntax is different (as for the example above with the plus sign), then @uklomp can change that or add a choice.

If the choices are in the pull-down, then it could be something with extra spaces or no spaces around the symbols? (I will check this.)

If more than one choice matches, then that needs to be corrected in the system. (The names should be unique. But it could be that we didn't notice there are duplicates.)

If none of those are the case, then there is something going on with the query search. (That would be a bug. There are choices where some are prefixes of others. So it could be that a prefix matches or something and it returns multiple instead of a unique result. It needs to obtain a unique choice.)

The example choice lists are not sorted alphabetically, so this is also not good. (I'll fix that.)

susanodd commented 2 months ago

@rosestamp @uklomp

There is no choice 2n for Handedness.

There is no choice U for Handshape (Strong Hand and Weak Hand).

There is no choice X for Handshape, but there is a choice X for Handedness.

It's Location Weak hand: finger tips (not fingertips)

It's Weak Hand C2_closed (not C2-closed)

It's Weak Hand 1_curved (not 1-curved)

susanodd commented 2 months ago

@rosestamp another place you can see the existing choices for fields is on the Analysis > Frequencies page. They are sorted there.

It's Movement Direction Ipsilateral + up and down (not >) (gloss Spicy)

rosestamp commented 2 months ago

Can I just ask if spaces matter between words like 'upwards' and > or + etc.? If yes, can this be changed?

susanodd commented 2 months ago

It's because that's how they were defined when created by @ocrasborn.

I shall add some additional parsing to allow them without spaces. (There can only be one in the list of choices in the interface, in order to allow searching. So internally they will be mapped -- after parsing away/adding back the spaces for the particular operations + and > -- to the internal representation.) I can see it's quite annoying as it is.
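The kind of parsing described here can be sketched with a single regular expression that puts exactly one space on each side of the + and > operators before the lookup. This is an illustration only, not the Signbank code; the function name `normalise_operators` is hypothetical.

```python
import re


def normalise_operators(value):
    """Put exactly one space on each side of '+' and '>'."""
    return re.sub(r"\s*([+>])\s*", r" \1 ", value).strip()


# Input without spaces is mapped to the spaced internal form.
spaced = normalise_operators("mouth>weak hand")
```

Values that are already spaced correctly pass through unchanged, so the normalisation can be applied to every incoming value.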

Is this also the case for the _ ? Do you use a - instead in your research? I'm guessing you write them for publication in a certain way.

If you use a different interface language, you can also check what the translations look like for the field choices, to see if any of those are written differently in practice. (I can only read the English and Dutch.)

At the moment, the CSV uses English for the values. The API interface allows other languages now.

If you need operators themselves (the + and >) to be modified, @uklomp can do that. I'll do the spaces.

susanodd commented 2 months ago

@rosestamp I modified the code locally to also try to match the "+" and ">" with differing space.

But for the feedback below, the values really don't match. (Some don't exist. Some have a + instead of a > or vice versa.) Can you browse these and see if you need extra choices, e.g., U or 2n? @uklomp can accommodate or discuss.

Import CSV Update Existing Glosses

For RABBIT (43043), could not find option U for Weak Hand

For BANANA1 (42539), could not find option motivated for Movement Shape

For PASTA (42997), could not find option U for Weak Hand

For GRAPE (42799), could not find option motivated for Movement Shape

For WINDOW (43288), could not find option motivated for Movement Shape

For BABY-CRIB (42530), could not find option ipsilateral > backwards for Movement Direction

For BIN (42559), could not find option upwards + forwards for Movement Direction

For SOAP (43136), could not find option proximal > distal for Movement Direction

For BASKET (42542), could not find option 2n for Handedness

For FLOWER (42769), could not find option upwards + forwards for Movement Direction

For NEIGHBOUR (42964), could not find option U for Weak Hand

For SLEEP (43127), could not find option U for Weak Hand

For WIPE (43291), could not find option proximal > distal for Movement Direction

For RIDE-ANIMAL (43069), could not find option U for Weak Hand

For LATER1 (42889), could not find option U for Weak Hand

For NEAR (42961), could not find option U for Weak Hand

For FIRE (42764), could not find option upwards/downwards for Movement Direction

For ANSWER (42514), could not find option U for Weak Hand

For SHOUT (43115), could not find option Ipsilateral + forwards for Movement Direction

For INSULTED (42858), could not find option upwards/downwards for Movement Direction

For IRON (42863), could not find option U for Weak Hand

For RETURN (44237), could not find option U for Weak Hand

For SPICY (43150), could not find option ipsilateral > up and down for Movement Direction

For DEODORANT (42689), could not find option ipsilateral and contralateral/downwards for Movement Direction

For EYEBROWS (42746), could not find option motivated for Movement Shape

For LECTURER (42894), could not find option unsure for Location

For SEWING-PIN (43101), could not find option backwards > upwards for Movement Direction

For WIG (43286), could not find option unsure for Location

For JUNE (42872), could not find option 2n for Handedness

For AUGUST (42524), could not find option 2n for Handedness

For SEPTEMBER (43099), could not find option 2n for Handedness

For SIP (43122), could not find option unsure for Location

For IMPOSSIBLE (42851), could not find option U for Weak Hand

For COME-CLOSER (42648), could not find option U for Weak Hand

For WRING (43296), could not find option forwards/backwards for Movement Direction

For IMPORTANT-NOT (42850), could not find option U for Weak Hand

For RUMOUR (43077), could not find option U for Weak Hand

For CHEERS (42621), could not find option contralateral > upwards for Movement Direction

For CHAIRPERSON (42615), could not find option unsure for Location

For REPRESENTATIVE (43062), could not find option 2n for Handedness

For DEPENDENT (42690), could not find option X for Weak Hand

For DIACRITICS (42693), could not find option motivated for Movement Shape

susanodd commented 2 months ago

FYI

input:       mouth>weak hand
normalised:  mouth > weak hand
input:       mouth>weak hand
normalised:  mouth > weak hand
input:       eye>neutral space
normalised:  eye > neutral space
input:       Chin>neutral space
normalised:  Chin > neutral space
input:       mouth>weak hand
normalised:  mouth > weak hand
input:       forehead>neutral space
normalised:  forehead > neutral space

uklomp commented 2 months ago

Hi @rosestamp. I can change or add options to the drop-down menus, but I'd like to do that only in cases where it is necessary, and not e.g. a mismatch with the available options. To go through the errors:

rosestamp commented 2 months ago

Thank you, I managed to solve all of the errors now so thank you for your help and for solving these issues.

uklomp commented 2 months ago

so, just to clarify, do I still need to look into the fields with > and + etc or did you find these as well? And should Susan still look into the 'motivated' form ?

rosestamp commented 2 months ago

sorry, i didn't manage to keep up with all of the questions...what is the question about > and +? motivated form?

susanodd commented 2 months ago

For this one,

For EYEBROWS (42746), could not find option motivated for Movement Shape

It should be "Motivated shape"

(You can see the choices in the Import CSV update example pull-downs. Those are computed dynamically when you view the page.)

uklomp commented 2 months ago

sorry, i didn't manage to keep up with all of the questions...what is the question about > and +? motivated form?

See my last message with the bullet point list. I went through all the errors and described whether we needed to do something about them, or whether you needed to change the input in the fields. Then you said you managed to solve everything, and my question is whether this means I don't need to check things like 'backwards > upwards' for movement direction anymore.

rosestamp commented 2 months ago

Thanks! So I think it's all resolved. Yes, 'motivated' should have been 'motivated shape'. And yes, some > and + combinations don't exist, and if they don't, I guess they do need to be added. They are not interchangeable. But it's possible that a combination doesn't appear in NGT but does in ISL.