dwhieb commented 3 months ago

Process

Whenever there's an exact match between a token gloss and one of the token definitions in Column B, add the tag in Column A to the component's tags list.
Any tag that's in ALL CAPS should be represented using small caps. (Include a + in the regex.)
- const capsRegExp = /^([A-Z]|\s)+$/v < update this
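
A minimal sketch of the updated check. It assumes the note above means a literal `+` may appear inside an all-caps tag. The helper name `isAllCapsTag` is hypothetical, and the `u` flag replaces `v` only so the sketch also runs on older Node versions.

```javascript
// ASSUMPTION: "Include a + in the regex" means a literal plus sign is
// allowed inside a grammatical tag, alongside capitals and whitespace.
const capsRegExp = /^[A-Z\s+]+$/u;

// Hypothetical helper: true when the tag should be rendered in small caps.
function isAllCapsTag(tag) {
  // Require at least one capital letter so that a blank or
  // whitespace-only cell is not treated as a grammatical tag.
  return capsRegExp.test(tag) && /[A-Z]/u.test(tag);
}
```

Mixed-case tags fail the check, so they would be treated as lexical tags.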

To Do

[x] Alphabetize tags for each component.
[x] Only add tags when gloss, type, and subtype all match.
[x] Display lexical tags before grammatical tags.
[x] Process latest tags in "Tags_for_Components" spreadsheet.

dwhieb commented 3 months ago

Initial script to add tags to components is written, tested, and working fine.

Waiting for @monicamacaulay to spot-check these cases and finish adding tags to other components before processing the remaining tags.

monicamacaulay commented 3 months ago

I looked at #174 but I don't see what I'm supposed to spot-check.

Clarification on this:

> Whenever there's an exact match between a token gloss and one of the token definitions in Column B, add the tag in Column A to the component's tags list.
"one of the token definitions" - that's ambiguous and I just wanted to make sure you meant it the right way - it should match a whole token definition - e.g. if the definition is "appear, seem" it has to match "appear, seem". Right?

Also, question: "Applying tags will be a one-time process." - does that mean we can't continue to make changes? Because I foresee the fine-tuning taking a long time. We can get the majority done relatively fast but then filling in the blanks is going to take a while.

Monica

dwhieb commented 3 months ago

@monicamacaulay

> I looked at #174 but I don't see what I'm supposed to spot-check.

I was referring to the data spreadsheets, and I guess also the actual website since the data is there now too. I recommend taking a look at some of the initials that have had tags added (still as "definitions" for the time being) to see whether they came out as you expected.

> "one of the token definitions" - that's ambiguous and I just wanted to make sure you meant it the right way - it should match a whole token definition - e.g. if the definition is "appear, seem" it has to match "appear, seem". Right?

Correct. There had to be an exact match for the entire definition in order for the tag to be applied.

> Also, question: "Applying tags will be a one-time process." - does that mean we can't continue to make changes? Because I foresee the fine-tuning taking a long time. We can get the majority done relatively fast but then filling in the blanks is going to take a while.

I just mean that the script to apply the tags has to be run manually on my end. It isn't part of the normal deployment workflow. But I can run the script as many times as needed.

See my other email for a follow-up question about this too.

dwhieb commented 3 months ago

Information from @monicamacaulay on how to process the tags spreadsheet:

Okay, there's a spreadsheet called "Tags_for_Components" (NOT to be confused with "Tags_for_Components_Working_Copy"). It's in Data Entry, Data Sources > Tags for Entries.

It has four columns: Tag(s), Token Definitions - Do Not Change (I left the cautionary note on for the heck of it), Type, and Subtype. I did freeze the header row; don't know if that affects your program. And I realize you may want to switch the order of columns around but I just left them as is. Or retitle them. Whatever.

There can be blanks in any of the four columns: Tag or Token Definition could be blank, Subtype is usually blank (only filled for finals, but occasionally not even for those), and Type unfortunately can also be blank, for when sources gave components with no type (damn them!).
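
A rough sketch of how rows from that sheet could be read while tolerating blanks. It assumes the rows arrive as arrays of cell values in the column order described above, and that multiple tags in one cell are comma-separated (the thread does not say). `parseTagRow` and `parseTagSheet` are hypothetical names.

```javascript
// Normalize a possibly blank cell (undefined, null, or '') to a trimmed string.
const clean = cell => String(cell ?? '').trim();

// ASSUMPTION: column order is Tag(s), Token Definitions, Type, Subtype.
function parseTagRow([tags, definition, type, subtype]) {
  return {
    // ASSUMPTION: several tags in one cell are separated by commas.
    tags: clean(tags).split(',').map(tag => tag.trim()).filter(Boolean),
    definition: clean(definition),
    type: clean(type),       // may be blank when the source gave no type
    subtype: clean(subtype), // usually blank except for finals
  };
}

function parseTagSheet(rows) {
  return rows
    .slice(1) // skip the (frozen) header row
    .map(parseTagRow)
    // ASSUMPTION: a row with no tag or no definition cannot apply a tag,
    // so it is dropped rather than treated as an error.
    .filter(row => row.tags.length > 0 && row.definition !== '');
}
```

Blank Type and Subtype cells are kept as empty strings so that they can still be compared later.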

Grammatical tags are in caps, and those'll show up in small caps, right? They also all come after all lexical tags.

I did this first version without Hunter's input, so when we have a chance to talk he may want some changes. I know I was arbitrary in many cases. And the difference between instrument and manner made my head explode so I'm sure it's inconsistent. But either we can make some changes and ask you to run it again, or we can make the changes to the spreadsheets.

One change I know I'm going to want to make is that I think each set of tag types (lexical and grammatical) should be alphabetical (within their type). All the grammatical ones are, but only some of the lexical ones are because I only thought of it kind of late in the game. Actually the reason I thought of it was that I did a search after you had entered just the initials and it was really salient that components with the same definitions were coming up with those definitions in either order. Bothered the heck out of me. So I'll work on that.
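
The ordering described here (lexical tags first, grammatical tags after, each group alphabetical) could be applied by the script rather than by hand. A minimal sketch, assuming grammatical tags are exactly the all-caps ones; `sortTags` is a hypothetical name, and the sample tags in the usage line are made up.

```javascript
// ASSUMPTION: a tag is grammatical when it is written entirely in capitals
// (optionally with whitespace or "+"); everything else is lexical.
const isGrammatical = tag => /^[A-Z\s+]+$/u.test(tag) && /[A-Z]/u.test(tag);

function sortTags(tags) {
  // Copy before sorting so the caller's array is left untouched.
  const alphabetize = list => [...list].sort((a, b) => a.localeCompare(b, 'en'));
  const lexical = tags.filter(tag => !isGrammatical(tag));
  const grammatical = tags.filter(tag => isGrammatical(tag));
  return [...alphabetize(lexical), ...alphabetize(grammatical)];
}

// Usage with made-up tags:
console.log(sortTags(['PL', 'water', 'AN', 'fire'])); // [ 'fire', 'water', 'AN', 'PL' ]
```

Sorting in the script would make the output order independent of the order in the spreadsheet.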

dwhieb commented 3 months ago

@monicamacaulay do you only want the tags applied when all three of these conditions are met?

- The glosses match.
- The component type matches.
- The subtype matches.

Or do I only need to match on glosses?

monicamacaulay commented 3 months ago

All three. You can get different grammatical tags depending on component type.
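
For reference, the three-way match could look something like the sketch below. The field names (`gloss`, `definition`, `type`, `subtype`, `tags`) and the function name `tagsForComponent` are hypothetical, and treating a blank type or subtype as matching only another blank is an assumption, not something settled in this thread.

```javascript
// Normalize blanks (undefined, null, '') to an empty string before comparing.
const norm = value => String(value ?? '').trim();

// Return the tags of every spreadsheet row whose definition, type, and
// subtype all match the component. The gloss must equal the ENTIRE
// definition: "appear" does not match "appear, seem".
function tagsForComponent(component, tagRows) {
  const matches = tagRows.filter(row =>
    norm(row.definition) === norm(component.gloss) &&
    norm(row.type) === norm(component.type) &&
    norm(row.subtype) === norm(component.subtype) // ASSUMPTION: blank matches only blank
  );
  // De-duplicate in case several rows contribute the same tag.
  return [...new Set(matches.flatMap(row => row.tags))];
}
```

Because type and subtype are part of the comparison, the same gloss can yield different grammatical tags for different component types.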

dwhieb / Nisinoon

Tags #174
