DCS-LCSR / ASL-DAI

Time to re-generate spreadsheets ??? #471

Closed cneidle closed 3 months ago

cneidle commented 3 months ago

Hi Augustine,

Once we have made the corrections that have been under discussion, I think it would be a good idea to regenerate all the spreadsheets (and bump the version number).

There have been significant changes.

If we do that, it may not be necessary to manually update the RIT spreadsheet, since the data online is correct...

For example, the changes we are making to SELF will have implications for the RIT spreadsheet, and not only that one!

[THIS IS NOW FIXED IN MY REVISED SPREADSHEET - CN]

E.g.

Screen Shot 2024-06-23 at 12 46 37 PM

Whenever you do regenerate the spreadsheets, I will need to manually edit them, as before, before we re-post.

Let's discuss when you have a chance... THANKS !!!

cneidle commented 3 months ago

When we do regenerate the spreadsheets, we should use the system information about class labels.

Up until now, I was manually editing the class label column. And, unfortunately, I'm discovering that there are some inconsistencies, due to the fact that some decisions got made along the way. For example, I was inconsistent in merging #DOG and DOG under the class label DOG. (sigh). There is also an inconsistency in whether HOME+WORK or HOMEWORK was used as the class label.
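One way to surface inconsistencies like these before regenerating is to check whether any main entry has been assigned more than one class label. The following is only a minimal sketch: the column names `main_entry` and `class_label` and the sample rows are assumptions for illustration, not the actual spreadsheet headers or data.

```python
from collections import defaultdict


def find_inconsistent_labels(rows, entry_col="main_entry", label_col="class_label"):
    """Return {main entry: sorted class labels} for entries with more than one label.

    `rows` is an iterable of dicts, e.g. as produced by csv.DictReader.
    The column names are hypothetical; adjust them to the real headers.
    """
    labels_by_entry = defaultdict(set)
    for row in rows:
        labels_by_entry[row[entry_col]].add(row[label_col])
    return {
        entry: sorted(labels)
        for entry, labels in labels_by_entry.items()
        if len(labels) > 1
    }


# Illustrative rows only (not taken from the real spreadsheets).
rows = [
    {"main_entry": "#DOG", "class_label": "DOG"},
    {"main_entry": "#DOG", "class_label": "#DOG"},
    {"main_entry": "DOG", "class_label": "DOG"},
]
inconsistent = find_inconsistent_labels(rows)
```

Running this over each spreadsheet in turn, or over all of them concatenated, would list the entries whose class label was edited inconsistently.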

The main issue is that if we are going public with our current system, we probably don't want to make these changes to the system right away, because departing from the data that Yang used for training will introduce bigger problems...

There are also at least a few signs that we are in principle able to recognize, but which we do not seem to have represented on the DAI. We could get Carey to produce those signs, as we did with TRIANGLE, for example. These include, so far, SURGERY+AGENT.

There is an issue with the class labels for index signs (IX and POSS), which we should resolve when we make these other changes... most likely by changing some of the main entry labels, so that we don't need to rely on variant labels for determination of class labels.

cneidle commented 3 months ago

FWIW, these are the changes that I think should be made with respect to existing spreadsheets.

There are 4 occurrences in the DSP sentences file where the main entry #DOG should be assigned to the class label DOG (rather than #DOG). There are also 18 of these in the ASLLRP file.

There is one instance in the DSP signs spreadsheet and one in the DSP sentences spreadsheet where main entry LOBSTER should be assigned to class label CRAB/LOBSTER (rather than LOBSTER). There are also 8 instances of main entry LOBSTER_3 in the RIT signs file that should also be assigned to class label CRAB/LOBSTER.

There is one instance of main entry SIT+DCL:C"couch shaped object" in the DSP signs file, and 4 instances of SIT+DCL:crvd-U"couch shaped object" or SIT+DCL:crvd-B"couch shaped object" in the ASLLVD spreadsheet, which should be assigned to class label COUCH.

There are several instances of main entry TIME+DCL"circular object" (4 in the ASLLVD spreadsheet and 1 in the DSP signs spreadsheet) that should be assigned to class label CLOCK. There is also one in the ASLLRP spreadsheet.

There is 1 occurrence of WANT_3+NEG in the DSP sentences file that should be assigned to class label WANT+NEG.

There are 4 occurrences of (S)OLD+#FF that should have class label = OLD+#FF, 2 in ASLLVD and 2 in RIT signs.
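If these corrections were applied by script rather than by hand, a simple override table keyed on main entry might be enough. This is a hedged sketch, not the existing generation code: the column names, the `apply_overrides` helper, and the sample rows are all hypothetical. The COUCH and CLOCK entries are left out because their exact main-entry strings would need to be copied from the spreadsheets.

```python
# Main entry -> intended class label, taken from the corrections listed above.
CLASS_LABEL_OVERRIDES = {
    "#DOG": "DOG",
    "LOBSTER": "CRAB/LOBSTER",
    "LOBSTER_3": "CRAB/LOBSTER",
    "WANT_3+NEG": "WANT+NEG",
    "(S)OLD+#FF": "OLD+#FF",
}


def apply_overrides(rows, overrides=CLASS_LABEL_OVERRIDES,
                    entry_col="main_entry", label_col="class_label"):
    """Return (corrected rows, number of rows changed) without mutating the input.

    Column names are assumptions; adjust them to the real spreadsheet headers.
    """
    fixed = []
    changed = 0
    for row in rows:
        new_row = dict(row)  # copy so the original row is left untouched
        target = overrides.get(row[entry_col])
        if target is not None and row[label_col] != target:
            new_row[label_col] = target
            changed += 1
        fixed.append(new_row)
    return fixed, changed


# Illustrative rows only (not taken from the real spreadsheets).
sample = [
    {"main_entry": "#DOG", "class_label": "#DOG"},
    {"main_entry": "LOBSTER_3", "class_label": "LOBSTER"},
    {"main_entry": "TRIANGLE", "class_label": "TRIANGLE"},
]
fixed, changed = apply_overrides(sample)
```

Comparing the `changed` count against the counts given above (for example, 4 for #DOG in the DSP sentences file) would be a quick check that nothing was missed.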

I don't know why this looks different in the ASLLVD spreadsheet, as compared with the WLASL spreadsheet:

Screen Shot 2024-06-23 at 7 27 13 PM

There may be other similar inconsistencies in the spreadsheets we are generating...

cneidle commented 3 months ago

Screenshots from above

DSP sentences file - corrections

Class label should be DOG

Screen Shot 2024-06-23 at 7 32 02 PM

=========

Class label should be CRAB/LOBSTER

Screen Shot 2024-06-23 at 7 34 09 PM

=========

Class label should be WANT+NEG

Screen Shot 2024-06-23 at 7 38 04 PM

cneidle commented 3 months ago

Corrections to the DSP signs file

Class label should be CRAB/LOBSTER

Screen Shot 2024-06-23 at 7 45 34 PM

===========

Class label should be COUCH

Screen Shot 2024-06-23 at 7 47 17 PM

===========

Class label should be CLOCK

Screen Shot 2024-06-23 at 7 48 24 PM

cneidle commented 3 months ago

Corrections to RIT signs file

Class label should be CRAB/LOBSTER

Screen Shot 2024-06-23 at 7 54 39 PM

============

Class label should be OLD+#FF

Screen Shot 2024-06-24 at 9 52 36 AM

See also #470

cneidle commented 3 months ago

Corrections to ASLLVD file

These should all have class label COUCH

Screen Shot 2024-06-23 at 8 01 03 PM

======

These should all have class label CLOCK

Screen Shot 2024-06-23 at 8 04 07 PM

=======

Class label should be OLD+#FF

Screen Shot 2024-06-24 at 9 55 13 AM

cneidle commented 3 months ago

ASLLRP corrections

Class label should be DOG

Screen Shot 2024-06-23 at 8 11 36 PM

=======

Class label should be CLOCK

Screen Shot 2024-06-25 at 3 07 31 PM