Closed by cneidle 5 months ago
When we do regenerate the spreadsheets, we should use the system information about class labels.
Up until now, I have been manually editing the class label column. Unfortunately, I'm discovering some inconsistencies, because some decisions got made along the way. For example, I was inconsistent in merging #DOG and DOG under the class label DOG. (sigh) There is also an inconsistency in whether HOME+WORK or HOMEWORK was used as the class label.
The main issue is that if we are going public with our current system, we probably don't want to make these changes to the system right away, because departing from the data that Yang used for training will introduce bigger problems...
There are also at least a few signs that we are in principle able to recognize, but which we do not seem to have represented on the DAI. We could get Carey to produce those signs, as we did with TRIANGLE, for example. These include, so far, SURGERY+AGENT.
There is an issue with the class labels for index signs (IX and POSS), which we should resolve when we make these other changes... most likely by changing some of the main entry labels, so that we don't need to rely on variant labels for determination of class labels.
FWIW, these are the changes that I think should be made with respect to existing spreadsheets.
There are 4 occurrences in the DSP sentences file where the main entry #DOG should be assigned to the class label DOG (rather than #DOG). There are also 18 of these in the ASLLRP file.
There is one instance in the DSP signs spreadsheet and one in the DSP sentences spreadsheet where main entry LOBSTER should be assigned to class label CRAB/LOBSTER (rather than LOBSTER). There are also 8 instances of main entry LOBSTER_3 in the RIT signs file that should also be assigned to class label CRAB/LOBSTER.
There is one instance of main entry SIT+DCL:C"couch shaped object" in the DSP signs file, and 4 instances of SIT+DCL:crvd-U"couch shaped object" or SIT+DCL:crvd-B"couch shaped object" in the ASLLVD spreadsheet, which should be assigned to class label COUCH.
There are several instances of main entry TIME+DCL"circular object" (4 in the ASLLVD spreadsheet and 1 in the DSP signs spreadsheet) that should be assigned to class label CLOCK. There is also one in the ASLLRP spreadsheet.
There is 1 occurrence of WANT_3+NEG in the DSP sentences file that should be assigned to class label WANT+NEG.
There are 4 occurrences of (S)OLD+#FF that should have class label = OLD+#FF, 2 in ASLLVD and 2 in RIT signs.
I don't know why this looks different in the ASLLVD spreadsheet, as compared with the WLASL spreadsheet:
There may be other similar inconsistencies in the spreadsheets we are generating...
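One way to look for further inconsistencies of this kind would be to scan all the spreadsheet rows and flag any main entry that has been assigned more than one class label. The following is only a minimal sketch: the column names `main_entry` and `class_label`, the `file` field, and the sample rows are hypothetical and made up for illustration, not taken from the actual spreadsheets.

```python
from collections import defaultdict


def find_inconsistent_labels(rows):
    """Return {main_entry: sorted class labels} for entries with more than one label.

    `rows` is an iterable of dicts with the (assumed) keys
    "main_entry" and "class_label".
    """
    labels_by_entry = defaultdict(set)
    for row in rows:
        labels_by_entry[row["main_entry"]].add(row["class_label"])
    return {
        entry: sorted(labels)
        for entry, labels in labels_by_entry.items()
        if len(labels) > 1
    }


# Made-up sample rows, for illustration only.
sample_rows = [
    {"file": "DSP sentences", "main_entry": "#DOG", "class_label": "#DOG"},
    {"file": "ASLLRP", "main_entry": "#DOG", "class_label": "DOG"},
    {"file": "DSP signs", "main_entry": "LOBSTER", "class_label": "CRAB/LOBSTER"},
]

inconsistent = find_inconsistent_labels(sample_rows)
print(inconsistent)
```

This only catches a main entry whose class label differs between rows. It would not catch a case where the label is wrong everywhere, so the manual review is still needed.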
Screenshots from above
DSP sentences file - corrections
Class label should be DOG
=========
Class label should be CRAB/LOBSTER
=========
Class label should be WANT+NEG
Corrections to the DSP signs file
Class label should be CRAB/LOBSTER
===========
Class label should be COUCH
===========
Class label should be CLOCK
Corrections to RIT signs file
Class label should be CRAB/LOBSTER
============
Class label should be OLD+#FF
See also #470
Corrections to ASLLVD file
These should all have class label COUCH
======
These should all have class label CLOCK
=======
Class label should be OLD+#FF
ASLLRP corrections
Class label should be DOG
=======
Class label should be CLOCK
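If it helps when the spreadsheets are regenerated, the corrections listed above could be kept in one override table and applied in a single pass, rather than edited by hand in each file. This is a rough sketch under assumptions: the column names `main_entry` and `class_label` are hypothetical, the sample rows are made up, and the table covers only the plain entries. The SIT+DCL and TIME+DCL classifier entries (class labels COUCH and CLOCK) would need to be added using whatever exact spelling the spreadsheets use.

```python
# Main entry -> corrected class label, taken from the corrections in this issue.
# Classifier entries (COUCH, CLOCK) are left out here; see the note above.
CLASS_LABEL_OVERRIDES = {
    "#DOG": "DOG",
    "LOBSTER": "CRAB/LOBSTER",
    "LOBSTER_3": "CRAB/LOBSTER",
    "WANT_3+NEG": "WANT+NEG",
    "(S)OLD+#FF": "OLD+#FF",
}


def apply_overrides(rows, overrides=CLASS_LABEL_OVERRIDES):
    """Return (corrected_rows, number_of_rows_changed) without mutating the input."""
    corrected = []
    changed = 0
    for row in rows:
        # Keep the existing label unless the main entry has an override.
        new_label = overrides.get(row["main_entry"], row["class_label"])
        if new_label != row["class_label"]:
            changed += 1
        corrected.append({**row, "class_label": new_label})
    return corrected, changed


# Made-up sample rows, for illustration only.
sample_rows = [
    {"main_entry": "#DOG", "class_label": "#DOG"},
    {"main_entry": "LOBSTER_3", "class_label": "LOBSTER"},
    {"main_entry": "TRIANGLE", "class_label": "TRIANGLE"},
]

corrected_rows, n_changed = apply_overrides(sample_rows)
print(n_changed, [r["class_label"] for r in corrected_rows])
```

Keeping the overrides in one place would also make it easier to hold them back from the live system until we are ready to depart from the labels used for training.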
Hi Augustine,
Once we have made the corrections that have been under discussion, I think it would be a good idea to regenerate all the spreadsheets (and bump the version number).
There have been significant changes.
If we do that, it may not be necessary to manually update the RIT spreadsheet, since the data online is correct...
But the changes we are making to SELF, for example, will have implications for the RIT spreadsheet, and not only for that one!
[THIS IS NOW FIXED IN MY REVISED SPREADSHEET - CN]
E.g.
Whenever you regenerate the spreadsheets, I will need to manually edit them, as before, before we re-post.
Let's discuss when you have a chance... THANKS !!!