Using auto-validations to help with user quality inference

jonfroehlich commented 5 years ago

I'd like us to investigate how we might incorporate the auto-validator CV algorithm to help us predict user performance.

(Also, how many labels per user are necessary before the auto-validator becomes useful? This is somewhat related to https://github.com/ProjectSidewalk/sidewalk-quality-analysis/issues/27.)

nch0w commented 5 years ago

The CV confidence ranges from 0 to 100. The higher the confidence, the more strongly the CV model believes its prediction is correct.

We tried using the CV confidence of each label to predict whether that label is correct.

[image: Screenshot from 2019-08-01 14-54-51] https://user-images.githubusercontent.com/17211794/62330097-80b00e00-b46c-11e9-9829-4bbb973cea03.png

Each plot has two histograms: one for the CV confidence of correct labels and one for the CV confidence of incorrect labels. For example, a CurbRamp label with a CV confidence below 40 is probably incorrect.
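A minimal sketch of how a confidence cutoff like the one described above could be checked, assuming each label has a CV confidence in the 0 to 100 range and a known correct/incorrect flag. The function name `threshold_report`, the cutoff of 40, and the sample data are illustrative assumptions, not the project's code or data.

```python
def threshold_report(confidence, is_correct, cutoff=40.0):
    """Summarize how well the rule `confidence < cutoff` flags incorrect labels."""
    # Labels the rule calls "probably incorrect".
    flagged = [c < cutoff for c in confidence]
    n_flagged = sum(flagged)
    n_incorrect = sum(1 for ok in is_correct if not ok)
    # Flagged labels that really are incorrect.
    hits = sum(1 for f, ok in zip(flagged, is_correct) if f and not ok)
    # Precision: share of flagged labels that are truly incorrect.
    precision = hits / n_flagged if n_flagged else float("nan")
    # Recall: share of truly incorrect labels that the rule flags.
    recall = hits / n_incorrect if n_incorrect else float("nan")
    return {"n_flagged": n_flagged, "precision": precision, "recall": recall}


# Made-up example data (hypothetical, for illustration only).
conf = [10, 35, 55, 80, 90, 20, 70, 95]
correct = [False, False, True, True, True, True, False, True]
report = threshold_report(conf, correct)
```

Running this per label type would put a number on how much the two histograms actually separate, rather than relying on a visual read of the plots.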

jonfroehlich commented 5 years ago

Can you provide a more in-depth summary of what you found in this analysis and the implications for us?


-- Jon Froehlich Associate Professor Paul G. Allen School of Computer Science & Engineering University of Washington http://makeabilitylab.io @jonfroehlich https://twitter.com/jonfroehlich - Twitter Help make sidewalks more accessible: http://projectsidewalk.io

nch0w commented 5 years ago

We also hypothesized that if the CV label type matches the user label type, then the label is probably correct.

The rows represent CV labels, the columns represent user labels, and the values represent the probability that a label with that specific CV label and user label is correct.

CV \ User   CR          NCR         O           SP
CR          0.93353028  0.95294118  0.9118541   0.89325843
NCR         0.87033748  0.91358025  0.9109589   0.89855072
O           0.63453815  0.5483871   0.59813084  0.66071429
SP          0.69811321  0.6875      0.69662921  0.74647887
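For reference, a table like this could be computed along the following lines. This is a sketch under assumptions: the record layout `(cv_label, user_label, is_correct)` and the function name `correctness_matrix` are hypothetical, and the sample records are made up.

```python
from collections import defaultdict

LABEL_TYPES = ["CR", "NCR", "O", "SP"]


def correctness_matrix(records):
    """Return rows = CV label, columns = user label, values = fraction correct.

    `records` is an iterable of (cv_label, user_label, is_correct) tuples.
    Cells with no observations are returned as None.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for cv_label, user_label, is_correct in records:
        totals[(cv_label, user_label)] += 1
        correct[(cv_label, user_label)] += int(is_correct)
    return [
        [
            correct[(cv, user)] / totals[(cv, user)] if totals[(cv, user)] else None
            for user in LABEL_TYPES
        ]
        for cv in LABEL_TYPES
    ]


# Made-up records (hypothetical, for illustration only).
records = [
    ("CR", "CR", True),
    ("CR", "CR", True),
    ("CR", "CR", False),
    ("O", "CR", True),
    ("O", "CR", False),
]
matrix = correctness_matrix(records)
```

Keeping the per-cell counts alongside the fractions would also show which cells rest on only a handful of labels.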

jonfroehlich commented 5 years ago

I'm still not getting a sense of how this is useful. Can you write up a ~1-2 paragraph summary of your findings to complement the numbers? Specifically, can you articulate what you found and how it is useful?


nch0w commented 5 years ago

I found that CV predictions are not reliable for predicting the accuracy of a label. The histograms show little correlation between CV confidence and label accuracy. We also expected that if the CV agrees with the human's label, then the label is more likely to be accurate, but as the table shows, this is not the case.

The CV model as it stands must be refined before it is useful for auto-validations.
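One way to check the agreement claim against the first table is to ask, for each user label type (column), whether the cell where the CV agrees (the diagonal) has the highest fraction correct in that column. The sketch below uses the values reported earlier in this thread; the function name `agreement_is_best` is illustrative.

```python
LABEL_TYPES = ["CR", "NCR", "O", "SP"]

# Rows = CV label, columns = user label, as described for the first table above.
P_CORRECT = [
    [0.93353028, 0.95294118, 0.9118541, 0.89325843],
    [0.87033748, 0.91358025, 0.9109589, 0.89855072],
    [0.63453815, 0.5483871, 0.59813084, 0.66071429],
    [0.69811321, 0.6875, 0.69662921, 0.74647887],
]


def agreement_is_best(matrix):
    """For each user label, is P(correct) highest when the CV label matches it?"""
    result = {}
    for j, user in enumerate(LABEL_TYPES):
        column = [matrix[i][j] for i in range(len(LABEL_TYPES))]
        result[user] = matrix[j][j] == max(column)
    return result


summary = agreement_is_best(P_CORRECT)
# summary == {"CR": True, "NCR": False, "O": False, "SP": False}
```

With these numbers, CV agreement gives the highest fraction correct only for user-labeled CurbRamps, which is consistent with both the summary above and the observation that behavior differs by label type.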

jonfroehlich commented 5 years ago

What CV model are you using? Your results would depend significantly on which ML model was used and how it was trained. Also, isn't this finding far more nuanced than your description implies, in that the CV model performs differently depending on label type (e.g., it's far more accurate for curb ramp labels)?


nch0w commented 5 years ago

Yes, it could be useful for predicting the accuracy of CurbRamp labels. But keep in mind that 92.5% of CurbRamp labels are correct anyway.

We used the DC model.
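To put the base-rate point in numbers, here is a small sketch comparing the 92.5% CurbRamp base rate with the CR/CR cell of the first table (0.93353028). The function name `lift_over_base_rate` is illustrative, and treating those two figures as directly comparable is an assumption.

```python
def lift_over_base_rate(p_conditional, base_rate):
    """Return (absolute, relative) improvement of a conditional rate over a base rate."""
    absolute = p_conditional - base_rate
    relative = absolute / base_rate
    return absolute, relative


BASE_RATE_CURB_RAMP = 0.925      # share of CurbRamp labels that are correct (from this thread)
P_CORRECT_CV_AGREES = 0.93353028  # CR/CR cell of the first table

absolute, relative = lift_over_base_rate(P_CORRECT_CV_AGREES, BASE_RATE_CURB_RAMP)
# absolute is roughly 0.0085, i.e. under one percentage point
```

A gain of under one percentage point suggests that, for CurbRamps, CV agreement adds little beyond simply assuming the label is correct.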

nch0w commented 5 years ago

Here are plots updated with new predictions from Devesh.

[image: Screenshot from 2019-08-12 10-13-25]

            CR          NCR         O           SP
CR          0.91929825  0.90588235  0.89781022  0.88343558
NCR         0.82674772  0.88870432  0.90204082  0.86631016
O           0.53398058  0.504       0.50246305  0.56
SP          0.64705882  0.63694268  0.7         0.6

nch0w commented 5 years ago

According to the plots, I don't think CV is very useful for predicting user accuracy yet.

jonfroehlich commented 5 years ago

That surprises me. I think you should meet with Galen and discuss your results.


nch0w commented 5 years ago

FYI, if we want to use CV to predict user accuracy, we will also need to run it on all the labels. I only have predictions for ~4,000 labels out of the 65,700 total labels.

ProjectSidewalk / sidewalk-quality-analysis

Using auto-validations to help with user quality inference #29