Try to include Recall and Precision for R2 Analysis

jonfroehlich commented 5 years ago

Something like this (but didn't have a chance to finish it)

jonfroehlich commented 5 years ago

Dense tables!

jonfroehlich commented 5 years ago

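In our 1:1 yesterday, you indicated that you didn't think you could get the increase/decrease text into the new table as we had before:

[image: image] https://user-images.githubusercontent.com/1621749/61485052-1455ea80-a955-11e9-91b5-e436501f1044.png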

Probably true (though it would be nice to see a version with it).

We also chatted about doing cell coloring to make distinctions more noticeable (and forgoing the increase/decrease text).

galenweld commented 5 years ago

Yep! I'll be getting to this right now.

galenweld commented 5 years ago

I got both precision and recall into the table. Here's how it looks in the text at the moment:

I would love suggestions on the coloring, both the palette and the scheme to use. The above uses a color scheme that shows the best, worst, and middle-performing models for each label type and for each of precision and recall.

I experimented with two different options in my table generator: A) coloring strictly relative to performance increase/decrease from the baseline. This is probably my favorite:

B) coloring relative to the change from baseline, but using additional colors to emphasize larger increases/decreases. Right now I used a crappy set of colors so it looks gross and busy, but this might work if we pick colors more carefully. However, I need to go to sleep now ;)

Suggestions appreciated!
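For reference, options A and B above could be sketched roughly as follows. This is only an illustration and not the actual table generator: the function names, the color names, and the `big_change` cutoff of 5 points are all assumptions made up for the example.

```python
# Hypothetical sketch of the two coloring options; nothing here is taken
# from the real table generator.

def color_option_a(value, baseline):
    """Option A: one color for any increase over baseline, one for any decrease."""
    if value > baseline:
        return "green"
    if value < baseline:
        return "red"
    return None  # unchanged cells stay uncolored


def color_option_b(value, baseline, big_change=5.0):
    """Option B: same direction coloring as A, plus a darker shade for larger changes.

    big_change is an assumed cutoff in percentage points.
    """
    delta = value - baseline
    if delta == 0:
        return None
    hue = "green" if delta > 0 else "red"
    shade = "dark" if abs(delta) >= big_change else "light"
    return f"{shade} {hue}"


# Example cells using numbers quoted later in this thread.
print(color_option_a(51.8, 50.7))  # missing curb ramp recall went up
print(color_option_b(56.7, 48.5))  # surface problem recall went up by a lot
print(color_option_b(79.7, 80.3))  # overall precision went down slightly
```

Option A needs only two colors, which is probably why it reads as less busy; option B's legibility depends mostly on how the cutoff and the shades are chosen.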

jonfroehlich commented 5 years ago

I like it. Now the text (story) just gets more complicated because there is not a clear trend towards improvement.

galenweld commented 5 years ago

Definitely, we get tradeoffs in recall and precision. I'm rewriting this section to be more in line with what our table shows.

galenweld commented 5 years ago

narrative now reads:

Overall, there are only marginal differences in performance: recall improves from 79.6% to 80.1% while precision drops from 80.3% to 79.7%; however, each label type is impacted differently. Missing curb ramps benefit the most, with recall increasing from 50.7% when using only image features to 51.8% when using all features, while at the same time precision also increases, from 80.2% to 80.6%. As missing curb ramps require the greatest knowledge of the broader street context, this result suggests that in some cases our positional and geographic features begin to capture this context. For other label types, impacts of the extra features are less clear. Curb ramp recall improves by 2.9 percentage points, while precision decreases by only 1.8 percentage points. Recall on surface problems jumps from 48.5% to 56.7%, an increase of 8.2 percentage points, while precision decreases 5.8 percentage points.
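As a quick sanity check on the arithmetic in that narrative, the percentage-point changes can be recomputed from the quoted before/after values. This is a minimal sketch; the dictionary layout and the name `point_change` are assumptions for illustration, and only the pairs that the narrative states explicitly are included.

```python
# (before, after) values in percent, copied from the narrative above.
reported = {
    ("overall", "recall"): (79.6, 80.1),
    ("overall", "precision"): (80.3, 79.7),
    ("missing curb ramp", "recall"): (50.7, 51.8),
    ("missing curb ramp", "precision"): (80.2, 80.6),
    ("surface problem", "recall"): (48.5, 56.7),
}


def point_change(before, after):
    """Change in percentage points, rounded to one decimal like the narrative."""
    return round(after - before, 1)


changes = {key: point_change(*pair) for key, pair in reported.items()}

for (label, metric), delta in changes.items():
    print(f"{label} {metric}: {delta:+.1f} points")
```

The surface problem recall change comes out to 8.2 points, matching the figure stated in the text.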

jonfroehlich commented 5 years ago

OK, I'll take a look in the paper and polish.

The larger issue for me is that we can't really present just recall in any of our results: recall and precision are intertwined, and talking about one without the other isn't that useful imo (you can manipulate recall, for example, at a cost of precision). So, the question is: what should we do for the R3 and R4 results sub-sections?
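To make the "you can manipulate recall at a cost of precision" point concrete, here is a small sketch that sweeps a decision threshold over toy data. The scores and labels are invented purely for illustration and have nothing to do with the project's models or results; `precision_recall` is a hypothetical helper.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up classifier confidences and ground-truth labels (1 = positive).
toy_scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30]
toy_labels = [1, 1, 0, 1, 0, 1]

for threshold in (0.8, 0.5, 0.2):
    p, r = precision_recall(toy_scores, toy_labels, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold raises recall while precision falls, which is why a recall number reported alone says little about overall performance.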

galenweld commented 5 years ago

If we're concerned about it, then I guess I would suggest we change those figures to report F1 score instead of recall. It's too busy and dense to try to squeeze both precision and recall into those sections, and we're already using a single number (recall) to represent our performance, so it makes sense to switch to F1 score, which is arguably the best single number to represent performance.

If you'd like, I'll go ahead and redo those figures - shouldn't take long.
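For context on the F1 suggestion above: F1 is the harmonic mean of precision and recall, so it drops when either one is low. A minimal sketch follows, with `f1_score` as a hypothetical helper rather than the project's code. It is applied to the overall precision/recall pairs quoted in the narrative earlier in this thread purely as arithmetic; if those overall numbers are averages across label types, an F1 computed per label type and then averaged could come out differently.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall (precision, recall) pairs in percent, as quoted in the narrative.
before = f1_score(80.3, 79.6)
after = f1_score(79.7, 80.1)

print(f"F1 before: {before:.2f}")
print(f"F1 after:  {after:.2f}")
```

A single F1 value like this is compact enough for a dense figure, but it hides which of the two metrics moved, so it works best alongside a table that still shows both.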

jonfroehlich commented 5 years ago

I need to dive into these sections myself first before coming to a decision. I haven't read them in forever. Doing so now.

ProjectSidewalk / sidewalk-cv-assets19

Try to include Recall and Precision for R2 Analysis #34