UW-Madison-DSI / ospo-stats


Confusion matrix for search terms #5

Closed: JasonLo closed this issue 4 months ago.

JasonLo commented 4 months ago

@cranmer suggested creating a confusion matrix table comparing the different search terms.

The current pipeline does not store this information (which search term returned each record) in the main DB, so we need to re-crawl the current dataset (n ≈ 6k). This should take under 30 minutes.
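A minimal sketch of what the re-crawl needs to capture, assuming each keyword search returns a list of record ids. The `results_by_keyword` structure, the function name, and the repo ids below are hypothetical and not taken from the ospo-stats pipeline. The idea is to invert the per-keyword results into a mapping from each record to the set of keywords that found it.

```python
from collections import defaultdict


def keywords_per_record(results_by_keyword: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {keyword: [record ids]} into {record id: {keywords that returned it}}.

    Hypothetical helper: the real pipeline's data shapes may differ.
    """
    matched: dict[str, set[str]] = defaultdict(set)
    for keyword, record_ids in results_by_keyword.items():
        for record_id in record_ids:
            matched[record_id].add(keyword)
    return dict(matched)


# Hypothetical crawl output; real record ids would come from the crawler.
results = {
    "uw-madison": ["repo-a", "repo-b"],
    "wisc.edu": ["repo-b", "repo-c"],
}
print(keywords_per_record(results))
```

Keeping a set per record means a record found by several keywords is stored once, with all of its matching keywords attached.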

JasonLo commented 4 months ago

Not exactly a confusion matrix, but I think it shows similar information. The keywords are: ["uw-madison", "madison", "wisconsin", "wisc.edu"]

image

JasonLo commented 4 months ago

@cranmer

Updated as requested. The confusion matrix is defined as the count of the intersection (the number of records returned by both keywords):

Image
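The count-of-intersection definition can be sketched as follows, assuming each keyword maps to the set of record ids it returned. The function name and sample ids are made up for illustration. Cell (i, j) is the number of records returned by both keyword i and keyword j, so the diagonal holds each keyword's own total and the matrix is symmetric.

```python
def intersection_matrix(hits: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Cell [a][b] = number of records returned by both keyword a and keyword b."""
    return {a: {b: len(hits[a] & hits[b]) for b in hits} for a in hits}


# Hypothetical per-keyword result sets (not real crawl data).
hits = {
    "uw-madison": {"r1", "r2", "r3"},
    "madison": {"r2", "r3", "r4"},
    "wisconsin": {"r3", "r5"},
    "wisc.edu": {"r1"},
}
for kw, row in intersection_matrix(hits).items():
    # Print each row with columns in the same keyword order as the rows.
    print(f"{kw:>12}", [row[other] for other in hits])
```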

The same matrix, divided by the row total:

Image
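A sketch of the row normalization, taking "row total" literally as the sum of the cells in each row; the function name and sample counts are hypothetical. If the chart instead divides by the diagonal (the keyword's own count), each cell becomes the fraction of keyword i's results that keyword j also returned, and `total` should be replaced with `row[row_key]`.

```python
def normalize_rows(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Divide each cell by the sum of its row (0.0 for an all-zero row)."""
    normalized = {}
    for row_key, row in counts.items():
        total = sum(row.values())  # assumption: "row total" = sum of the row
        normalized[row_key] = {col: (n / total if total else 0.0) for col, n in row.items()}
    return normalized


# Hypothetical 2x2 count matrix for illustration.
counts = {
    "uw-madison": {"uw-madison": 3, "wisc.edu": 1},
    "wisc.edu": {"uw-madison": 1, "wisc.edu": 1},
}
print(normalize_rows(counts))
# {'uw-madison': {'uw-madison': 0.75, 'wisc.edu': 0.25}, 'wisc.edu': {'uw-madison': 0.5, 'wisc.edu': 0.5}}
```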

cranmer commented 4 months ago

Nice!

Could you add one more row/column for "any keyword"?
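One way to add an "any keyword" row/column is to treat the union of all keyword result sets as an extra pseudo-keyword. This is a sketch: the label `"any"`, the helper names, and the sample data are hypothetical. Because the union contains every keyword's results, the "any" row reproduces the diagonal, and its corner cell is the number of unique records across all keywords.

```python
def with_any_keyword(hits: dict[str, set[str]], label: str = "any") -> dict[str, set[str]]:
    """Return a copy of hits with an extra entry: the union of all keyword result sets."""
    extended = dict(hits)
    extended[label] = set().union(*hits.values())
    return extended


def intersection_matrix(hits: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Cell [a][b] = number of records returned by both a and b."""
    return {a: {b: len(hits[a] & hits[b]) for b in hits} for a in hits}


# Hypothetical per-keyword result sets.
hits = {
    "uw-madison": {"r1", "r2", "r3"},
    "madison": {"r2", "r3", "r4"},
    "wisconsin": {"r3", "r5"},
    "wisc.edu": {"r1"},
}
matrix = intersection_matrix(with_any_keyword(hits))
print(matrix["any"])
```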

From: Jason Lo; Date: Thursday, March 21, 2024 at 9:44 AM; Subject: Re: [UW-Madison-DSI/ospo-stats] Confusion matrix for search terms (Issue #5)


JasonLo commented 4 months ago

In short, yes. In more detail, my plan is:

  1. Each time a crawl begins, the data pipeline will store the initial search keyword in the primary database.
  2. Implement a convenience function that generates a confusion matrix, giving insight into the current state of the database.
  3. Although not essential, a web app that displays the confusion matrix and other charts could be useful.
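Steps 1 and 2 could look roughly like this, assuming an SQLite table named `hits` with one row per (record, keyword) pair. The table, column, and function names are hypothetical and are not taken from the ospo-stats codebase. The primary key keeps a re-crawl from double-counting, and a self-join on `record_id` yields the intersection counts directly.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per (record, keyword that returned it).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " record_id TEXT NOT NULL,"
        " keyword TEXT NOT NULL,"
        " PRIMARY KEY (record_id, keyword))"
    )


def record_crawl(conn: sqlite3.Connection, keyword: str, record_ids: list[str]) -> None:
    """Step 1: store the keyword that started this crawl next to each record it found."""
    conn.executemany(
        "INSERT OR IGNORE INTO hits (record_id, keyword) VALUES (?, ?)",
        [(rid, keyword) for rid in record_ids],
    )
    conn.commit()


def confusion_matrix(conn: sqlite3.Connection) -> dict[str, dict[str, int]]:
    """Step 2: count records shared by every pair of keywords (zeros included)."""
    keywords = [k for (k,) in conn.execute("SELECT DISTINCT keyword FROM hits ORDER BY keyword")]
    matrix = {a: {b: 0 for b in keywords} for a in keywords}
    rows = conn.execute(
        "SELECT a.keyword, b.keyword, COUNT(*) "
        "FROM hits AS a JOIN hits AS b ON a.record_id = b.record_id "
        "GROUP BY a.keyword, b.keyword"
    )
    for a, b, n in rows:
        matrix[a][b] = n
    return matrix


conn = sqlite3.connect(":memory:")
init_db(conn)
record_crawl(conn, "uw-madison", ["r1", "r2"])
record_crawl(conn, "wisc.edu", ["r2", "r3"])
print(confusion_matrix(conn))
# {'uw-madison': {'uw-madison': 2, 'wisc.edu': 1}, 'wisc.edu': {'uw-madison': 1, 'wisc.edu': 2}}
```

Keeping the matrix computation in SQL means the same query could back the optional web app in step 3 without loading every record into memory.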

Thoughts?

JasonLo commented 4 months ago

f8727fe10ac87ee5b01e8384465d471fd53f48de

image