datacommonsorg / website

Code for the Data Commons website
https://datacommons.org
Apache License 2.0
21 stars 76 forks

NL: Tracking bug for alternate SV description improvements #2444

Open pradh opened 1 year ago

pradh commented 1 year ago

[Fixed]

Here is an example where Count_Person_BlackOrAfricanAmericanAlone could use a better alternate description so that it is ranked higher.

image

pradh commented 1 year ago

I wonder if we should try to find these issues systematically by constructing queries for which we should return 1PV SVs (from the index) but currently don't?

For something similar, in the past for search we generated queries like this: https://source.corp.google.com/piper///depot/google3/experimental/users/shanth/search/query_gen.py

CC @jehangiramjad
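
The idea above could be sketched roughly as follows. This is only an illustration, not the query generator linked above: `find_ranking_misses` and `fake_search` are hypothetical names, the canned results are made up, and a real check would call the actual NL search in place of `fake_search`.

```python
from typing import Callable, Dict, List, Tuple


def find_ranking_misses(
    cases: Dict[str, str],
    search: Callable[[str], List[str]],
    top_k: int = 3,
) -> List[Tuple[str, str, int]]:
    """Return (query, expected_sv, rank) for every case where the expected
    SV is not within the first top_k results.

    rank is the 0-based position of the expected SV, or -1 if it is absent.
    """
    misses = []
    for query, expected_sv in cases.items():
        ranked = search(query)
        rank = ranked.index(expected_sv) if expected_sv in ranked else -1
        if rank == -1 or rank >= top_k:
            misses.append((query, expected_sv, rank))
    return misses


# Hypothetical stand-in for the real NL search; results are invented.
def fake_search(query: str) -> List[str]:
    canned = {
        "women in california": [
            "Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Female",
            "Count_Person_Female",
        ],
        "number of women in california": ["Count_Person_Female"],
    }
    return canned.get(query, [])


# Query -> SV we expect to see at the top (assumed expectations).
CASES = {
    "women in california": "Count_Person_Female",
    "number of women in california": "Count_Person_Female",
}

for query, sv, rank in find_ranking_misses(CASES, fake_search, top_k=1):
    print(f"[{query}] expected {sv}, found at rank {rank}")
```

Each reported miss is a candidate for a new alternate description.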

pradh commented 1 year ago

WAI: image


Another example is [how many students are in college in New Jersey] returning an educational attainment SV.

pradh commented 1 year ago

Another example is [which countries produce the most greenhouse gasses] not returning the aggregate SV first.

https://screenshot.googleplex.com/6P3unbWAMENmH26

pradh commented 1 year ago

[Fixed]

Alternate strings for dc/topic/Jobs should be improved. For example:

[distribution of workers among industries in California] [distribution of jobs among industries in California] => These fail to show distribution

[distribution of jobs in California] => This one does (see below)

image

pradh commented 1 year ago

[Fixed]

Improve "air pollution" to map to concentration of air pollutants.

For example [which cities in Arkansas have the most pollution] only has AQI (which is reasonable), but should also include air pollutants.

pradh commented 1 year ago

[what are the wealthy San jose neighborhoods] -- income is on top, but the descriptions could perhaps use alternate ways of saying income

image

pradh commented 1 year ago

[Fixed]

image


[unemployment europe] -- the unemployment-rate SV is ranked poorly and has no alternate descriptions

image

jehangiramjad commented 1 year ago

[hottest places in the bay area] -- should match the temperature SVs

Screenshot 2023-03-22 at 9 52 26 AM

jehangiramjad commented 1 year ago

[Fixed]

Another example:

[which are the most diverse cities in the US?] => The topic dc/topic/RacialBreakdown should match with a higher score; its description can be updated to achieve that.

jehangiramjad commented 1 year ago

[Fixed]

Another example:

[how safe is new york city] => The SVs on crime should match but none of them show up in the top matched SVs.

jehangiramjad commented 1 year ago

Another example:

[what state has the best education in the US] => We should probably have a topic on "Education". Currently a number of SVs and topics match on education, but this could be tightened up with a dedicated topic or with more precise descriptions that match "education".

lucy-kind commented 1 year ago

Re: alternate strings for dc/topic/Jobs (reported above): strings have been added for this topic :)

pradh commented 1 year ago

[Closed: Removed the SV since it was conflated with Count_Person_BelowPovertyLevelStatusInPast12Months, which has more data]

Count_Person_PovertyStatusDetermined has an incorrect description: image

pradh commented 1 year ago

[Fixed]

Count_Person_Female should have the word "Women" in its description. Because it doesn't, and Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Female does, a query for [women in california] returns women with a PhD degree.

image
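
A toy illustration of why the missing word matters. It assumes a simple token-overlap (Jaccard) score as a stand-in for the real matcher, which may score quite differently; the description strings are invented for the example and are not the actual index contents.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (stand-in similarity score)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_svs(query: str, descriptions: Dict[str, List[str]]) -> List[str]:
    """Rank SVs by the best score over all of their descriptions."""
    def best(sv: str) -> float:
        return max((jaccard(query, d) for d in descriptions[sv]), default=0.0)
    return sorted(descriptions, key=best, reverse=True)


PHD_FEMALE = "Count_Person_25OrMoreYears_EducationalAttainmentDoctorateDegree_Female"

# Hypothetical descriptions: only the PhD SV mentions "women".
descriptions = {
    "Count_Person_Female": ["female population"],
    PHD_FEMALE: ["women with a doctorate degree"],
}

query = "women in california"
before = rank_svs(query, descriptions)

# Add an alternate description containing "women" to Count_Person_Female.
descriptions["Count_Person_Female"].append("number of women")
after = rank_svs(query, descriptions)

print("before:", before[0])
print("after: ", after[0])
```

Under this toy scorer, the PhD SV ranks first before the alternate description is added and Count_Person_Female ranks first afterwards.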

pradh commented 1 year ago

Reopening since this is a tracking issue with several more bug reports (as per above comments).

lucy-kind commented 1 year ago

New issue: population should be ranked above deaths for this query.

image