RTICWDT / college-scorecard

College Scorecard
https://collegescorecard.ed.gov/

As a Consumer, I can perform a Simple Search #1

Closed ultrasaurus closed 9 years ago

ultrasaurus commented 9 years ago

As a Consumer, I can perform a Simple Search using Name, Location (State, Region, Zip + Radius), and Size fields to view a resulting list of institutions matching my query. These results would show: 1) Name of School, 2) Location of School, 3) Type of Institution (2-year/4-year), and 4) Undergraduate Population.

"From 2013derived_opeid6.csv fields: name=INSTNM_MAIN city=CITY_MAIN state=STABBR_MAIN zip=ZIP_MAIN region=REGION_MAIN type-of-institution=INSTCAT_MAIN size=UGDS (undergraduate degree-seeking enrollment number).

From a user experience (story acceptance) perspective, this includes these smaller stories

Alternatively, 2013derived_opeid.csv has the same variables broken down by branch (possibly use _FREQ instead of _MAIN). Data dictionary at https://docs.google.com/spreadsheets/d/1wIbBcnv-cMhmuF_8A1pFJ1vVi-W83PMayEz8hhBuKTY/edit#gid=802307363 (including interpretation for INSTCAT_MAIN field)"

Region is a number. Is there a mapping to words somewhere? DN: The data dictionary at https://docs.google.com/spreadsheets/d/1wIbBcnv-cMhmuF_8A1pFJ1vVi-W83PMayEz8hhBuKTY/edit#gid=802307363 includes an interpretation for this column.
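The search described in this story could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `School` class, the `simple_search` function, and the sample records are all hypothetical, and the Zip + Radius criterion is left out because it would need coordinates that are not discussed here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class School:
    name: str
    city: str
    state: str        # two-letter abbreviation, e.g. "MA"
    region: int       # numeric region ID (REGION_MAIN)
    school_type: str  # hypothetical label such as "2-year" or "4-year"
    size: int         # undergraduate degree-seeking enrollment (UGDS)


def simple_search(
    schools: Iterable[School],
    name: Optional[str] = None,
    state: Optional[str] = None,
    region: Optional[int] = None,
    size_range: Optional[Tuple[int, int]] = None,
) -> List[School]:
    """Return the schools that match every criterion that was supplied."""
    results = []
    for school in schools:
        # Name is a case-insensitive substring match.
        if name and name.lower() not in school.name.lower():
            continue
        if state and school.state.upper() != state.upper():
            continue
        if region is not None and school.region != region:
            continue
        # size_range is an inclusive (minimum, maximum) pair.
        if size_range is not None and not (
            size_range[0] <= school.size <= size_range[1]
        ):
            continue
        results.append(school)
    return results


# Made-up sample records, for illustration only.
SAMPLE = [
    School("Example State University", "Springfield", "IL", 3, "4-year", 21000),
    School("Example Community College", "Boston", "MA", 1, "2-year", 4500),
]

for school in simple_search(SAMPLE, state="MA"):
    print(school.name, school.city, school.state, school.school_type, school.size)
```

Criteria that are not supplied are ignored, so a query with only a state behaves like a plain state filter.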

shawnbot commented 9 years ago

I think the region IDs correspond to this map:

image

shawnbot commented 9 years ago

@dnesting I haven't been able to find any mappings of numeric IDs (1-8) in the spreadsheet for these regions. Do you have anything?

shawnbot commented 9 years ago

I dug into the source of this page (http://nces.ed.gov/ipeds/datacenter/QueryForm.aspx) and found the following, which suggests some very different regional definitions from the map above:

<div id="ui1_divControlsList">
<div><input type="checkbox" value="0_DMN"><label>US Service schools</label></div>
<div><input type="checkbox" value="1_DMN"><label>New England CT ME MA NH RI VT</label></div>
<div><input type="checkbox" value="2_DMN"><label>Mid East DE DC MD NJ NY PA</label></div>
<div><input type="checkbox" value="3_DMN"><label>Great Lakes IL IN MI OH WI</label></div>
<div><input type="checkbox" value="4_DMN"><label>Plains IA KS MN MO NE ND SD</label></div>
<div><input type="checkbox" value="5_DMN"><label>Southeast AL AR FL GA KY LA MS NC SC TN VA WV</label></div>
<div><input type="checkbox" value="6_DMN"><label>Southwest AZ NM OK TX</label></div>
<div><input type="checkbox" value="7_DMN"><label>Rocky Mountains CO ID MT UT WY</label></div>
<div><input type="checkbox" value="8_DMN"><label>Far West AK CA HI NV OR WA</label></div>
<div><input type="checkbox" value="9_DMN"><label>Outlying areas AS FM GU MH MP PR PW VI</label></div>
</div>

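I've dropped these values into a CSV (https://github.com/18F/college-choice/blob/37afd896af46cd27861a454a8e4cca3628eb876b/_data/regions.csv) that drives the search form's region selector. It'd be really great to have somebody from Ed tell us what the regions really are, though! cc @LisaGee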

In the absence of an official mapping, we'll need to cross-reference some of these with the actual values in the spreadsheet to figure out which one is correct. I think the HTML version above is probably not right because that has 9 regions, whereas the data we've got only includes 8.
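For illustration, checkbox markup like the snippet above could be turned into CSV rows with a short script. This is a minimal sketch and not the script that produced the actual CSV: the `id,label` column names and the `RegionParser` and `regions_to_csv` names are assumptions, and the sample markup is shortened to two entries.

```python
import csv
import io
from html.parser import HTMLParser

SNIPPET = """
<div id="ui1_divControlsList">
<div><input type="checkbox" value="0_DMN"><label>US Service schools</label></div>
<div><input type="checkbox" value="1_DMN"><label>New England CT ME MA NH RI VT</label></div>
</div>
"""


class RegionParser(HTMLParser):
    """Collect (id, label) pairs from checkbox/label markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._pending_id = None
        self._in_label = False

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            value = dict(attrs).get("value", "")
            # "1_DMN" -> 1; the "_DMN" suffix is dropped.
            self._pending_id = int(value.split("_")[0])
        elif tag == "label":
            self._in_label = True

    def handle_data(self, data):
        # Only text inside a <label> that follows an <input> is kept.
        if self._in_label and self._pending_id is not None:
            self.rows.append((self._pending_id, data.strip()))

    def handle_endtag(self, tag):
        if tag == "label":
            self._in_label = False
            self._pending_id = None


def regions_to_csv(html: str) -> str:
    """Return CSV text with a header row and one row per region."""
    parser = RegionParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "label"])  # assumed column names
    writer.writerows(parser.rows)
    return out.getvalue()


print(regions_to_csv(SNIPPET))
```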

LisaGee commented 9 years ago

Following up on this now. Will get back to you when I have an answer.

But I have one question for you @shawnbot. You say “the data we've got only includes 8.” Which data are you referring to? Want to make sure I’m looking at exactly the same thing you are.

Thanks,

-Lisa

From: Shawn Allen [mailto:notifications@github.com] Sent: Tuesday, June 16, 2015 8:06 PM To: 18F/college-choice Cc: Gelobter, Lisa Subject: Re: [college-choice] As a Consumer, I can perform a Simple Search (#1)

I dug into the source of this page (http://nces.ed.gov/ipeds/datacenter/QueryForm.aspx) and found the following, which suggests some very different regional definitions from the map above:

I've dropped these values into a CSV (https://github.com/18F/college-choice/blob/37afd896af46cd27861a454a8e4cca3628eb876b/_data/regions.csv) that drives the search form's region selector. It'd be really great to have somebody from Ed tell us what the regions really are, though! cc @LisaGee (https://github.com/LisaGee)

In the absence of an official mapping, we'll need to cross-reference some of these with the actual values in the spreadsheet to figure out which one is correct. I think the HTML version above is probably not right because that has 9 regions, whereas the data we've got only includes 8.

Reply to this email directly or view it on GitHub: https://github.com/18F/college-choice/issues/1#issuecomment-112607736

LisaGee commented 9 years ago

And here is the answer from ED. @shawnbot Where are you only seeing 8?

From: Shi, Lena Sent: Wednesday, June 17, 2015 3:52 PM To: McCann, Clare; Gelobter, Lisa; Muenzer, Melanie; Reeves, Richard; Matsudaira, Jordan Cc: Bell-Ellwanger, Jenn; Meyer, Erie; Teal, Jessica; Nesting, David Subject: RE: Definition of Regions?

Hi all,

Here’s what we’ve been using:

Variable name: Region

Value  Label
0      US Service schools
1      New England CT ME MA NH RI VT
2      Mid East DE DC MD NJ NY PA
3      Great Lakes IL IN MI OH WI
4      Plains IA KS MN MO NE ND SD
5      Southeast AL AR FL GA KY LA MS NC SC TN VA WV
6      Southwest AZ NM OK TX
7      Rocky Mountains CO ID MT UT WY
8      Far West AK CA HI NV OR WA
9      Outlying areas AS FM GU MH MP PR PW VI

In case you need them written out, here’s language from the 2011-12 IPEDS Methodology report:

The New England region includes Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont. The Mid East region includes Delaware, the District of Columbia, Maryland, New Jersey, New York, and Pennsylvania. The Great Lakes region includes Illinois, Indiana, Michigan, Ohio, and Wisconsin. The Plains region includes Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota. The Southeast region includes Alabama, Arkansas, Florida, Georgia, Kentucky, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Virginia, and West Virginia. The Southwest region includes Arizona, New Mexico, Oklahoma, and Texas. The Rocky Mountains region includes Colorado, Idaho, Montana, Utah, and Wyoming. The Far West region includes Alaska, California, Hawaii, Nevada, Oregon, and Washington. The five U.S. service academies are the U.S. Naval Academy, the U.S. Military Academy, the U.S. Coast Guard Academy, the U.S. Air Force Academy, and the U.S. Merchant Marine Academy. The other U.S. jurisdictions include American Samoa, the Federated States of Micronesia, Guam, the Marshall Islands, the Commonwealth of the Northern Mariana Islands, Palau, Puerto Rico, and the U.S. Virgin Islands.

Hope this helps. Thanks!

Lena
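Because each label in the list above packs a region name together with its state codes, a lookup from state to region ID can be derived from the labels themselves. The following is a rough sketch under that assumption; `REGION_LABELS`, `split_label`, and `STATE_TO_REGION` are hypothetical names, and the rule "trailing two-letter uppercase words are state codes" is a heuristic that happens to fit these ten labels.

```python
# Labels copied from the region list in this thread.
REGION_LABELS = {
    0: "US Service schools",
    1: "New England CT ME MA NH RI VT",
    2: "Mid East DE DC MD NJ NY PA",
    3: "Great Lakes IL IN MI OH WI",
    4: "Plains IA KS MN MO NE ND SD",
    5: "Southeast AL AR FL GA KY LA MS NC SC TN VA WV",
    6: "Southwest AZ NM OK TX",
    7: "Rocky Mountains CO ID MT UT WY",
    8: "Far West AK CA HI NV OR WA",
    9: "Outlying areas AS FM GU MH MP PR PW VI",
}


def split_label(label):
    """Split a label into (region name, [state codes]).

    Trailing two-letter uppercase words are treated as state codes;
    everything before them is the region name.
    """
    words = label.split()
    states = []
    while words and len(words[-1]) == 2 and words[-1].isupper():
        states.insert(0, words.pop())
    return " ".join(words), states


# Map each state or territory code to its numeric region ID.
STATE_TO_REGION = {
    state: region_id
    for region_id, label in REGION_LABELS.items()
    for state in split_label(label)[1]
}

print(split_label(REGION_LABELS[1]))
print(STATE_TO_REGION["TX"])
```

Region 0 (US Service schools) has no state codes, so it cannot be reached through this lookup and would have to come from the data itself.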

From: Gelobter, Lisa Sent: Wednesday, June 17, 2015 3:43 PM To: '18F/college-choice'; 18F/college-choice Subject: RE: [college-choice] As a Consumer, I can perform a Simple Search (#1)

Following up on this now. Will get back to you when I have an answer.

But I have one question for you @shawnbot. You say “the data we've got only includes 8.” Which data are you referring to? Want to make sure I’m looking at exactly the same thing you are.

Thanks,

-Lisa

ultrasaurus commented 9 years ago

Proposed mapping of column names to JSON fields (this will be configurable in the API, but it would be good to decide now). I hear @ErieMeyer is advocating "school" over "institution", and we would like to follow the 18F API standards by using an underscore (_) in multi-word fields:

  INSTNM_MAIN: name
  CITY_MAIN: city
  STABBR_MAIN: state
  ZIP_MAIN: zip
  REGION_MAIN: region
  INSTCAT_MAIN: school_type
  UGDS: size
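
As a rough illustration of the mapping above, CSV rows could be renamed into JSON-style records like this. It is only a sketch: `FIELD_MAP` mirrors the proposed list, while `rows_to_records` and the sample row (including its values) are made up for the example, and all values are left as strings.

```python
import csv
import io
import json

# CSV column name -> proposed JSON field name.
FIELD_MAP = {
    "INSTNM_MAIN": "name",
    "CITY_MAIN": "city",
    "STABBR_MAIN": "state",
    "ZIP_MAIN": "zip",
    "REGION_MAIN": "region",
    "INSTCAT_MAIN": "school_type",
    "UGDS": "size",
}


def rows_to_records(csv_text):
    """Read CSV text and return one dict per row, keyed by JSON field name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {field: row[column] for column, field in FIELD_MAP.items()}
        for row in reader
    ]


# Made-up sample row, for illustration only.
SAMPLE_CSV = (
    "INSTNM_MAIN,CITY_MAIN,STABBR_MAIN,ZIP_MAIN,REGION_MAIN,INSTCAT_MAIN,UGDS\n"
    "Example College,Springfield,IL,62701,3,2,1200\n"
)

print(json.dumps(rows_to_records(SAMPLE_CSV), indent=2))
```

Keeping the mapping in one dictionary is one way to make it configurable later without touching the conversion code.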

shawnbot commented 9 years ago

@LisaGee I was only seeing 1-8 in that column as I scrolled through one of the giant CSVs, so it was definitely an informal survey. We should probably confirm that we're getting 0s and 9s in the data though, right?
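A more systematic check than scrolling could tally the values in the region column. The sketch below assumes the column is named `REGION_MAIN` and holds plain integers as text; `region_counts`, `missing_regions`, and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def region_counts(csv_file, column="REGION_MAIN"):
    """Count how often each region value appears in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column].strip() for row in reader)


def missing_regions(counts, expected=range(10)):
    """Return the expected region IDs (0-9) that never appear in the data."""
    return [region for region in expected if str(region) not in counts]


# Made-up sample standing in for one of the large CSVs.
sample = io.StringIO("REGION_MAIN\n1\n8\n8\n5\n")
counts = region_counts(sample)
print(counts)
print(missing_regions(counts))
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` sample.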

ultrasaurus commented 9 years ago

@LisaGee thanks for providing the region mapping! @shawnbot and I were thinking that it might be nice to have the API report strings instead of numbers, something like:

'US Service'
'New England'
'Mid East'
'Great Lakes'
'Plains'
'Southeast'
'Southwest'
'Rocky Mountains'
'Far West'
'Outlying Areas'

shawnbot commented 9 years ago

How about region_name ("New England") and region_id (1)? I'm thinking that some API consumers might have those region ID mappings to hand already, and may find it more useful to just use them rather than the names.
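
A record carrying both fields might be produced along these lines. This is a sketch only: the names come from the list of strings proposed earlier in the thread, `add_region_fields` is a hypothetical helper, and it assumes the incoming record has a `region` field holding the numeric ID.

```python
# Region ID -> short name, using the strings proposed in this thread.
REGION_NAMES = {
    0: "US Service",
    1: "New England",
    2: "Mid East",
    3: "Great Lakes",
    4: "Plains",
    5: "Southeast",
    6: "Southwest",
    7: "Rocky Mountains",
    8: "Far West",
    9: "Outlying Areas",
}


def add_region_fields(record):
    """Replace the raw "region" value with region_id and region_name."""
    region_id = int(record["region"])
    enriched = {key: value for key, value in record.items() if key != "region"}
    enriched["region_id"] = region_id
    # Unknown IDs yield None rather than raising an error.
    enriched["region_name"] = REGION_NAMES.get(region_id)
    return enriched


print(add_region_fields({"name": "Example College", "region": "1"}))
```

Consumers that already hold the numeric mapping can read `region_id`, while others can display `region_name` directly.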

diego- commented 9 years ago

Please break up the search tasks into separate tickets, then close this ticket as a duplicate.

Per the sprint planning meeting, July 27, 2015.

ErieMeyer commented 9 years ago

This is fixed!

Found one new bug that's unrelated though: https://github.com/18F/college-choice/issues/1083