cshjin / CS412Yelp

Dataset clean #2

Closed cshjin closed 8 years ago

cshjin commented 8 years ago

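Duplicated columns in the fit_transform output (for example 'Good For Kids' and 'Good for Kids', or 'Phoenix', 'PHOENIX', and 'Pheonix'):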

u'Accepts Insurance'
u'Ages Allowed'
u'Ages Allowed=18plus'
u'Ages Allowed=19plus'
u'Ages Allowed=21plus'
u'Ages Allowed=allages'
u'Alcohol'
u'Alcohol=beer_and_wine'
u'Alcohol=full_bar'
u'Alcohol=none'
u'Attire'
u'Attire=casual'
u'Attire=dressy'
u'Attire=formal'
u'BYOB'
u'BYOB/Corkage'
u'BYOB/Corkage=no'
u'BYOB/Corkage=yes_corkage'
u'BYOB/Corkage=yes_free'
u'By Appointment Only'
u'Caters'
u'Coat Check'
u'Corkage'
u'Delivery'
u'Dogs Allowed'
u'Drive-Thru'
u'Good For Dancing'
u'Good For Groups'
u'Good For Kids'
u'Good for Kids'
u'Happy Hour'
u'Has TV'
u'Noise Level'
u'Noise Level=average'
u'Noise Level=loud'
u'Noise Level=quiet'
u'Noise Level=very_loud'
u'Open 24 Hours'
u'Order at Counter'
u'Outdoor Seating'
u'Price Range'
u'Smoking'
u'Smoking=no'
u'Smoking=outdoor'
u'Smoking=yes'
u'Take-out'
u'Takes Reservations'
u'Waiter Service'
u'Wheelchair Accessible'
u'Wi-Fi'
u'Wi-Fi=free'
u'Wi-Fi=no'
u'Wi-Fi=paid'
u'attr_casual'
u'attr_classy'
u'attr_divey'
u'attr_hipster'
u'attr_intimate'
u'attr_romantic'
u'attr_touristy'
u'attr_trendy'
u'attr_upscale'
u'average_stars'
u'city=Ahwatukee'
u'city=Allentown'
u'city=Anjou'
u'city=Anthem'
u'city=Apache Junction'
u'city=Arlington'
u'city=Aspinwall'
u'city=Avondale'
u"city=Baie-D'urfe"
u'city=Balerno'
u'city=Beaconsfield'
u'city=Bellevue'
u'city=Bellvue'
u'city=Belmont'
u'city=Black Canyon City'
u'city=Blainville'
u'city=Bloomfield'
u'city=Bocholt'
u'city=Boisbriand'
u'city=Bonnyrigg and Lasswade'
u'city=Boulder City'
u'city=Braddock'
u'city=Brentwood'
u'city=Bridgeville'
u'city=Brossard'
u'city=Buckeye'
u'city=Cambridge'
u'city=Carefree'
u'city=Carnegie'
u'city=Casa Grande'
u'city=Castle Shannon'
u'city=Cave Creek'
u'city=Central City Village'
u'city=Centropolis Laval'
u'city=Champaign'
u'city=Chandler'
u'city=Charlotte'
u'city=Chateau'
u'city=City of Edinburgh'
u'city=Clark County'
u'city=Clover'
u'city=Communaut\xe9-Urbaine-de-Montr\xe9al'
u'city=Concord'
u'city=Concord Mills'
u'city=Conestogo'
u'city=Coolidge'
u'city=Cote-Saint-Luc'
u'city=Cote-des-Neiges-Notre-Dame-de-Grace'
u'city=Cottage Grove'
u'city=Crafton'
u'city=Cramond Bridge'
u'city=Dalgety Bay'
u'city=Dalkeith'
u'city=Dane'
u'city=De Forest'
u'city=DeForest'
u'city=Delmont'
u'city=Deux-Montagnes'
u'city=Dollard-Des Ormeaux'
u'city=Dollard-Des-Ormeaux'
u'city=Dollard-des-Ormeaux'
u'city=Dormont'
u'city=Dorval'
u'city=Downtown'
u'city=Dravosburg'
u'city=Durmersheim'
u'city=Edinburgh'
u'city=Eggenstein-Leopoldshafen'
u'city=El Mirage'
u'city=Enterprise'
u'city=Ettlingen'
u'city=Fabreville'
u'city=Fitchburg'
u'city=Florence'
u'city=Fort McDowell'
u'city=Fort Mcdowell'
u'city=Fort Mill'
u'city=Fountain Hills'
u'city=Gila Bend'
u'city=Gilbert'
u'city=Glendale'
u'city=Glendale Az'
u'city=Gold Canyon'
u'city=Goodyear'
u'city=Green Tree'
u'city=Green Valley'
u'city=Greenfield Park'
u'city=Guadalupe'
u'city=Hagenbach'
u'city=Harrisburg'
u'city=Heidelberg'
u'city=Henderson'
u'city=Higley'
u'city=Homestead'
u'city=Huntersville'
u'city=Indian Land'
u'city=Indian Trail'
u'city=Inverkeithing'
u'city=Jockgrim'
u'city=Juniper Green'
u'city=Karlsbad'
u'city=Karlsruhe'
u'city=Kirkland'
u'city=Kitchener'
u"city=L'\xcele-Bizard"
u"city=L'\xcele-des-Soeurs"
u'city=La Prairie'
u'city=LaSalle'
u'city=Lachine'
u'city=Lake Wylie'
u'city=Las Vegas'
u'city=Las Vegas '
u'city=Lasalle'
u'city=Lasswade'
u'city=Laval'
u'city=Laveen'
u'city=Lawrenceville'
u'city=Litchfield Park'
u'city=Loanhead'
u'city=Longueuil'
u'city=Lower Lawrenceville'
u'city=Madison'
u'city=Maricopa'
u'city=Mascouche'
u'city=Mattews'
u'city=Matthews'
u'city=Mc Farland'
u'city=Mc Kees Rocks'
u'city=McFarland'
u'city=McKees Rocks'
u'city=Mcfarland'
u'city=Mckees Rocks'
u'city=Mesa'
u'city=Middleton'
u'city=Millvale'
u'city=Mint Hill'
u'city=Monona'
u'city=Monroe'
u'city=Mont-Royal'
u'city=Montreal'
u'city=Montreal-Est'
u'city=Montreal-Nord'
u'city=Montreal-West'
u'city=Montr\xe9al'
u'city=Montr\xe9al-Nord'
u'city=Montr\xe9al-Ouest'
u'city=Mont\xe9al'
u'city=Morristown'
u'city=Mount Holly'
u'city=Mount Lebanon'
u'city=Mount Washington'
u'city=Mt. Oliver Boro'
u'city=Munhall'
u'city=Musselburgh'
u'city=N Las Vegas'
u'city=N. Las Vegas'
u'city=NELLIS AFB'
u'city=Nellis AFB'
u'city=Nellis Afb'
u'city=New Dundee'
u'city=New River'
u'city=New Town'
u'city=Newbridge'
u'city=North Las Vegas'
u'city=North Scottsdale'
u'city=Oakland'
u'city=Old Town'
u'city=Outremont'
u'city=PHOENIX'
u'city=Paradise'
u'city=Paradise Valley'
u'city=Peoria'
u'city=Pfinztal'
u'city=Pheonix'
u'city=Phoenix'
u'city=Phoenix Sky Harbor Center'
u'city=Pierrefonds'
u'city=Pineville'
u'city=Pittsburgh'
u'city=Pittsburgh/S. Hills Galleria'
u'city=Pittsburgh/Waterfront'
u'city=Pittsburrgh'
u'city=Pointe-Aux-Trembles'
u'city=Pointe-Claire'
u'city=Quebec'
u'city=Queen Creek'
u'city=Queensferry'
u'city=Ratho'
u'city=Regent Square'
u'city=Rheinstetten'
u'city=Rio Verde'
u'city=Rock Hill'
u'city=Rosemere'
u'city=Rosem\xe8re'
u'city=Saint Jacobs'
u'city=Saint Laurent'
u'city=Saint-Eustache'
u'city=Saint-Hubert'
u'city=Saint-Lambert'
u'city=Saint-Laurent'
u'city=Saint-Leonard'
u'city=Sainte-Ann-De-Bellevue'
u'city=Sainte-Anne-De-Bellevue'
u'city=Sainte-Anne-de-Bellevue'
u'city=Sainte-Genevieve'
u'city=Sainte-Therese'
u'city=Sainte-Th\xe9r\xe8se'
u'city=San Tan Valley'
u'city=Savoy'
u'city=Scottsdale'
u'city=Scottsdale Country Acres'
u'city=Sedona'
u'city=Shadyside'
u'city=Sharpsburg'
u'city=South Gyle'
u'city=South Queensferry'
u'city=Spring Valley'
u'city=St Clements'
u'city=St Jacobs'
u'city=St-Laurent'
u'city=Stallings'
u'city=Ste-Rose'
u'city=Stockbridge'
u'city=Stoughton'
u'city=Stowe Township'
u'city=Stutensee'
u'city=Stutensee neuthard'
u'city=Summerlin'
u'city=Summerlin South'
u'city=Sun City'
u'city=Sun City West'
u'city=Sun Lakes'
u'city=Sun Prairie'
u'city=Surprise'
u'city=Swissvale'
u'city=Tega Cay'
u'city=Tempe'
u'city=Terrebonne'
u'city=Tolleson'
u'city=Tonopah'
u'city=Tortilla Flat'
u'city=Urbana'
u'city=Verdun'
u'city=Verona'
u'city=Vimont'
u'city=Waldbronn'
u'city=Waterloo'
u'city=Waunakee'
u'city=Weddington'
u'city=Weingarten'
u'city=Wesley Chapel'
u'city=West Homestead'
u'city=West Mifflin'
u'city=Westmount'
u'city=Whitehall'
u'city=Wickenburg'
u'city=Wilkinsburg'
u'city=Windsor'
u'city=Woolwich'
u'city=W\xf6rth am Rhein'
u'city=Youngtown'
u'comp_cool'
u'comp_cute'
u'comp_funny'
u'comp_hot'
u'comp_list'
u'comp_more'
u'comp_note'
u'comp_photos'
u'comp_plain'
u'comp_profile'
u'comp_writer'
u'elite'
u'fans'
u'friends'
u'goodfor_breakfast'
u'goodfor_brunch'
u'goodfor_dessert'
u'goodfor_dinner'
u'goodfor_latenight'
u'goodfor_lunch'
u'latitude'
u'longitude'
u'music_background_music'
u'music_dj'
u'music_jukebox'
u'music_karaoke'
u'music_live'
u'music_playlist'
u'music_video'
u'open'
u'parking_garage'
u'parking_lot'
u'parking_street'
u'parking_valet'
u'parking_validated'
u'pt_amex'
u'pt_cash_only'
u'pt_discover'
u'pt_mastercard'
u'pt_visa'
u'res_dairy-free'
u'res_gluten-free'
u'res_halal'
u'res_kosher'
u'res_soy-free'
u'res_vegan'
u'res_vegetarian'
u'review_count'
u'stars'
u'state=AZ'
u'state=BW'
u'state=EDH'
u'state=ELN'
u'state=FIF'
u'state=IL'
u'state=KHL'
u'state=MLN'
u'state=NC'
u'state=NV'
u'state=NW'
u'state=ON'
u'state=PA'
u'state=QC'
u'state=RP'
u'state=SC'
u'state=WI'
u'state=XGL'
u'votes_cool'
u'votes_funny'
u'votes_useful'
u'yelping_since'
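
Many of the columns above differ only in case or whitespace ('Good For Kids' / 'Good for Kids', 'Las Vegas' / 'Las Vegas ', 'PHOENIX' / 'Phoenix'). A minimal sketch of normalizing keys and string values before vectorizing, so that such variants collapse into one column. The helper names are hypothetical and not taken from the repository. Misspellings such as 'Pheonix' or 'Pittsburrgh', and punctuation variants such as 'De Forest' / 'DeForest', would still need an explicit alias table.

```python
import re


def normalize_text(value):
    """Collapse whitespace runs, strip both ends, and case-fold a string."""
    return re.sub(r"\s+", " ", value).strip().casefold()


def normalize_record(record):
    """Return a copy of a flat feature dict with normalized keys and string values.

    Non-string values (bool, int, float) pass through unchanged. If two keys
    in one record normalize to the same name, the later one wins.
    """
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = normalize_text(value)
        cleaned[normalize_text(key)] = value
    return cleaned


# Hypothetical sample records mirroring variants seen in the column list.
records = [
    {"city": "Las Vegas ", "Good For Kids": True},
    {"city": "Las Vegas", "Good for Kids": False},
    {"city": "PHOENIX", "Good For Kids": True},
]
cleaned_records = [normalize_record(r) for r in records]
distinct_cities = sorted({r["city"] for r in cleaned_records})
```

Running the normalized records through the vectorizer should then yield one column per distinct city rather than one per spelling variant.
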
cshjin commented 8 years ago

21890 of 61184 business IDs are restaurants.
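
A count like this could be obtained roughly as follows. This is only a sketch: it assumes a business is a restaurant when its `categories` list contains the label "Restaurants", which the thread does not confirm, and the function name and sample data are hypothetical.

```python
def count_restaurants(businesses, label="Restaurants"):
    """Count records whose 'categories' list contains the given label.

    The label "Restaurants" is an assumption, not confirmed by the thread.
    """
    return sum(1 for b in businesses if label in b.get("categories", []))


# Hypothetical sample records, not taken from the dataset.
sample_businesses = [
    {"business_id": "a", "categories": ["Restaurants", "Pizza"]},
    {"business_id": "b", "categories": ["Doctors"]},
    {"business_id": "c", "categories": ["Bars", "Restaurants"]},
]
restaurant_count = count_restaurants(sample_businesses)
```
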

attributes <type 'dict'>
    Accepts Credit Cards <type 'bool'>
    Accepts Insurance <type 'bool'>
    Ages Allowed <type 'unicode'>
    Alcohol <type 'unicode'>
    Ambience <type 'dict'>
        casual <type 'bool'>
        classy <type 'bool'>
        divey <type 'bool'>
        hipster <type 'bool'>
        intimate <type 'bool'>
        romantic <type 'bool'>
        touristy <type 'bool'>
        trendy <type 'bool'>
        upscale <type 'bool'>
    Attire <type 'unicode'>
    By Appointment Only <type 'bool'>
    BYOB <type 'bool'>
    BYOB/Corkage <type 'unicode'>
    Caters <type 'bool'>
    Coat Check <type 'bool'>
    Corkage <type 'bool'>
    Delivery <type 'bool'>
    Dietary Restrictions <type 'dict'>
        dairy-free <type 'bool'>
        gluten-free <type 'bool'>
        halal <type 'bool'>
        kosher <type 'bool'>
        soy-free <type 'bool'>
        vegan <type 'bool'>
        vegetarian <type 'bool'>
    Dogs Allowed <type 'bool'>
    Drive-Thru <type 'bool'>
    Good For <type 'dict'>
        breakfast <type 'bool'>
        brunch <type 'bool'>
        dessert <type 'bool'>
        dinner <type 'bool'>
        latenight <type 'bool'>
        lunch <type 'bool'>
    Good For Dancing <type 'bool'>
    Good For Groups <type 'bool'>
    Good for Kids <type 'bool'>
    Good For Kids <type 'bool'>
    Happy Hour <type 'bool'>
    Has TV <type 'bool'>
    Music <type 'dict'>
        background_music <type 'bool'>
        dj <type 'bool'>
        jukebox <type 'bool'>
        karaoke <type 'bool'>
        live <type 'bool'>
        playlist <type 'bool'>
        video <type 'bool'>
    Noise Level <type 'unicode'>
    Open 24 Hours <type 'bool'>
    Order at Counter <type 'bool'>
    Outdoor Seating <type 'bool'>
    Parking <type 'dict'>
        garage <type 'bool'>
        lot <type 'bool'>
        street <type 'bool'>
        valet <type 'bool'>
        validated <type 'bool'>
    Payment Types <type 'dict'>
        amex <type 'bool'>
        cash_only <type 'bool'>
        discover <type 'bool'>
        mastercard <type 'bool'>
        visa <type 'bool'>
    Price Range <type 'int'>
    Smoking <type 'unicode'>
    Take-out <type 'bool'>
    Takes Reservations <type 'bool'>
    Waiter Service <type 'bool'>
    Wheelchair Accessible <type 'bool'>
    Wi-Fi <type 'unicode'>
business_id <type 'unicode'>
categories <type 'list'>
city <type 'unicode'>
full_address <type 'unicode'>
hours <type 'dict'>
    Friday <type 'dict'>
    Monday <type 'dict'>
    Saturday <type 'dict'>
    Sunday <type 'dict'>
    Thursday <type 'dict'>
    Tuesday <type 'dict'>
    Wednesday <type 'dict'>
latitude <type 'float'>
longitude <type 'float'>
name <type 'unicode'>
neighborhoods <type 'list'>
open <type 'bool'>
review_count <type 'int'>
stars <type 'float'>
state <type 'unicode'>
type <type 'unicode'>
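
A listing like the one above can be produced by walking a record recursively. The sketch below is an assumption about how that might be done, not the code used in the repository. It handles a single record, whereas the listing above appears to combine keys across many records, and under Python 3 the types print as `<class 'bool'>` rather than `<type 'bool'>`.

```python
def describe_types(record, indent=0):
    """Return one line per key with the type of its value; nested dicts are indented."""
    lines = []
    # Case-insensitive ordering, similar to the ordering in the listing above.
    for key in sorted(record, key=str.lower):
        value = record[key]
        lines.append("{}{} {}".format(" " * indent, key, type(value)))
        if isinstance(value, dict):
            lines.extend(describe_types(value, indent + 4))
    return lines


# Hypothetical sample record.
sample_record = {"stars": 4.0, "attributes": {"Caters": True}}
type_lines = describe_types(sample_record)
```
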
cshjin commented 8 years ago

Linearized.
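
"Linearized" here presumably means that the nested attribute dicts were flattened into single-level columns, as in the column names from the first comment ('attr_casual', 'res_vegan', 'goodfor_lunch', 'music_dj', 'parking_lot', 'pt_visa'). A minimal sketch of such a flattening step follows. The prefix mapping is inferred from those column names, and the function name is hypothetical.

```python
# Prefixes inferred from the column names in the first comment (an assumption).
PREFIXES = {
    "Ambience": "attr_",
    "Dietary Restrictions": "res_",
    "Good For": "goodfor_",
    "Music": "music_",
    "Parking": "parking_",
    "Payment Types": "pt_",
}


def linearize_attributes(attributes):
    """Flatten one level of nested dicts into prefixed top-level keys."""
    flat = {}
    for key, value in attributes.items():
        if isinstance(value, dict):
            # Fall back to "<key>_" for nested dicts without a known prefix.
            prefix = PREFIXES.get(key, key + "_")
            for sub_key, sub_value in value.items():
                flat[prefix + sub_key] = sub_value
        else:
            flat[key] = value
    return flat


# Hypothetical sample attributes dict.
flat_attributes = linearize_attributes({
    "Ambience": {"casual": True},
    "Parking": {"lot": False},
    "Wi-Fi": "free",
    "Price Range": 2,
})
```
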