CartoDB / cartoframes

CARTO Python package for data scientists
BSD 3-Clause "New" or "Revised" License
249 stars 63 forks source link

column names are too long? #1178

Closed djfan closed 4 years ago

djfan commented 4 years ago

I tried to write this Dataset to my carto account, and the upload function seems to work well (no error message) in python, but it failed to download the data. Maybe if add up all column names, the length is too long? It happens to me several times, especially after enriching data from multiple sources (having a large number of columns). Any solution to fix this?

Screen Shot 2019-11-07 at 2 37 38 PM
Error Message

``` --------------------------------------------------------------------------- HTTPError Traceback (most recent call last) ~/anaconda3/lib/python3.6/site-packages/carto/sql.py in copyto(self, query) 450 stream=True) --> 451 response.raise_for_status() 452 except CartoRateLimitException as e: ~/anaconda3/lib/python3.6/site-packages/requests/models.py in raise_for_status(self) 939 if http_error_msg: --> 940 raise HTTPError(http_error_msg, response=self) 941 HTTPError: 400 Client Error: Bad Request for url: https://dfan.carto.com/api/v2/sql/copyto?api_key=fe760da2e6adfadffdadfsafda9757107aefe0dad1&q=COPY+%28SELECT+cartodb_id%2C+geoid%2C+iso10min%2C+popcy%2C+popcygrp%2C+popcygrpi%2C+agecy0004%2C+agecy0509%2C+agecy1014%2C+agecy1519%2C+agecy2024%2C+agecy2529%2C+agecy3034%2C+agecy3539%2C+agecy4044%2C+agecy4549%2C+agecy5054%2C+agecy5559%2C+agecy6064%2C+agecy6569%2C+agecy7074%2C+agecy7579%2C+agecy8084%2C+agecygt85%2C+agecymed%2C+sexcymal%2C+sexcyfem%2C+rchcywhnhs%2C+rchcyblnhs%2C+rchcyamnhs%2C+rchcyasnhs%2C+rchcyhanhs%2C+rchcyotnhs%2C+rchcymunhs%2C+hiscyhisp%2C+marcynever%2C+marcymarr%2C+marcysep%2C+marcywidow%2C+marcydivor%2C+agecygt15%2C+educyltgr9%2C+educyshsch%2C+educyhsch%2C+educyscoll%2C+educyassoc%2C+educybach%2C+educygrad%2C+agecygt25%2C+hhdcy%2C+hhdcyfam%2C+hhscymcfch%2C+hhscylpmch%2C+hhscylpfch%2C+hhdcyavesz%2C+hhdcymedag%2C+vphcynone%2C+vphcy1%2C+vphcygt1%2C+inccypcap%2C+inccyavehh%2C+inccymedhh%2C+inccymedfa%2C+hincylt10%2C+hincy1015%2C+hincy1520%2C+hincy2025%2C+hincy2530%2C+hincy3035%2C+hincy3540%2C+hincy4045%2C+hincy4550%2C+hincy5060%2C+hincy6075%2C+hincy75100%2C+hincy10025%2C+hincy12550%2C+hincy15020%2C+hincygt200%2C+hincymed24%2C+hincymed25%2C+hincymed35%2C+hincymed45%2C+hincymed55%2C+hincymed65%2C+hincymed75%2C+lbfcypop16%2C+lbfcyarm%2C+lbfcyempl%2C+lbfcyunem%2C+lbfcynlf%2C+unecyrate%2C+lbfcylbf%2C+lniexspan%2C+lniexisol%2C+hooexmed%2C+rntexmed%2C+husex1det%2C+husexapt%2C+dwlcy%2C+dwlcyvacnt%2C+dwlcyrent%2C+dwlcyowned%2C+poppy%2C+hhdpy%2C+dwlpy%2C+agepymed%2C+incpypcap%2C+incpyavehh%2C+incpymedhh%2C+xcyap1%2C+xcyap2%2C+xcyap3%2C+xcyap4%2C+xcyap5%2C+xcyap6%2C+xcyap7%2C+xcycc%2C+xcyed1%2C+xcyed2%2C+xcyen1%2C+xcyen2%2C+xcyen3%2C+xcyfb1%2C+xcyfb2%2C+xcyfb3%2C+xcygi%2C+xcyhc1%2C+xcyhc2%2C+xcyhc3%2C+xcyhf3%2C+xcyhf1%2C+xcyhf2%2C+xcyhf4%2C+xcyhf5%2C+xcyhh1%2C+xcyhh2%2C+xcyhh3%2C+xcyhh4%2C+xcyhh5%2C+xcyho1%2C+xcyho2%2C+xcyho3%2C+xcyho4%2C+xcymi1%2C+xcymi2%2C+xcymi3%2C+xcymi4%2C+xcypc2%2C+xcypc1%2C+xcypc3%2C+xcypc4%2C+xcypi%2C+xcyrd1%2C+xcyrd2%2C+xcyrd3%2C+xcytb1%2C+xcytb2%2C+xcytr3%2C+xcytr7%2C+xcytr1%2C+xcytr2%2C+xcytr4%2C+xcytr5%2C+xcytr6%2C+xcytr8%2C+xcytr9%2C+xcyut1%2C+xcyut2%2C+xcyut3%2C+xcyut4%2C+xcyut5%2C+xpyap1%2C+xpyap2%2C+xpyap3%2C+xpyap4%2C+xpyap5%2C+xpyap6%2C+xpyap7%2C+xpycc%2C+xpyed1%2C+xpyed2%2C+xpyen1%2C+xpyen2%2C+xpyen3%2C+xpyfb1%2C+xpyfb2%2C+xpyfb3%2C+xpygi%2C+xpyhc1%2C+xpyhc2%2C+xpyhc3%2C+xpyhf3%2C+xpyhf1%2C+xpyhf2%2C+xpyhf4%2C+xpyhf5%2C+xpyhh1%2C+xpyhh2%2C+xpyhh3%2C+xpyhh4%2C+xpyhh5%2C+xpyho1%2C+xpyho2%2C+xpyho3%2C+xpyho4%2C+xpymi1%2C+xpymi2%2C+xpymi3%2C+xpymi4%2C+xpypc2%2C+xpypc1%2C+xpypc3%2C+xpypc4%2C+xpypi%2C+xpyrd1%2C+xpyrd2%2C+xpyrd3%2C+xpytb1%2C+xpytb2%2C+xpytr3%2C+xpytr7%2C+xpytr1%2C+xpytr2%2C+xpytr4%2C+xpytr5%2C+xpytr6%2C+xpytr8%2C+xpytr9%2C+xpyut1%2C+xpyut2%2C+xpyut3%2C+xpyut4%2C+xpyut5%2C+pmacyhh1pa%2C+pmacyhd01%2C+pmacyhd02%2C+pmacyhd03%2C+pmacyhd04%2C+pmacyhd05%2C+pmacyhd06%2C+pmacyhd07%2C+pmacyhd08%2C+pmacyhd09%2C+pmacyhd10%2C+pmacyhd11%2C+pmacyhd12%2C+pmacyhd13%2C+pmacyhd14%2C+pmacyhd15%2C+pmacyhd16%2C+pmacyhd17%2C+pmacyhd18%2C+pmacyhd19%2C+pmacyhd20%2C+pmacyhd21%2C+pmacyhd22%2C+pmacyhd23%2C+pmacyhd24%2C+pmacyhd25%2C+pmacyhd26%2C+pmacyhd27%2C+pmacyhd28%2C+pmacyhd29%2C+pmacyhd30%2C+pmacyhd31%2C+pmacyhd32%2C+pmacyhd33%2C+pmacyhd34%2C+pmacyhd35%2C+pmacyhd36%2C+pmacyhd37%2C+pmacyhd38%2C+pmacyhd39%2C+pmacyhd40%2C+pmacyhd41%2C+pmacyhd42%2C+pmacyhd43%2C+pmacyhd44%2C+pmacyhd45%2C+pmacyhd46%2C+pmacyhd47%2C+pmacyhd48%2C+pmacyhd49%2C+pmacyhd50%2C+pmacyhd51%2C+pmacyhd52%2C+pmacyhd53%2C+pmacyhd54%2C+pmacyhd55%2C+pmacyhd56%2C+pmacyhd57%2C+pmacyhd58%2C+pmacyhd59%2C+pmacyhd60%2C+pmacyhd61%2C+pmacyhd62%2C+pmacyhd63%2C+pmacyhd64%2C+pmacyhd65%2C+pmacyhd66%2C+pmacyhd67%2C+pmacyhd68%2C+pmacypp1pa%2C+pmacypp01%2C+pmacypp02%2C+pmacypp03%2C+pmacypp04%2C+pmacypp05%2C+pmacypp06%2C+pmacypp07%2C+pmacypp08%2C+pmacypp09%2C+pmacypp10%2C+pmacypp11%2C+pmacypp12%2C+pmacypp13%2C+pmacypp14%2C+pmacypp15%2C+pmacypp16%2C+pmacypp17%2C+pmacypp18%2C+pmacypp19%2C+pmacypp20%2C+pmacypp21%2C+pmacypp22%2C+pmacypp23%2C+pmacypp24%2C+pmacypp25%2C+pmacypp26%2C+pmacypp27%2C+pmacypp28%2C+pmacypp29%2C+pmacypp30%2C+pmacypp31%2C+pmacypp32%2C+pmacypp33%2C+pmacypp34%2C+pmacypp35%2C+pmacypp36%2C+pmacypp37%2C+pmacypp38%2C+pmacypp39%2C+pmacypp40%2C+pmacypp41%2C+pmacypp42%2C+pmacypp43%2C+pmacypp44%2C+pmacypp45%2C+pmacypp46%2C+pmacypp47%2C+pmacypp48%2C+pmacypp49%2C+pmacypp50%2C+pmacypp51%2C+pmacypp52%2C+pmacypp53%2C+pmacypp54%2C+pmacypp55%2C+pmacypp56%2C+pmacypp57%2C+pmacypp58%2C+pmacypp59%2C+pmacypp60%2C+pmacypp61%2C+pmacypp62%2C+pmacypp63%2C+pmacypp64%2C+pmacypp65%2C+pmacypp66%2C+pmacypp67%2C+pmacypp68%2C+administration_of_economic_programs%2C+administration_of_environmental_quality_and_housing_programs%2C+administration_of_human_resource_programs%2C+agricultural_production_crops%2C+agricultural_production_livestock_and_animal_specialties%2C+agricultural_services%2C+amusement_and_recreation_services%2C+apparel_and_accessory_stores%2C+apparel_finished_products_from_fabrics_and_similar_materials%2C+automotive_dealers_and_gasoline_service_stations%2C+automotive_repair_services_and_parking%2C+building_materials_hardware_garden_supplies_and_mobile_homes%2C+business_services%2C+chemicals_and_allied_products%2C+communications%2C+construction_general_contractors_and_operative_builders%2C+construction_special_trade_contractors%2C+depository_institutions%2C+eating_and_drinking_places%2C+educational_services%2C+electric_gas_and_sanitary_services%2C+electronic_and_other_electrical_equipment_and_components%2C+engineering_accounting_research_and_management_services%2C+executive_legislative_and_general_government_except_finance%2C+fabricated_metal_products%2C+fishing_hunting_and_trapping%2C+food_and_kindred_products%2C+food_stores%2C+forestry%2C+furniture_and_fixtures%2C+general_merchandise_stores%2C+government_and_public_services%2C+health_services%2C+heavy_construction_except_building_construction_contractor%2C+holding_and_other_investment_offices%2C+home_furniture_furnishings_and_equipment_stores%2C+hotels_rooming_houses_camps_and_other_lodging_places%2C+industrial_and_commercial_machinery_and_computer_equipment%2C+insurance_agents_brokers_and_service%2C+insurance_carriers%2C+justice_public_order_and_safety%2C+label_features%2C+leather_and_leather_products%2C+legal_services%2C+leisure%2C+local_and_suburban_transit_and_interurban_highway_transportatio%2C+lumber_and_wood_products_except_furniture%2C+measuring_photographic_medical_and_optical_goods_and_clocks%2C+membership_organizations%2C+metal_mining%2C+mining_and_quarrying_of_nonmetallic_minerals_except_fuels%2C+miscellaneous_manufacturing_industries%2C+miscellaneous_repair_services%2C+miscellaneous_retail%2C+motion_pictures%2C+motor_freight_transportation%2C+museums_art_galleries_and_botanical_and_zoological_gardens%2C+national_security_and_international_affairs%2C+nonclassifiable_establishments%2C+nondepository_credit_institutions%2C+oil_and_gas_extraction%2C+paper_and_allied_products%2C+personal_services%2C+petroleum_refining_and_related_industries%2C+pipelines_except_natural_gas%2C+primary_metal_industries%2C+printing_publishing_and_allied_industries%2C+public_finance_taxation_and_monetary_policy%2C+railroad_transportation%2C+real_estate%2C+rubber_and_miscellaneous_plastic_products%2C+security_and_commodity_brokers_dealers_exchanges_and_services%2C+services_nec%2C+shopping%2C+social_services%2C+sports%2C+stone_clay_glass_and_concrete_products%2C+textile_mill_products%2C+tobacco_products%2C+tourism%2C+transportation_by_air%2C+transportation_equipment%2C+transportation_services%2C+united_states_postal_service%2C+water_transportation%2C+wholesale_trade_durable_goods%2C+wholesale_trade_nondurable_goods%2C+total%2C+the_geom+FROM+%22dfan%22.%22column_names_are_too_long_arent_they%22%29+TO+stdout+WITH+%28FORMAT+csv%2C+HEADER+true%29&client=cartoframes_1.0b4 During handling of the above exception, another exception occurred: JSONDecodeError Traceback (most recent call last) in ----> 1 Dataset('column_names_are_too_long_arent_they', credentials=cc).download(decode_geom=True) ~/anaconda3/lib/python3.6/site-packages/cartoframes/data/dataset/dataset.py in download(self, limit, decode_geom, retry_times) 226 """ 227 --> 228 return self._strategy.download(limit, decode_geom, retry_times) 229 230 IF_EXISTS_FAIL = BaseDataset.IF_EXISTS_FAIL ~/anaconda3/lib/python3.6/site-packages/cartoframes/data/dataset/registry/table_dataset.py in download(self, limit, decode_geom, retry_times) 32 columns = self._get_table_columns() 33 query = self._get_read_query(columns, limit) ---> 34 self._df = self._copyto(columns, query, limit, decode_geom, retry_times) 35 return self._df 36 ~/anaconda3/lib/python3.6/site-packages/cartoframes/data/dataset/registry/base_dataset.py in _copyto(self, columns, query, limit, decode_geom, retry_times) 169 def _copyto(self, columns, query, limit, decode_geom, retry_times): 170 copy_query = """COPY ({}) TO stdout WITH (FORMAT csv, HEADER true)""".format(query) --> 171 raw_result = self._context.download(copy_query, retry_times) 172 173 df_types = dtypes(columns, exclude_dates=True, exclude_the_geom=True, exclude_bools=True) ~/anaconda3/lib/python3.6/site-packages/cartoframes/lib/context/api_context.py in download(self, query, retry_times) 29 def download(self, query, retry_times=DEFAULT_RETRY_TIMES): 30 try: ---> 31 return self.copy_client.copyto_stream(query.strip()) 32 except CartoRateLimitException as err: 33 if retry_times > 0: ~/anaconda3/lib/python3.6/site-packages/carto/sql.py in copyto_stream(self, query) 517 :raise: CartoException 518 """ --> 519 return ResponseStream(self.copyto(query)) ~/anaconda3/lib/python3.6/site-packages/carto/sql.py in copyto(self, query) 455 if 400 <= response.status_code < 500: 456 # Client error, provide better reason --> 457 reason = response.json()['error'][0] 458 error_msg = u'%s Client Error: %s' % (response.status_code, 459 reason) ~/anaconda3/lib/python3.6/site-packages/requests/models.py in json(self, **kwargs) 895 # used. 896 pass --> 897 return complexjson.loads(self.text, **kwargs) 898 899 @property ~/anaconda3/lib/python3.6/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw) 516 parse_constant is None and object_pairs_hook is None 517 and not use_decimal and not kw): --> 518 return _default_decoder.decode(s) 519 if cls is None: 520 cls = JSONDecoder ~/anaconda3/lib/python3.6/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3) 368 if _PY3 and isinstance(s, binary_type): 369 s = s.decode(self.encoding) --> 370 obj, end = self.raw_decode(s) 371 end = _w(s, end).end() 372 if end != len(s): ~/anaconda3/lib/python3.6/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3) 398 elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf': 399 idx += 3 --> 400 return self.scan_once(s, idx=_w(s, idx).end()) JSONDecodeError: Expecting value: line 1 column 1 (char 0) ```

cc @andy-esch

andy-esch commented 4 years ago

My guess is that the GET params are too long for the request, and for some reason the Python SDK copyto is not choosing a POST. Or maybe it is using the POST and the q param is not moved to the body? The length of characters in the request is ~7772.

djfan commented 4 years ago

Hi @Jesus89 I tried it again. This time I met errors like these.

Screen Shot 2019-12-02 at 1 17 42 PM Screen Shot 2019-12-02 at 1 20 43 PM Screen Shot 2019-12-02 at 1 24 09 PM

it seems like the column names are not normalized to what SQL allows?

cc @andy-esch

Jesus89 commented 4 years ago

Hi @djfan. There is a bug in the normalization of the columns. Could you perform the tests installing CF from this branch normalize-column-names? Thanks.

djfan commented 4 years ago

I did this pip install -e git://github.com/CartoDB/cartoframes.git@normalize-column-names#egg=pip

upload works well. But download returns the same error.

(I think normalize-column-names is based on b4 not b6. maybe that's the reason? )

Screen Shot 2019-12-03 at 11 32 10 AM Screen Shot 2019-12-03 at 11 32 19 AM
Jesus89 commented 4 years ago

Thanks @djfan!. We have solved the normalization issue then, but this second test means that the fixes in the COPY command do not cover the 400 Client Error: Bad Request for url case.

Could you send me the p file?

djfan commented 4 years ago

sure. the link is in the first post of this issue thread.

Jesus89 commented 4 years ago

O_o

Awesome :sweat_smile:

Jesus89 commented 4 years ago

I have checked that you are using Dataset which is deprecated in beta6 (and also in that branch) so maybe you have not installed the branch.

I have tested upload/download with your file and it works for me:

Could you check this again? Thanks!