civisanalytics / civis-python

Civis API Python Client
BSD 3-Clause "New" or "Revised" License
34 stars 26 forks source link

document that `table_columns` parameter's array of hashes must be ordered #380

Open nik-bladey opened 4 years ago

nik-bladey commented 4 years ago

It would be helpful if the documentation for the table_columns parameter surfaced in API Client resources (like in post_files_csv, among others) specified that the array of hashes must be ordered. Since these are key:value pairs, I initially assumed that as with Python dictionaries the order would be irrelevant. It took me a while of puzzling over very confusing downstream errors before realizing I had to provide them in the same order as the columns of the table. Apparently arrays of hashes are always like this / this is common computer science knowledge, but I've never used hashes before and it was not intuitive.

As Manticore is working on surfacing table_columns in even more places, I suspect other people will run into this more in the future.

For reference, the current documentation looks like this: image

mheilman commented 4 years ago

Apparently arrays of hashes are always like this / this is common computer science knowledge, but I've never used hashes before and it was not intuitive.

I don't think this has anything to do with common CS knowledge. The docstring seems just unclear. I think it should be improved upstream in the platform API code rather than the API client. I'll file an internal issue, but I think we should leave this open until it gets addressed in the platform API.

mheilman commented 4 years ago

@Zephyraith, would it be possible for you to provide an example of the input you were trying to provide versus what platform was expecting?

nik-bladey commented 4 years ago

Some helpful rephrasing in the documentation occurred as part of #379 to substitute "dictionaries" for "hashes", but that didn't fully address my concerns.

@stephen-hoover, this relates to a Slack conversation we / you had today in #pmi-engineering and #data-engineering -- perhaps you have contributing thoughts.

1 2 3 4

stephen-hoover commented 4 years ago

For me, the main point of confusion was in not using table_columns. I'd assumed that when appending to an existing table using dataframe_to_civis, the DataFrame column information would be used to correctly order the columns for appending to the database table, but the DataFrame column names are ignored. A couple of @Zephyraith 's examples here are about that (for me) confusing behavior when not providing table_columns.

One non-obvious thing is that (as I understand it), table_columns is about the columns in the data that are being uploaded, and not so much about the table in the database.