burnash / gspread

Google Sheets Python API
https://docs.gspread.org
MIT License
7.11k stars 948 forks source link

Disambiguating and filling in headers similar to Pandas #1436

Open imrehg opened 8 months ago

imrehg commented 8 months ago

Is your feature request related to a problem? Please describe.

In the real world it is quite common to have spreadsheets which have non-unique column headers (often the case when the sheet is intended to be used by people rather than code). There are also cases when column headers might be missing, even if there are available data in the columns.

Currently these wouldn't work when running get_all_records (e.g before passing data into a Pandas DataFrame), as the tool expects unique headers in general, and expects unique headers even when the expectations are passed in.

Describe the solution you'd like

The Pandas' read_excel tool handles these cases, and does a reasonable job with it.

Something similar for the two types of disambiguation could be useful.

Describe alternatives you've considered

I've considered using manually rolled worksheet.col_values(...) to get the individual columns and manipulate them as needed, rather than using get_all_records, though this would be a lot of manual shuffling.

Additional context

Happy to potentially contribute.

alifeee commented 8 months ago

Hi, thanks for the suggestion :)

We had a lot of feature creep with get_all_records near the end of v5, resulting in a simplification of the method. See #1367 and surrounding issues/PRs for more information.

Manually

To do this manually, you could get the titles and data with code similar to (where rename_unique() is a custom method that changes "data", "data", to "data", "data-1", for example):

header = spreadsheet.get("1:1")[0]
header_unique = rename_unique(header)
cells = spreadsheet.get("6:9")
some_records = utils.to_records(header, cells)

...as described in the v5 to v6 migration guide. See documentation for utils.to_records here.

As part of gspread

As I mentioned, we tried to limit the scope of get_all_records as it was becoming too complicated.

Your suggestion I would see as perhaps a new utils function (like rename_unique() above) in gspread/utils.py, a new kwarg in get_all_records here

https://github.com/burnash/gspread/blob/8d55a6bd99c3b8f871167290c42c4167f8f3209c/gspread/worksheet.py#L500-L502

        allow_underscores_in_numeric_literals=False,
        empty2zero=False,
+       rename_identical_headers=False,
    ) -> List[Dict[str, Union[int, float, str]]]:

and then the implementation somewhere like here

https://github.com/burnash/gspread/blob/8d55a6bd99c3b8f871167290c42c4167f8f3209c/gspread/worksheet.py#L567-L568

        keys = entire_sheet[head - 1]
        values = entire_sheet[head:]
+
+       if rename_identical_headers is True:
+         keys = utils.rename_unique(keys)

Opinion

Personally, I think this change is "simple enough" that I would be happy for it to be added.

I will ask @lavigne958's opinion too, on the added complexity.

Implementation

I will not implement this. As you say

Happy to potentially contribute.

I am happy to help you contribute if you desire this feature :)

Let us wait for @lavigne958's opinion and then you can read the contributing guide and we can help if you are interested in implementing this feature

good day :)

lavigne958 commented 8 months ago

Hi @imrehg thank you for this insight.

I really like the idea. It could solve our problem and offer the users some flexibility.

When I think about it, the major problem is:

Then from your proposal and the background of issues we faced, we don't actually need the user to change the data nor skip some columns of data too. We need to actually create a unique name if it's not the case.

What I propose as a flexible solution to this is:

In the end a user can choose to have .1 etc or -1 or anything as suffix for duplicate and any name for empty header.

As mentioned by @alifeee if you need any help contributing we are happy to help 🙃

I agree as well it will be useful to put it all in a single function in utile so it could benefit everyone.

lavigne958 commented 7 months ago

Hi @imrehg let us know if you want us to make this feature or if you'd rather contribute, if so we can guide you too.

imrehg commented 7 months ago

Hey @lavigne958 @alifeee, sorry that I've missed the original repies. Thanks for the detailed response, all this context makes sense (and I do appreciate scope creep and all that!)

Yes, I'm still happy to contribute. Your messages have quite a bit of details to start off on, and I'll go through that first, and look at making the changes in those spirit. If you think there's anything useful that I should know, I'm happy to hear!

lavigne958 commented 7 months ago

The most useful link for you will be: contributing guide 😁

lavigne958 commented 5 months ago

Hey @lavigne958 @alifeee, sorry that I've missed the original repies. Thanks for the detailed response, all this context makes sense (and I do appreciate scope creep and all that!)

Yes, I'm still happy to contribute. Your messages have quite a bit of details to start off on, and I'll go through that first, and look at making the changes in those spirit. If you think there's anything useful that I should know, I'm happy to hear!

Hi @imrehg , just wondering if you are still interested in making the changes ?

imrehg commented 5 months ago

Hey @lavigne958 definitely still interested, just life got in the way a bit, apologies! :cold_sweat: Gonna be looking at this soon, unless I'm bottlenecking you. In that case just let me know!

lavigne958 commented 5 months ago

No problem we have other issues to solve for now, so far so good thanks :+1: