Matatika / tap-google-sheets

tap-google-sheets, singer tap built with the Meltano SDK
GNU Affero General Public License v3.0
2 stars 9 forks source link

feat: Allow to set optional range #23

Closed lancondrej closed 4 months ago

ReubenFrankel commented 4 months ago

Since I don't have edit access, please can you add

        th.Property(
            "range",
            th.StringType,
            description=(
                "Optionally choose a range of data using cell start and end coordinates"
                " - see [A1 notation](https://developers.google.com/sheets/api/guides/concepts#expandable-1)"  # noqa: E501
                " for more information"
            ),
            required=False,
        ),

to tap_google_sheets/tap.py here, and

        - name: range

to meltano.yml here.

lancondrej commented 4 months ago

Since I don't have edit access, please can you add

        th.Property(
            "range",
            th.StringType,
            description=(
                "Optionally choose a range of data using cell start and end coordinates"
                " - see [A1 notation](https://developers.google.com/sheets/api/guides/concepts#expandable-1)"  # noqa: E501
                " for more information"
            ),
            required=False,
        ),

to tap_google_sheets/tap.py here, and

        - name: range

to meltano.yml here.

Sure I'm going to add it as well as all other comments. Thanks.

ReubenFrankel commented 4 months ago

@lancondrej I appreciate the effort but I'd rather not drop Python 3.7 support just yet for this feature PR in order to get property pattern validation with the newer SDK version.

I think your regex just needed tweaking slightly to require both start and end cell coordinates:

Current

^([A-Za-z]*)(\d*):([A-Za-z]*)(\d*)$

image

Proposed

^([A-Za-z]+|\d+|([A-Za-z]\d+)):([A-Za-z]+|\d+|([A-Za-z]\d+))$

image

lancondrej commented 4 months ago

I remake the range to support all valid A1 notation inputs. Use pattern validation for StringType -> this required singer-sdk update -> so stop supporting python 3.7, I also need to update other libs cause without it test starts failing for py 3.10 and more. So I do this update and also setup CI to test all supported python versions.

ReubenFrankel commented 4 months ago

Also for the record, in https://github.com/Matatika/tap-google-sheets/pull/23#discussion_r1574806231 I was talking about manually raising a ConfigValidationError (i.e. from singer_sdk.exceptions import ConfigValidationError) - sorry, should have been clearer.

ReubenFrankel commented 4 months ago

I remake the range to support all valid A1 notation inputs. Use pattern validation for StringType -> this required singer-sdk update -> so stop supporting python 3.7, I also need to update other libs cause without it test starts failing for py 3.10 and more. So I do this update and also setup CI to test all supported python versions.

Yeah, I understand why you did it, but I think we should just focus on the feature in this PR and not introduce breaking changes.

lancondrej commented 4 months ago

I remake the range to support all valid A1 notation inputs. Use pattern validation for StringType -> this required singer-sdk update -> so stop supporting python 3.7, I also need to update other libs cause without it test starts failing for py 3.10 and more. So I do this update and also setup CI to test all supported python versions.

Yeah, I understand why you did it, but I think we should just focus on the feature in this PR and not introduce breaking changes.

I understand, I just suppose that stopping supporting python3.7 is not such a big deal considering it's almost a year after the end of life. But if you persist on supporting 3.7 I will revert those libs updates and just keep the validation directly here https://github.com/Matatika/tap-google-sheets/pull/23/files#diff-d43a9cad9a8c2d3bd8e4ddc0b914dcd00de39830c6bccbc6bcc10f25a63d509fR173

lancondrej commented 4 months ago

Proposed

^([A-Za-z]+|\d+|([A-Za-z]\d+)):([A-Za-z]+|\d+|([A-Za-z]\d+))$

image

I was also trying this one, but unfortunatelly it's not fully true cause e.g. A:5 match but it's not valid, and e.g. AA1:AF20 is valid but not match.

ReubenFrankel commented 4 months ago

I understand, I just suppose that stopping supporting python3.7 is not such a big deal considering it's almost a year after the end of life.

Dropping support for 3.7 is definitely on the to-do list - I just don't think this PR is the place to do it. I need to have a discussion with the team about how best to do that, especially here because this is the default Google Sheets variant for Meltano that I am aware quite a few people use, and I don't want to break their pipelines without a migration plan in place first. :sweat_smile:

But if you persist on supporting 3.7 I will revert those libs updates and just keep the validation directly here https://github.com/Matatika/tap-google-sheets/pull/23/files#diff-d43a9cad9a8c2d3bd8e4ddc0b914dcd00de39830c6bccbc6bcc10f25a63d509fR173

This sounds good. :+1:

ReubenFrankel commented 4 months ago

AA1:AF20 is valid but not match

Ah, I missed some +s

^([A-Za-z]+|\d+|([A-Za-z]+\d+)):([A-Za-z]+|\d+|([A-Za-z]+\d+))$

image

A:5 match but it's not valid

Right, maybe we could perform that validation after the regex has been evaluated? To be honest, I think we will have caught most legitimate config errors with the above, so I'm not that concerned if a user hits a 400 Bad Request for the edge-case ranges like A:5 and 5:A (that should be enough of an indication to prompt them to check their config and realise their range value doesn't make sense). As long as it's not some programmatic exception like IndexError.

ReubenFrankel commented 4 months ago

It's not pretty, but I think this regex satisfies all cases FWIW:

^([A-Za-z]+(?!:\d+$)|\d+(?!:[A-Za-z]+$)|[A-Za-z]+\d+):([A-Za-z]|\d+|[A-z]+\d+)$

image

lancondrej commented 4 months ago

It's not pretty, but I think this regex satisfies all cases FWIW:

^([A-Za-z]+(?!:\d+$)|\d+(?!:[A-Za-z]+$)|[A-Za-z]+\d+):([A-Za-z]|\d+|[A-z]+\d+)$

image

Yes true, it satisfies almost all we need, I would rather limit the amount of [A-Za-z] to max 3, and number should be max 7 digits. But actually I would prefer to split it to multiple regex as I have it, cause the main purpose of it is that thanks to empty groups () in regexp it helps to parse an input to proper parts (columns and rows).

lancondrej commented 4 months ago

I reverted the update and kept just validation in get_first_line_range

ReubenFrankel commented 4 months ago

LGTM! Thanks @lancondrej