Closed lancondrej closed 4 months ago
Since I don't have edit access, please can you add
th.Property( "range", th.StringType, description=( "Optionally choose a range of data using cell start and end coordinates" " - see [A1 notation](https://developers.google.com/sheets/api/guides/concepts#expandable-1)" # noqa: E501 " for more information" ), required=False, ),
to
tap_google_sheets/tap.py
here, and- name: range
to
meltano.yml
here.
Sure I'm going to add it as well as all other comments. Thanks.
@lancondrej I appreciate the effort but I'd rather not drop Python 3.7 support just yet for this feature PR in order to get property pattern validation with the newer SDK version.
I think your regex just needed tweaking slightly to require both start and end cell coordinates:
^([A-Za-z]*)(\d*):([A-Za-z]*)(\d*)$
^([A-Za-z]+|\d+|([A-Za-z]\d+)):([A-Za-z]+|\d+|([A-Za-z]\d+))$
I remake the range to support all valid A1 notation inputs. Use pattern validation for StringType -> this required singer-sdk update -> so stop supporting python 3.7, I also need to update other libs cause without it test starts failing for py 3.10 and more. So I do this update and also setup CI to test all supported python versions.
Also for the record, in https://github.com/Matatika/tap-google-sheets/pull/23#discussion_r1574806231 I was talking about manually raising a ConfigValidationError
(i.e. from singer_sdk.exceptions import ConfigValidationError
) - sorry, should have been clearer.
I remake the range to support all valid A1 notation inputs. Use pattern validation for StringType -> this required singer-sdk update -> so stop supporting python 3.7, I also need to update other libs cause without it test starts failing for py 3.10 and more. So I do this update and also setup CI to test all supported python versions.
Yeah, I understand why you did it, but I think we should just focus on the feature in this PR and not introduce breaking changes.
I remake the range to support all valid A1 notation inputs. Use pattern validation for StringType -> this required singer-sdk update -> so stop supporting python 3.7, I also need to update other libs cause without it test starts failing for py 3.10 and more. So I do this update and also setup CI to test all supported python versions.
Yeah, I understand why you did it, but I think we should just focus on the feature in this PR and not introduce breaking changes.
I understand, I just suppose that stopping supporting python3.7 is not such a big deal considering it's almost a year after the end of life. But if you persist on supporting 3.7 I will revert those libs updates and just keep the validation directly here https://github.com/Matatika/tap-google-sheets/pull/23/files#diff-d43a9cad9a8c2d3bd8e4ddc0b914dcd00de39830c6bccbc6bcc10f25a63d509fR173
Proposed
^([A-Za-z]+|\d+|([A-Za-z]\d+)):([A-Za-z]+|\d+|([A-Za-z]\d+))$
I was also trying this one, but unfortunatelly it's not fully true cause e.g. A:5 match but it's not valid, and e.g. AA1:AF20 is valid but not match.
I understand, I just suppose that stopping supporting python3.7 is not such a big deal considering it's almost a year after the end of life.
Dropping support for 3.7 is definitely on the to-do list - I just don't think this PR is the place to do it. I need to have a discussion with the team about how best to do that, especially here because this is the default Google Sheets variant for Meltano that I am aware quite a few people use, and I don't want to break their pipelines without a migration plan in place first. :sweat_smile:
But if you persist on supporting 3.7 I will revert those libs updates and just keep the validation directly here https://github.com/Matatika/tap-google-sheets/pull/23/files#diff-d43a9cad9a8c2d3bd8e4ddc0b914dcd00de39830c6bccbc6bcc10f25a63d509fR173
This sounds good. :+1:
AA1:AF20 is valid but not match
Ah, I missed some +
s
^([A-Za-z]+|\d+|([A-Za-z]+\d+)):([A-Za-z]+|\d+|([A-Za-z]+\d+))$
A:5 match but it's not valid
Right, maybe we could perform that validation after the regex has been evaluated? To be honest, I think we will have caught most legitimate config errors with the above, so I'm not that concerned if a user hits a 400 Bad Request
for the edge-case ranges like A:5
and 5:A
(that should be enough of an indication to prompt them to check their config and realise their range
value doesn't make sense). As long as it's not some programmatic exception like IndexError
.
It's not pretty, but I think this regex satisfies all cases FWIW:
^([A-Za-z]+(?!:\d+$)|\d+(?!:[A-Za-z]+$)|[A-Za-z]+\d+):([A-Za-z]|\d+|[A-z]+\d+)$
It's not pretty, but I think this regex satisfies all cases FWIW:
^([A-Za-z]+(?!:\d+$)|\d+(?!:[A-Za-z]+$)|[A-Za-z]+\d+):([A-Za-z]|\d+|[A-z]+\d+)$
Yes true, it satisfies almost all we need, I would rather limit the amount of [A-Za-z] to max 3, and number should be max 7 digits. But actually I would prefer to split it to multiple regex as I have it, cause the main purpose of it is that thanks to empty groups () in regexp it helps to parse an input to proper parts (columns and rows).
I reverted the update and kept just validation in get_first_line_range
LGTM! Thanks @lancondrej
Since I don't have edit access, please can you add
to
tap_google_sheets/tap.py
here, andto
meltano.yml
here.