anelendata / tap-exchangeratehost

singer.io tap to extract currency exchange rate
Apache License 2.0

Allow running tap multiple times the same day #2

Closed (jaceksan closed this 1 year ago)

jaceksan commented 1 year ago

The start_day record can be missing from the result set if everything has already been extracted. This change prevents errors like KeyError: '2023-03-28'.
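For readers following the thread, here is a minimal sketch of the kind of guard described above. It assumes the API response is a dict with a `rates` mapping keyed by date string; the function name `rates_for_day` and that response shape are hypothetical and not taken from the tap's code.

```python
from typing import Optional


def rates_for_day(response: dict, day: str) -> Optional[dict]:
    """Return the rates recorded for `day`, or None when that day is absent."""
    # dict.get avoids the KeyError that response["rates"][day] would raise
    # when the requested day is not in the result set.
    return response.get("rates", {}).get(day)


# Hypothetical response in which the requested start day is missing.
response = {"rates": {"2023-03-27": {"EUR": 0.93}}}

present = rates_for_day(response, "2023-03-27")
missing = rates_for_day(response, "2023-03-28")
print(present, missing)
```

A caller can then skip the sync for that day when the result is None instead of crashing.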

daigotanaka commented 1 year ago

Hi @jaceksan

Thanks for improving tap-exchangeratehost. I just pulled your code from your forked repo to run

tap-exchangeratehost -c config.sample.json 

But I got:

Traceback (most recent call last):
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/venv/bin/tap-exchangeratehost", line 11, in <module>
    load_entry_point('tap-exchangeratehost', 'console_scripts', 'tap-exchangeratehost')()
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/tap_exchangeratehost/__init__.py", line 151, in main
    while datetime.datetime.strptime(next_date, DATE_FORMAT) < datetime.datetime.utcnow():
TypeError: strptime() argument 1 must be str, not None

Can you check and fix this? The original code ran without this error.

Thanks!
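The traceback above shows `strptime` receiving None for `next_date`. One possible guard is sketched below; it is not necessarily the fix that was applied. The helper name `has_more_days` and the value of `DATE_FORMAT` are assumptions, since the thread does not show them.

```python
import datetime

# Assumed value; the tap's actual DATE_FORMAT is not shown in the thread.
DATE_FORMAT = "%Y-%m-%d"


def has_more_days(next_date, now):
    """Return True while `next_date` is a date string earlier than `now`."""
    # Stop the loop cleanly when there is no next date, instead of passing
    # None to strptime (which raises TypeError).
    if next_date is None:
        return False
    return datetime.datetime.strptime(next_date, DATE_FORMAT) < now


cutoff = datetime.datetime(2023, 3, 29)
print(has_more_days("2023-03-28", cutoff), has_more_days(None, cutoff))
```

Used as the `while` condition, this ends the loop when nothing is left to extract.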

jaceksan commented 1 year ago

Fixed. However, when I tested it with start_date=2010-01-01, loading the data into Snowflake failed:

snowflake.connector.errors.ProgrammingError: 100080 (22000): Number of columns in file (170) does not match that of the corresponding table (172), use file format option error_on_column_count_mismatch=false to ignore this error

The issue is that the number of columns changes over time, because some currencies did not exist in the past. There are two solutions:

jaceksan commented 1 year ago

OK, the current design works if I update the definition of the Snowflake file format with error_on_column_count_mismatch set to false:

CREATE FILE FORMAT cicd_dev.PUBLIC.meltano_format TYPE = 'CSV' ESCAPE='\\' FIELD_OPTIONALLY_ENCLOSED_BY='"' error_on_column_count_mismatch=false;

Feel free to merge this.

jaceksan commented 1 year ago

Eh, I just realized that the current design may still not be optimal. When a batch (one execution of do_sync) contains rows with different numbers of columns, the schema is currently populated from the first row, which may not contain all the columns present in the last row. Let me update it so that the schema is generated from the last row.

jaceksan commented 1 year ago

Fixed. The schema is now generated from the last row (ordered by date).
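As an illustration only, deriving the schema from the most recent row could look roughly like the sketch below. The input shape (a dict of date string to currency rates), the function name `schema_from_latest`, and the JSON-schema layout are assumptions rather than the tap's actual code. It also relies on the thread's premise that the latest row carries every currency seen in the batch.

```python
def schema_from_latest(rates_by_date: dict) -> dict:
    """Build a JSON-schema-like dict from the most recent day's currencies."""
    # ISO-8601 date strings (YYYY-MM-DD) sort chronologically as plain text,
    # so max() picks the last row ordered by date.
    latest = max(rates_by_date)
    properties = {"date": {"type": "string", "format": "date"}}
    for currency in sorted(rates_by_date[latest]):
        # Nullable, because older rows may lack newer currencies.
        properties[currency] = {"type": ["null", "number"]}
    return {"type": "object", "properties": properties}


# Hypothetical batch where a currency appears only in the later row.
batch = {
    "2010-01-01": {"EUR": 0.70},
    "2023-03-28": {"EUR": 0.93, "BTC": 0.00004},
}
schema = schema_from_latest(batch)
print(sorted(schema["properties"]))
```

If a currency could also disappear over time, taking the union of keys across all rows would be the safer variant.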

daigotanaka commented 1 year ago

Could you consider supporting Python 3.8, if not 3.7? Right now it fails on versions before 3.9 like this:

Traceback (most recent call last):
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/venv/bin/tap-exchangeratehost", line 11, in <module>
    load_entry_point('tap-exchangeratehost', 'console_scripts', 'tap-exchangeratehost')()
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 480, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2693, in load_entry_point
    return ep.load()
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2324, in load
    return self.resolve()
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2330, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/ubuntu/project/tmp_tap_exch/tap-exchangeratehost/tap_exchangeratehost/__init__.py", line 44, in <module>
    def make_schema(response: dict, dates: list[str]) -> Dict:
TypeError: 'type' object is not subscriptable

Inserting this may be enough:

# Python 3.7/3.8 support
from __future__ import annotations

as per https://stackoverflow.com/questions/59101121/type-hint-for-a-dict-gives-typeerror-type-object-is-not-subscriptable
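A small self-contained sketch of why this import helps. With postponed evaluation of annotations (PEP 563), `list[str]` in a signature is stored as a string and never evaluated at definition time, so the module also imports on Python 3.7 and 3.8. The function body here is a stub for illustration, not the tap's real `make_schema`.

```python
from __future__ import annotations  # must be the first statement in the module

from typing import Dict


def make_schema(response: dict, dates: list[str]) -> Dict:
    # Stub body for illustration; only the signature matters here.
    return {"dates": dates, "keys": sorted(response)}


# The annotation is kept as an unevaluated string, so list[str] is never
# subscripted when the function is defined.
dates_annotation = make_schema.__annotations__["dates"]
result = make_schema({"EUR": 0.93}, ["2023-03-28"])
print(dates_annotation, result)
```

This only covers annotations. An expression such as `list[str]` used at runtime outside an annotation would still fail before Python 3.9 and would need `typing.List` instead.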

jaceksan commented 1 year ago

Fixed

daigotanaka commented 1 year ago

@jaceksan Thx!