anelendata / tap-rest-api

Singer.io tap for generic Rest API
Apache License 2.0

`No such file or directory` found when running discovery #23

Open jlloyd-widen opened 3 years ago

jlloyd-widen commented 3 years ago

I'm running the tap within Meltano, specifically with `meltano invoke tap-rest-api --infer_schema` or `meltano select --list --all tap-rest-api`. I have the following meltano.yml:

version: 1
send_anonymous_usage_stats: false
elt.buffer_size: 52428800
plugins:
  extractors:
  - name: tap-rest-api
    pip_url: tap-rest-api
    namespace: tap_rest_api
    executable: tap-rest-api
    capabilities:
      - catalog
      - config
      - state
      - discover
    settings:
      - name: streams
      - name: url
      - name: catalog_dir
      - name: schema_dir
      - name: schema
      - name: auth_method
    config:
      url: http://<whatever>.com
      auth_method: no_auth
      catalog_dir: ./extract
      schema_dir: ./extract
      streams: test_stream
      schema: test_schema

Here's the full text of the error. It appears to be trying to read a file that has not been created yet.

Catalog discovery failed: command ['/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api', '--config', '/Users/.../meltano/.meltano/run/tap-rest-api/tap.config.json', '--discover'] returned 1: INFO Loading Schemas
INFO Loading schema for test_stream
CRITICAL [Errno 2] No such file or directory: './extract/test_stream.json'
Traceback (most recent call last):
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/bin/tap-rest-api", line 8, in <module>
    sys.exit(main())
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/singer/utils.py", line 229, in wrapped
    return fnc(*args, **kwargs)
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/__init__.py", line 188, in main
    discover(CONFIG, STREAMS)
  File "/Users.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 64, in discover
    config["schema"])
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 54, in _discover_schemas
    stream)})
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 39, in load_discovered_schema
    schema = load_schema(schema_dir, stream.tap_stream_id)
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/tap_rest_api/schema.py", line 33, in load_schema
    schema = utils.load_json(os.path.join(schema_dir, "{}.json".format(entity)))
  File "/Users/.../meltano/.meltano/extractors/tap-rest-api/venv/lib/python3.7/site-packages/singer/utils.py", line 108, in load_json
    with open(path) as fil:
FileNotFoundError: [Errno 2] No such file or directory: './extract/test_stream.json'
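
The traceback shows the schema path being built as `os.path.join(schema_dir, "{}.json".format(entity))` and then opened directly, so discovery fails whenever that file has not been generated yet. A minimal sketch of a pre-flight check that builds the path the same way; the helper name `missing_schema_files` is hypothetical and not part of tap-rest-api, and treating `streams` as a comma-separated string is an assumption:

```python
import os


def missing_schema_files(schema_dir, streams):
    """Return the schema paths discovery would try to open but that do not exist.

    `streams` is assumed to be a comma-separated string of stream names.
    """
    missing = []
    for stream in streams.split(","):
        stream = stream.strip()
        if not stream:
            continue
        # Same path construction as seen in the traceback (schema.py, load_schema).
        path = os.path.join(schema_dir, "{}.json".format(stream))
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

Calling `missing_schema_files("./extract", "test_stream")` with the config above would list `./extract/test_stream.json` for as long as the schema has not been generated.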

One concern is that the command Meltano generates uses `--discover` instead of `--infer_schema`. So maybe this is a bug in Meltano, or it just demonstrates an incompatibility between the tap and Meltano?

daigotanaka commented 3 years ago

It's probably an incompatibility that can be overcome. I'm not a Meltano user, but can you first try removing discover from the capabilities section in meltano.yml and running infer_schema separately to generate the schema and catalog files before running the sync in Meltano? (I don't know how you can do this from Meltano; you might need to run tap-rest-api manually.)
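
A manual run outside Meltano could look roughly like the sketch below. The `--config` flag is taken from the command Meltano generated in the error above and `--infer_schema` is the flag discussed in this thread; combining them this way, the function names, and the config path used in the usage note are assumptions.

```python
import subprocess


def infer_schema_command(config_path, executable="tap-rest-api"):
    """Build the argv for a manual schema-inference run of the tap."""
    return [executable, "--config", config_path, "--infer_schema"]


def run_infer_schema(config_path):
    """Run the tap directly; requires tap-rest-api to be on the PATH."""
    # check=True raises CalledProcessError if the tap exits with a non-zero status.
    return subprocess.run(infer_schema_command(config_path), check=True)
```

For example, `run_infer_schema("tap.config.json")` would run the tap against a config file of your own. Once the schema and catalog files exist, the sync can then be attempted from Meltano.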

jlloyd-widen commented 3 years ago

I just chatted with the folks at Meltano. This is definitely a compatibility issue. Currently Meltano defaults to running discovery before running a separate command with the provided args. I'll be filing an issue with them to change this default behavior.

I tried manually generating the schema and catalog files. I first removed `- discover` from capabilities, then ran `meltano invoke tap-rest-api --infer_schema`. The first error I got was `Applying catalog rules failed: catalog file is missing.`, which I thought was odd because I thought `--infer_schema` was supposed to generate that file. Regardless, I got past that by also removing the `- catalog` capability.

I ran `meltano invoke tap-rest-api --infer_schema` again. This time it gave me this error: `CRITICAL local variable 'end_from_config' referenced before assignment`. Some feedback on this: my endpoint doesn't need any time or index information to be provided. Fortunately, my endpoint still responds even if you do provide it.
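
For context, `local variable ... referenced before assignment` is Python's `UnboundLocalError`, which usually means a variable was assigned only inside a branch that did not run. The sketch below illustrates that pattern and one way to avoid it. It is not tap-rest-api's actual code: the function names are hypothetical, and keying the branch on `end_index` is an assumption made for the illustration.

```python
def end_value_buggy(config):
    """Illustrates the failure mode: the name is bound only in one branch."""
    if "end_index" in config:
        end_from_config = config["end_index"]
    # With no "end_index" key, the name was never bound and this line raises
    # UnboundLocalError.
    return end_from_config


def end_value_fixed(config):
    """Binds a default (None) so a config without index settings stays valid."""
    end_from_config = config.get("end_index")
    return end_from_config
```

With an empty config, `end_value_buggy({})` raises `UnboundLocalError`, while `end_value_fixed({})` returns `None`.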

So I added `index_key` and `end_index` to my config with some dummy values and then ran `meltano invoke tap-rest-api --infer_schema` again. This generated both the catalog and the schema.
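
The workaround amounts to adding two placeholder keys to the tap's config. A small sketch of that step, assuming the config is handled as a plain dict; the helper name `with_dummy_index` and the placeholder values `"id"` and `0` are hypothetical, since the actual dummy values used are not stated:

```python
def with_dummy_index(config, index_key="id", end_index=0):
    """Return a copy of `config` with placeholder index settings added.

    The default values are illustrative placeholders only.
    """
    patched = dict(config)  # leave the caller's dict untouched
    patched["index_key"] = index_key
    patched["end_index"] = end_index
    return patched


# Settings taken from the meltano.yml in the original report.
base_config = {
    "url": "http://<whatever>.com",
    "auth_method": "no_auth",
    "catalog_dir": "./extract",
    "schema_dir": "./extract",
    "streams": "test_stream",
    "schema": "test_schema",
}
patched_config = with_dummy_index(base_config)
```

This only works when the endpoint tolerates the extra index parameters, as it does in this case.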