CodeForPhilly / cfp-data-pipeline

github tap failed #48

Open machow opened 3 years ago

machow commented 3 years ago

@dherbst, I'm not seeing a lot in the log. Any thoughts on what might be causing it? Does the bot user need a specific kind of access to each of these repos (besides read)?

Log below

```
*** Reading local file: /usr/local/airflow/logs/meltano/github-to-postgres/2021-02-15T00:00:00+00:00/2.log
[2021-02-16 22:04:00,790] {taskinstance.py:826} INFO - Dependencies all met for
[2021-02-16 22:04:00,819] {taskinstance.py:826} INFO - Dependencies all met for
[2021-02-16 22:04:00,819] {taskinstance.py:1017} INFO - --------------------------------------------------------------------------------
[2021-02-16 22:04:00,819] {taskinstance.py:1018} INFO - Starting attempt 2 of 2
[2021-02-16 22:04:00,820] {taskinstance.py:1019} INFO - --------------------------------------------------------------------------------
[2021-02-16 22:04:00,852] {taskinstance.py:1038} INFO - Executing on 2021-02-15T00:00:00+00:00
[2021-02-16 22:04:00,857] {standard_task_runner.py:51} INFO - Started process 3811 to run task
[2021-02-16 22:04:00,873] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'meltano', 'github-to-postgres', '2021-02-15T00:00:00+00:00', '--job-id', '12', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/meltano.py', '--cfg-path', '/tmp/tmp35wd3sgw']
[2021-02-16 22:04:00,878] {standard_task_runner.py:76} INFO - Job 12: Subtask github-to-postgres
[2021-02-16 22:04:00,979] {logging_mixin.py:103} INFO - Running on host 6e6178ea0d91
[2021-02-16 22:04:01,066] {taskinstance.py:1230} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_EMAIL=mchow@codeforphilly.org
AIRFLOW_CTX_DAG_OWNER=Michael Chow
AIRFLOW_CTX_DAG_ID=meltano
AIRFLOW_CTX_TASK_ID=github-to-postgres
AIRFLOW_CTX_EXECUTION_DATE=2021-02-15T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2021-02-15T00:00:00+00:00
[2021-02-16 22:04:01,067] {bash.py:135} INFO - Tmp dir root location: /tmp
[2021-02-16 22:04:01,068] {bash.py:158} INFO - Running command: /usr/local/venv/meltano/bin/meltano elt tap-github target-postgres-github --job_id=github-to-postgres
[2021-02-16 22:04:01,082] {bash.py:169} INFO - Output:
[2021-02-16 22:04:05,023] {bash.py:173} INFO - meltano | Running extract & load...
[2021-02-16 22:04:05,365] {bash.py:173} INFO - meltano | No state was found, complete import.
[2021-02-16 22:04:06,285] {bash.py:173} INFO - tap-github | INFO Starting sync of repository: CodeforPhilly/After_School_Wiki
[2021-02-16 22:04:06,382] {bash.py:173} INFO - tap-github | INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.08558869361877441, "tags": {"endpoint": "milestones", "status": "failed"}}
[2021-02-16 22:04:06,382] {bash.py:173} INFO - tap-github | INFO METRIC: {"type": "counter", "metric": "record_count", "value": 0, "tags": {"endpoint": "issue_milestones"}}
[2021-02-16 22:04:06,383] {bash.py:173} INFO - tap-github | CRITICAL {"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#list-milestones"}
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | Traceback (most recent call last):
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/bin/tap-github", line 8, in
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | sys.exit(main())
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/singer/utils.py", line 229, in wrapped
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | return fnc(*args, **kwargs)
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 1388, in main
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | do_sync(args.config, args.state, catalog)
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 1357, in do_sync
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | state = sync_func(stream_schema, repo, state, mdata)
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 492, in get_all_issue_milestones
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | for response in authed_get_all_pages(
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 148, in authed_get_all_pages
[2021-02-16 22:04:06,385] {bash.py:173} INFO - tap-github | r = authed_get(source, url, headers)
[2021-02-16 22:04:06,386] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/backoff/_sync.py", line 94, in retry
[2021-02-16 22:04:06,386] {bash.py:173} INFO - tap-github | ret = target(*args, **kwargs)
[2021-02-16 22:04:06,386] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 140, in authed_get
[2021-02-16 22:04:06,386] {bash.py:173} INFO - tap-github | raise NotFoundException(resp.text)
[2021-02-16 22:04:06,386] {bash.py:173} INFO - tap-github | tap_github.NotFoundException: {"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#list-milestones"}
[2021-02-16 22:04:06,397] {bash.py:173} INFO - target-postgres-github | INFO PostgresTarget created with established connection: `user=postgres password=xxx dbname=datawarehouse host=postgres port=5432 sslmode=prefer sslcert=~/.postgresql/postgresql.crt sslkey=~/.postgresql/postgresql.key sslrootcert=~/.postgresql/root.crt sslcrl=~/.postgresql/root.crl`, PostgreSQL schema: `tap_github`
[2021-02-16 22:04:06,401] {bash.py:173} INFO - target-postgres-github | INFO Sending version information to singer.io. To disable sending anonymous usage data, set the config parameter "disable_collection" to true
[2021-02-16 22:04:06,556] {bash.py:173} INFO - meltano | Extraction failed (1): tap_github.NotFoundException: {"message":"Not Found","documentation_url":"https://docs.github.com/rest/reference/issues#list-milestones"}
[2021-02-16 22:04:06,556] {bash.py:173} INFO - meltano | ELT could not be completed: Extractor failed
[2021-02-16 22:04:06,589] {bash.py:173} INFO - ELT could not be completed: Extractor failed
[2021-02-16 22:04:06,960] {bash.py:177} INFO - Command exited with return code 1
[2021-02-16 22:04:06,978] {taskinstance.py:1396} ERROR - Bash command failed. The command returned a non-zero exit code.
Traceback (most recent call last):
  File "/opt/venv/reticulate/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/opt/venv/reticulate/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/opt/venv/reticulate/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
    result = task_copy.execute(context=context)
  File "/opt/venv/reticulate/lib/python3.8/site-packages/airflow/operators/bash.py", line 180, in execute
    raise AirflowException('Bash command failed. The command returned a non-zero exit code.')
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code.
[2021-02-16 22:04:06,981] {taskinstance.py:1433} INFO - Marking task as FAILED. dag_id=meltano, task_id=github-to-postgres, execution_date=20210215T000000, start_date=20210216T220400, end_date=20210216T220406
[2021-02-16 22:04:07,033] {local_task_job.py:118} INFO - Task exited with return code 1
```

dherbst commented 3 years ago

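In this case, it looks like https://github.com/CodeforPhilly/After_School_Wiki has either been deleted or is a private repo; the API returns the same "Not Found" response in both cases, so the log can't tell us which. We should probably make a first pass to verify that the GitHub repos exist before running the tap.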
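A rough sketch of what that first pass could look like, assuming the repos are available as `owner/repo` strings. The function names `repo_is_visible` and `split_visible` are hypothetical (they are not part of tap-github or Meltano); the only real API used is GitHub's `GET /repos/{owner}/{repo}` endpoint, called through the standard library.

```python
import urllib.error
import urllib.request


def repo_is_visible(owner, repo, token=None, opener=urllib.request.urlopen):
    """Return True if GET /repos/{owner}/{repo} succeeds, False on a 404.

    A 404 means the repo is gone, renamed without a redirect, or private
    and not readable with this token. Other HTTP errors are re-raised.
    """
    request = urllib.request.Request(f"https://api.github.com/repos/{owner}/{repo}")
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"token {token}")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def split_visible(repos, check=repo_is_visible):
    """Split 'owner/repo' names into (visible, missing) lists."""
    visible, missing = [], []
    for full_name in repos:
        owner, repo = full_name.split("/", 1)
        (visible if check(owner, repo) else missing).append(full_name)
    return visible, missing
```

Only the `visible` list would then go into the tap config, and `missing` could be logged so someone can check whether the bot user just lacks access.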

dherbst commented 3 years ago

@machow, this PR fixes the list of repos; we only want the GitHub ones: https://github.com/CodeForPhilly/cfp-data-pipeline/pull/49
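For illustration only, here is one way to keep just the GitHub entries, assuming the project list holds repository URLs (that input format is an assumption, and this is not the code from the PR). The helper name `github_full_name` is hypothetical.

```python
from urllib.parse import urlparse


def github_full_name(url):
    """Return 'owner/repo' for a github.com repository URL, else None."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    # Drop empty segments so trailing slashes don't matter.
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        return None  # an org or user page, not a repository
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"
```

Entries that come back as `None` (other hosts, bare org pages) would simply be skipped when building the tap's repository list.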

machow commented 3 years ago

A couple of updates from trying to run on staging.

Portion of the log:

```
[2021-02-22 19:51:31,654] {bash.py:173} INFO - tap-github | CRITICAL {
[2021-02-22 19:51:31,654] {bash.py:173} INFO - tap-github | "documentation_url": "https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#abuse-rate-limits",
[2021-02-22 19:51:31,654] {bash.py:173} INFO - tap-github | "message": "You have triggered an abuse detection mechanism. Please wait a few minutes before you try again."
[2021-02-22 19:51:31,654] {bash.py:173} INFO - tap-github | }
[2021-02-22 19:51:31,654] {bash.py:173} INFO - tap-github |
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | Traceback (most recent call last):
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/bin/tap-github", line 8, in
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | sys.exit(main())
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/singer/utils.py", line 229, in wrapped
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | return fnc(*args, **kwargs)
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 1388, in main
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | do_sync(args.config, args.state, catalog)
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 1357, in do_sync
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | state = sync_func(stream_schema, repo, state, mdata)
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 1113, in get_all_commits
[2021-02-22 19:51:31,655] {bash.py:173} INFO - tap-github | for response in authed_get_all_pages(
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 148, in authed_get_all_pages
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | r = authed_get(source, url, headers)
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/backoff/_sync.py", line 94, in retry
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | ret = target(*args, **kwargs)
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | File "/usr/local/meltano/cfp-pipeline/.meltano/extractors/tap-github/venv/lib/python3.8/site-packages/tap_github/__init__.py", line 138, in authed_get
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | raise AuthException(resp.text)
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | tap_github.AuthException: {
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | "documentation_url": "https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#abuse-rate-limits",
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | "message": "You have triggered an abuse detection mechanism. Please wait a few minutes before you try again."
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github | }
[2021-02-22 19:51:31,656] {bash.py:173} INFO - tap-github |
```

machow commented 3 years ago

The GitHub tap seems to work on re-run, so as long as the API doesn't complain about rate-limit abuse, it should be fine. Will double-check and then close!
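If the abuse message does come back, one option is to wait and retry instead of failing the whole task. A minimal sketch, assuming the throttled response arrives with status 403 or 429, carries the "abuse detection" text seen in the log, and may include a `Retry-After` header in seconds. The helper `get_with_abuse_wait` and the `fetch` callable (returning `(status, headers, body)`) are hypothetical and not part of tap-github.

```python
import time


def get_with_abuse_wait(fetch, url, max_attempts=3, default_wait=60.0, sleep=time.sleep):
    """Call fetch(url) and pause between attempts while GitHub reports abuse throttling.

    Returns the last (status, headers, body) tuple, whether or not it succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch(url)
        throttled = status in (403, 429) and "abuse detection" in body
        if not throttled or attempt == max_attempts:
            return status, headers, body
        # Prefer the server's hint; fall back to a fixed pause if it is absent.
        sleep(float(headers.get("Retry-After", default_wait)))
```

Passing `sleep` in as a parameter keeps the helper easy to exercise without actually waiting; in the DAG itself, Airflow's own task retries with a delay would achieve a similar effect at a coarser level.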