cern-sis / issues-scoap3

Airflow-dags: Scoap3 #331

Closed ErnestaP closed 3 weeks ago

ErnestaP commented 1 month ago

APS

Elsevier:

IOP

OUP:

SPRINGER:

HINDAWI:

Springer: sometimes the process DAG cannot be found in the DAG bag:

[Screenshot-2024-05-28-at-15 43 26]

Sometimes the record cannot be pushed to the Django API:

scoap3-springer-process-file-create-or-update-y80pwdz6
*** Found local files:
***   * /opt/airflow/logs/dag_id=scoap3_springer_process_file/run_id=springer__2024-05-28T13:46:04.067318+0000/task_id=create_or_update/attempt=1.log
[2024-05-28, 13:50:16 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: scoap3_springer_process_file.create_or_update springer__2024-05-28T13:46:04.067318+0000 [queued]>
[2024-05-28, 13:50:16 UTC] {taskinstance.py:1979} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: scoap3_springer_process_file.create_or_update springer__2024-05-28T13:46:04.067318+0000 [queued]>
[2024-05-28, 13:50:16 UTC] {taskinstance.py:2193} INFO - Starting attempt 1 of 1
[2024-05-28, 13:50:16 UTC] {taskinstance.py:2217} INFO - Executing <Task(_PythonDecoratedOperator): create_or_update> on 2024-05-28 13:46:04.116264+00:00
[2024-05-28, 13:50:16 UTC] {standard_task_runner.py:60} INFO - Started process 22 to run task
[2024-05-28, 13:50:16 UTC] {standard_task_runner.py:87} INFO - Running: ['airflow', 'tasks', 'run', 'scoap3_springer_process_file', 'create_or_update', 'springer__2024-05-28T13:46:04.067318+0000', '--job-id', '1618', '--raw', '--subdir', 'DAGS_FOLDER/scoap3/springer/springer_process_file.py', '--cfg-path', '/tmp/tmpb6rsqv9x']
[2024-05-28, 13:50:16 UTC] {standard_task_runner.py:88} INFO - Job 1618: Subtask create_or_update
[2024-05-28, 13:50:16 UTC] {task_command.py:423} INFO - Running <TaskInstance: scoap3_springer_process_file.create_or_update springer__2024-05-28T13:46:04.067318+0000 [running]> on host scoap3-springer-process-file-create-or-update-y80pwdz6
[2024-05-28, 13:50:16 UTC] {logging_mixin.py:188} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/template_rendering.py:46 AirflowProviderDeprecationWarning: This function is deprecated. Please use `create_unique_id`.
[2024-05-28, 13:50:16 UTC] {logging_mixin.py:188} WARNING - /home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/kubernetes_helper_functions.py:145 AirflowProviderDeprecationWarning: This function is deprecated. Please use `add_unique_suffix`.
[2024-05-28, 13:50:16 UTC] {pod_generator.py:555} WARNING - Model file /opt/airflow/pod_templates/pod_template_file.yaml does not exist
[2024-05-28, 13:50:17 UTC] {taskinstance.py:2513} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='scoap3_springer_process_file' AIRFLOW_CTX_TASK_ID='create_or_update' AIRFLOW_CTX_EXECUTION_DATE='2024-05-28T13:46:04.116264+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='springer__2024-05-28T13:46:04.067318+0000'
[2024-05-28, 13:50:17 UTC] {logging_mixin.py:188} INFO - 2024-05-28 13:50:17 [info     ] Sending data to the backend    data={'dois': [{'value': '10.1140/epjc/s10052-024-12798-3'}], 'arxiv_eprints': [{'value': '2401.04587', 'categories': ['hep-ph']}], 'page_nr': [8], 'authors': [{'surname': 'Lin', 'given_names': 'Jia-Xin', 'affiliations': [{'value': 'School of Physics, Southeast University, Nanjing, 210094, China', 'organization': 'Southeast University', 'country': 'China'}], 'full_name': 'Lin, Jia-Xin'}, {'surname': 'Chen', 'given_names': 'Hua-Xing', 'email': 'hxchen@seu.edu.cn', 'affiliations': [{'value': 'School of Physics, Southeast University, Nanjing, 210094, China', 'organization': 'Southeast University', 'country': 'China'}], 'full_name': 'Chen, Hua-Xing'}, {'surname': 'Liang', 'given_names': 'Wei-Hong', 'email': 'liangwh@gxnu.edu.cn', 'affiliations': [{'value': 'Department of Physics, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}, {'value': 'Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}], 'full_name': 'Liang, Wei-Hong'}, {'surname': 'Xiao', 'given_names': 'Chu-Wen', 'email': 'xiaochw@gxnu.edu.cn', 'affiliations': [{'value': 'Department of Physics, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}, {'value': 'Guangxi Key Laboratory of Nuclear Physics and Technology, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}], 'full_name': 'Xiao, Chu-Wen'}, {'surname': 'Oset', 'given_names': 'Eulogio', 'email': 'oset@ific.uv.es', 'affiliations': [{'value': 'Department of Physics, Guangxi Normal University, Guilin, 541004, China', 'organization': 'Guangxi Normal University', 'country': 'China'}, {'value': 'Departamento de Física Teórica and 
IFIC, Centro Mixto Universidad de Valencia-CSIC Institutos de Investigación de Paterna, Aptdo. 22085, Valencia, 46071, Spain', 'organization': 'Centro Mixto Universidad de Valencia-CSIC Institutos de Investigación de Paterna', 'country': 'Spain'}], 'full_name': 'Oset, Eulogio'}], 'license': [{'url': 'https://creativecommons.org/licenses/by/4.0', 'license': 'CC-BY-4.0'}], 'collections': [{'primary': 'European Physical Journal C'}], 'files': {'pdfa': 'scoap3-dev-backend/media/harvested_files/10.1140/epjc/s10052-024-12798-3/10052_2024_Article_12798.pdf', 'xml': 'scoap3-dev-backend/media/harvested_files/10.1140/epjc/s10052-024-12798-3/10052_2024_Article_12798.xml.Meta.xml'}, 'publication_info': [{'journal_title': 'European Physical Journal C', 'journal_volume': '84', 'year': 2024, 'journal_issue': '4', 'artid': 's10052-024-12798-3', 'page_start': '1', 'page_end': '8', 'material': 'article'}], 'abstracts': [{'value': 'Starting from the molecular picture for the $$D_{s1}(2460)$$ and $$D_{s1}(2536)$$ resonances, which are dynamically generated by the interaction of coupled channels, the most important of which are the $$D^*K$$ for the $$D_{s1}(2460)$$ and $$DK^*$$ for the $$D_{s1}(2536)$$ , we evaluate the ratio of decay widths for the $$\\bar{B}_s^0 \\rightarrow D_{s1}(2460)^+ K^-$$ and $$\\bar{B}_s^0 \\rightarrow D_{s1}(2536)^+ K^-$$ decays, the latter of which has been recently investigated by the LHCb collaboration, and we obtain a ratio of the order of unity. 
The present results should provide an incentive for the related decay into the $$D_{s1}(2460)$$ resonance to be performed, which would provide valuable information on the nature of these two resonances.', 'source': 'Springer'}], 'acquisition_source': {'source': 'Springer', 'method': 'Springer', 'date': '2024-05-28T13:48:02.955902'}, 'copyright': [{'holder': 'The Author(s)', 'year': 2024}], 'imprints': [{'date': '2024-04-29', 'publisher': 'Springer'}], 'record_creation_date': '2024-05-28T13:48:02.955902', 'titles': [{'source': 'Springer'}]}
[2024-05-28, 13:50:17 UTC] {logging_mixin.py:188} INFO - 2024-05-28 13:50:17 [error    ] b'{"message":"null value in column \\"title\\" of relation \\"articles_article\\" violates not-null constraint\\nDETAIL:  Failing row contains (8283, null, null, 2024-04-29, null, null, , Starting from the molecular picture for the $$D_{s1}(2460)$$ and..., 2024-05-28 13:50:17.132762+00, 2024-05-28 13:50:17.132791+00).\\n"}'
[2024-05-28, 13:50:17 UTC] {taskinstance.py:2731} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 444, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 414, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/decorators/base.py", line 241, in execute
    return_value = super().execute(context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 200, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 217, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/repo/dags/scoap3/springer/springer_process_file.py", line 89, in create_or_update
    create_or_update_article(enriched_file)
  File "/home/airflow/.local/lib/python3.8/site-packages/backoff/_sync.py", line 105, in retry
    ret = target(*args, **kwargs)
  File "/opt/airflow/dags/repo/dags/scoap3/common/utils.py", line 278, in create_or_update_article
    response.raise_for_status()
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://backend.dev.scoap3.org/api/article-workflow-import/
[2024-05-28, 13:50:17 UTC] {taskinstance.py:1149} INFO - Marking task as FAILED. dag_id=scoap3_springer_process_file, task_id=create_or_update, execution_date=20240528T134604, start_date=20240528T135016, end_date=20240528T135017
[2024-05-28, 13:50:17 UTC] {standard_task_runner.py:107} ERROR - Failed to execute job 1618 for task create_or_update (400 Client Error: Bad Request for url: https://backend.dev.scoap3.org/api/article-workflow-import/; 22)
[2024-05-28, 13:50:17 UTC] {local_task_job_runner.py:234} INFO - Task exited with return code 1
[2024-05-28, 13:50:17 UTC] {taskinstance.py:3312} INFO - 0 downstream tasks scheduled from follow-on schedule check
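
In the log above, the payload's `titles` entry contains only `'source': 'Springer'`, and the backend rejects the row with a not-null violation on the `title` column. A minimal sketch of a pre-flight check that could flag such a record before the POST. The function name `missing_required_fields` and the assumption that the title text lives under a `title` key inside each `titles` entry are hypothetical and not taken from the repository:

```python
from typing import Any, Dict, List


def missing_required_fields(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a record (hypothetical helper)."""
    problems = []
    titles = record.get("titles") or []
    # Assumption: each entry in "titles" carries its text under a "title" key.
    if not any(str(entry.get("title") or "").strip() for entry in titles):
        problems.append("titles: no entry has a non-empty 'title'")
    return problems


# Trimmed version of the payload shown in the log above.
record = {
    "dois": [{"value": "10.1140/epjc/s10052-024-12798-3"}],
    "titles": [{"source": "Springer"}],
}

problems = missing_required_fields(record)
if problems:
    print("Not sending record:", problems)
```

Failing the task with this message before the request is made would point at the parsing step that dropped the title, rather than at a generic 400 from the backend.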

ErnestaP commented 3 weeks ago

Airflow was migrated back to a separate instance for each project.