Closed lordravo closed 3 years ago
It could be the “classic gotcha” that you also need to ingest data into Elasticsearch through the databuilder.
Here’s a sample of doing that: https://github.com/lyft/amundsendatabuilder/blob/v1.5.1/example/scripts/sample_data_loader.py#L590
The architecture diagram and the Search and Databuilder sections in https://github.com/lyft/amundsen/blob/master/docs/architecture.md give a good high-level overview.
Hi @jornh, thanks for the help.
I added a create_es_publisher_sample_job to the DAG, but even so, all requests on the Amundsen Frontend end up with status 500. Perhaps it is a configuration issue?
Even the /api/auth_user endpoint returns:
{"msg":"Encountered exception: AUTH_USER_METHOD is not configured"}
And api/metadata/v0/get_last_indexed returns:
{"msg":"Encountered exception: HTTPConnectionPool(host='amundsenmetadata', port=5000): Max retries exceeded with url: /latest_updated_ts (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3bf88ac080>: Failed to establish a new connection: [Errno -2] Name or service not known'))","timestamp":null}
I am attaching my code anyways... sample_amundsen_v2.zip
The amundsen_databuilder_table_metadata_job task works properly, adding the metadata to Neo4j.
The es_table_job task ends with success, but I'm not sure how to check its effects other than by querying on the Amundsen Frontend.
es_table_job = PythonOperator(
    task_id='es_table_job',
    python_callable=create_es_publisher_sample_job,
    provide_context=True,
    op_kwargs={
        'elasticsearch_index_alias': 'table_search_index',
        'elasticsearch_doc_type_key': 'table',
        'model_name': 'databuilder.models.table_elasticsearch_document.TableESDocument'
    }
)
Any idea?
Edit: It seems on the Elasticsearch side, everything is fine. Looking into http://host:9200/tablese8214591-bd7b-4c2f-b6d3-2eabcc8a6aa2/_search, I was able to retrieve all metadata previously sent.
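For anyone who wants to verify the Elasticsearch side without going through the frontend, a rough sketch like the one below might help. It uses only the Python standard library and the regular Elasticsearch _search endpoint. The base URL is a placeholder, and the helper names (build_search_url, summarize_hits, check_index) are made up for illustration; they are not part of Amundsen or the databuilder.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_search_url(base_url, index, size=1):
    """Build the _search URL for an index or alias (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/{quote(index)}/_search?size={size}"


def summarize_hits(response_body):
    """Return (total_hits, first_source) from a _search response body."""
    payload = json.loads(response_body)
    hits = payload.get("hits", {})
    total = hits.get("total", 0)
    # Elasticsearch 7+ reports total as {"value": N, "relation": ...};
    # older versions report a plain integer.
    if isinstance(total, dict):
        total = total.get("value", 0)
    documents = hits.get("hits", [])
    first = documents[0].get("_source") if documents else None
    return total, first


def check_index(base_url, index):
    """Fetch one document from the index or alias and summarize the result."""
    with urlopen(build_search_url(base_url, index), timeout=10) as resp:
        return summarize_hits(resp.read().decode("utf-8"))
```

Calling check_index("http://host:9200", "table_search_index") would then return the number of indexed documents and one sample document, assuming the alias from the DAG exists on that host.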
Digging into docker-ecs-amundsen.yml, there are the following environment variables on amundsenfrontend:
I wonder if it is related, since every search comes up with:
Encountered exception: HTTPConnectionPool(host='amundsensearch', port=5000): Max retries exceeded with url: /search?query_term=test&page_index=0 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f3bf8909278>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Edit2: Looking back at my launch script, I found some odd warnings from ecs-cli compose:
ecs-cli compose --cluster-config lg-amundsen --file docker-ecs-amundsen.yml up --create-log-groups --ecs-profile lg-amundsen
[WARN] Skipping unsupported YAML option for service... [option name]=container_name [service name]=amundsenfrontend
[WARN] Skipping unsupported YAML option for service... [option name]=depends_on [service name]=amundsenfrontend
[WARN] Skipping unsupported YAML option for service... [option name]=container_name [service name]=neo4j
[WARN] Skipping unsupported YAML option for service... [option name]=container_name [service name]=elasticsearch
[WARN] Skipping unsupported YAML option for service... [option name]=container_name [service name]=amundsensearch
[WARN] Skipping unsupported YAML option for service... [option name]=depends_on [service name]=amundsensearch
[WARN] Skipping unsupported YAML option for service... [option name]=container_name [service name]=amundsenmetadata
[WARN] Skipping unsupported YAML option for service... [option name]=depends_on [service name]=amundsenmetadata
It seems to have skipped YAML keys like container_name and depends_on. That seems bad.
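Since the frontend errors all report "Name or service not known" for amundsenmetadata and amundsensearch, one way to confirm that name resolution is the problem is to run a quick lookup from inside the frontend container. This is only a minimal sketch, assuming a Python interpreter is available there; resolve_hosts and unresolved are illustrative helper names, and the resolver argument exists only so the logic can be exercised without a network.

```python
import socket


def resolve_hosts(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to its resolved IP, or None if the lookup fails."""
    results = {}
    for name in hostnames:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror ("Name or service not known") subclasses OSError.
            results[name] = None
    return results


def unresolved(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that could not be resolved, in input order."""
    resolved = resolve_hosts(hostnames, resolver)
    return [name for name in hostnames if resolved[name] is None]
```

Running unresolved(["amundsensearch", "amundsenmetadata", "elasticsearch", "neo4j"]) inside the container lists the service names that do not resolve. If the names from the compose file show up there, the frontend has no way to reach those services by hostname.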
I'm pretty sure it is ECS related. I just launched Amundsen on a CentOS instance on Google Cloud with the exact same yml (minus the AWS logs), and everything is working so far.
@lordravo hey, I'm experiencing the same issue. Did you come up with any solution other than using CentOS?
Running into the same issue. Any luck getting this to work on ECS?
I just followed the aws-ecs-deployment guide, and successfully launched the frontend, elasticsearch and neo4j endpoints.
After executing a dag with a BigQueryMetadataExtractor, I was also able to send metadata to Neo4j. As you can see:
But for some reason, nothing is reachable through the Amundsen frontend. Just a blank page:
Looking into the developer console, there is a failed request: /api/search/v0/table?query=events&page_index=0
I am not sure if it is related, but the elasticsearch endpoint returns the following:
Any ideas why? Am I missing something?