AWS could be a common deployment target, possibly using https://aws.amazon.com/ecs/?
Install the Neo4j APOC plugin (in a folder next to your example/docker/neo4j/conf/):
mkdir example/docker/neo4j/plugins
pushd example/docker/neo4j/plugins
wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.3.0.4/apoc-3.3.0.4-all.jar
popd
mkdir example/backup
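One thing worth checking before starting the containers is that the APOC jar matches the Neo4j server version. Below is a minimal sketch, assuming APOC release numbers start with the Neo4j major.minor they were built for (so `apoc-3.3.0.4-all.jar` targets Neo4j 3.3.x). The helper names are hypothetical and are not part of Amundsen or APOC:

```python
import re


def apoc_version_from_jar(jar_name: str) -> str:
    """Extract the version from a jar name like 'apoc-3.3.0.4-all.jar'."""
    match = re.fullmatch(r"apoc-([\d.]+)-all\.jar", jar_name)
    if match is None:
        raise ValueError(f"not an APOC 'all' jar: {jar_name!r}")
    return match.group(1)


def apoc_matches_neo4j(apoc_version: str, neo4j_version: str) -> bool:
    """Return True if the APOC and Neo4j versions share major.minor.

    Assumption: APOC versions are prefixed with the Neo4j major.minor
    they target (e.g. APOC 3.3.0.4 -> Neo4j 3.3.x).
    """
    def major_minor(version: str) -> tuple:
        match = re.match(r"(\d+)\.(\d+)", version)
        if match is None:
            raise ValueError(f"unrecognised version string: {version!r}")
        return int(match.group(1)), int(match.group(2))

    return major_minor(apoc_version) == major_minor(neo4j_version)
```

For example, `apoc_matches_neo4j(apoc_version_from_jar("apoc-3.3.0.4-all.jar"), "3.5.26")` returns `False`, which flags a mismatched jar before Neo4j tries to load it.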
Add volumes for the plugins and backup folders to the Neo4j service in docker-amundsen.yml:
volumes:
  - ./example/docker/neo4j/conf:/conf
  - ./example/docker/neo4j/plugins:/plugins
  - ./example/backup:/backup
Start the containers:
docker-compose -f docker-amundsen.yml up
Ingest data via Databuilder.
In the Amundsen frontend web UI, change some descriptions. Maybe add owners…
In the Neo4j web console, export the schema and the data:
CALL apoc.export.cypher.schema('/backup/amundsen_schema.cypher')
CALL apoc.export.graphml.all('/backup/amundsen_data.graphml', {useTypes: true, readLabels: true})
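Note that APOC only writes export files when `apoc.export.file.enabled=true` is set in neo4j.conf (the import step likewise needs `apoc.import.file.enabled=true`); that setting would go in the conf folder mounted above. For a recurring backup it also helps to put a timestamp in the file names so runs don't overwrite each other. Below is a minimal sketch that builds the two export calls from above. The `_<timestamp>` naming and the `backup_statements` helper are hypothetical:

```python
from datetime import datetime, timezone
from typing import List, Optional


def backup_statements(backup_dir: str = "/backup",
                      when: Optional[datetime] = None) -> List[str]:
    """Return the APOC schema and data export calls for one backup run.

    backup_dir is the path *inside* the Neo4j container (the /backup volume).
    The timestamp suffix is a hypothetical naming convention.
    """
    if when is None:
        when = datetime.now(timezone.utc)  # default: current UTC time
    stamp = when.strftime("%Y%m%dT%H%M%S")
    schema_file = f"{backup_dir}/amundsen_schema_{stamp}.cypher"
    data_file = f"{backup_dir}/amundsen_data_{stamp}.graphml"
    return [
        f"CALL apoc.export.cypher.schema('{schema_file}')",
        # The second literal is not an f-string, so the map braces stay literal.
        f"CALL apoc.export.graphml.all('{data_file}', "
        "{useTypes: true, readLabels: true})",
    ]
```

Each string could then be sent with the official `neo4j` Python driver (e.g. `session.run(statement)`) from whatever CLI/cron job ends up owning backups.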
Delete the Neo4j graph (still in the Neo4j web console):
MATCH (n)
DETACH DELETE n
Restore the backup (yep, you guessed it, still in the Neo4j console):
CALL apoc.import.graphml('/backup/amundsen_data.graphml', {useTypes: true, readLabels: true})
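If backups are kept with a timestamp in their names (a hypothetical `amundsen_data_<YYYYmmddTHHMMSS>.graphml` convention, not something the steps above produce), a restore script first has to pick the newest file. Below is a minimal sketch; `latest_backup` and `restore_statement` are illustrative names, not existing tooling:

```python
import re
from typing import Iterable, Optional

# Hypothetical naming convention: amundsen_data_<YYYYmmddTHHMMSS>.graphml
DATA_FILE = re.compile(r"amundsen_data_(\d{8}T\d{6})\.graphml")


def latest_backup(file_names: Iterable[str]) -> Optional[str]:
    """Return the newest timestamped GraphML backup name, or None if none match."""
    best: Optional[str] = None
    best_stamp = ""
    for name in file_names:
        match = DATA_FILE.fullmatch(name)
        # Fixed-width YYYYmmddTHHMMSS stamps sort correctly as plain strings.
        if match and match.group(1) > best_stamp:
            best_stamp, best = match.group(1), name
    return best


def restore_statement(file_name: str, container_dir: str = "/backup") -> str:
    """Build the APOC import call from the restore step for one backup file."""
    return (f"CALL apoc.import.graphml('{container_dir}/{file_name}', "
            "{useTypes: true, readLabels: true})")
```

On the host this could be used as `restore_statement(latest_backup(os.listdir("example/backup")))`, after checking that `latest_backup` didn't return `None`.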
ToDo:
Figure out where a backup CLI/cron job should live: as part of the metadata service, as a shell/cron job (wrapped in Airflow), as part of Databuilder, or as an Airflow Operator
Test that adding the volumes works and doesn't break when the plugins/backup folders don't exist in the repo (or add a keep file to each folder)
Check under what circumstances the schema also needs to be restored
Related: #196 and a Slack thread with some script snippets etc.
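On the ToDo about the volume folders not existing in the repo: one option is to create them before running docker-compose, each with an empty placeholder file so git keeps the otherwise empty folder. Below is a minimal sketch; the `.gitkeep` name is just a common convention and `ensure_volume_dirs` is hypothetical:

```python
from pathlib import Path
from typing import Iterable, List

# Host-side folders bind-mounted into the Neo4j container (see the volumes above).
VOLUME_DIRS = ("example/docker/neo4j/plugins", "example/backup")


def ensure_volume_dirs(repo_root: str,
                       dirs: Iterable[str] = VOLUME_DIRS,
                       keep_file: str = ".gitkeep") -> List[Path]:
    """Create each folder if missing and put an empty keep file in it.

    Safe to run repeatedly: existing folders are kept, and an existing
    keep file only gets its modification time bumped.
    Returns the paths of the keep files.
    """
    keep_files = []
    for rel_dir in dirs:
        folder = Path(repo_root) / rel_dir
        folder.mkdir(parents=True, exist_ok=True)
        keep = folder / keep_file
        keep.touch(exist_ok=True)  # creates an empty file, or updates its mtime
        keep_files.append(keep)
    return keep_files
```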
@ttannis we lost access to the useful content of the former FE issue https://github.com/lyft/amundsenfrontendlibrary/issues/186 referenced in the snippet shown below.
Can that content be salvaged somehow? E.g., would transferring https://github.com/lyft/amundsenfrontendlibrary/issues/186 here in its closed state do it?
Basic install of services (in different environments)
Docker-compose “vanilla”, but with Gunicorn, data in volumes etc.
AWS (ECS PR): lyft/amundsenfrontendlibrary#216 (or EC2): lyft/amundsenfrontendlibrary#186
Kubernetes (convert from Compose using https://kompose.io?)
Transferred that closed issue over: https://github.com/lyft/amundsen/issues/77
Thanks for the quick turnaround on this @ttannis - seems to work nicely!
Please also extend my thanks to the other Lyft team members for the recent, even more systematic focus on grooming PRs etc. I think going forward that will really encourage more people to contribute!
@jornh I have Amundsen running on AWS EKS + k8s + Helm now; I'll put up a PR with docs next week. I'm not sure whether it will fully fulfill this story or whether I should put up another one. WDYT?
Great @javamonkey79! I think it should definitely tick the Kubernetes box above (I edited a bit above).
Just push what you think is suitable to cover Kubernetes on its own and we'll figure the rest out later; once there are some good pieces of content, it's easy to shuffle them around afterwards if needed.
Right now I'm thinking the list above should end up as just a jump list or "annotated ToC" for what the sys-admin would like/need to know. Haven't really figured out how much or little prose will be needed to glue it together... Thoughts are welcome! 😜
OK @jornh, I've got the PR up here. I've opted not to include the AWS setup at this point, as it's somewhat tied to our org. I might add it later if there's enough interest. cc @markgrover @feng-tao
Great suggestions here! I'd like to emphasize the need for more explicit documentation on how to set up Airflow to handle ingestion into ES after edits in Neo4j. From an outsider's perspective it remains quite a mystery, although Airflow (or something filling this role) is clearly a fourth microservice that the other three need in order to work.
@fBedecarrats Airflow has its own documentation. So we’ll probably just reference that.
But the gist of it is:
pip install Airflow on a box where it can run, probably in a Python virtual environment of its own for good measure.
pip install amundsendatabuilder and other required dependencies (database drivers) on top of your Airflow.
@jornh Thanks a ton for putting those instructions together; I'm currently investigating how to implement Amundsen, and backup/restore was high on the list.
Is there a good way to have Elasticsearch re-index data that was restored into Neo4j? I'm getting search errors after a Neo4j database restore, even though I see the expected data in the Neo4j console post-restore.
I found that I can re-run the amundsendatabuilder job on the same data source and my restored data appears in the FE again, but that seems like a hack job.
It’s merely a wishlist 🙂 (with links to the “state of the union”, but luckily I can tick boxes bit by bit). Glad to hear the list is useful to someone, so thanks for your comment.
To answer your question: Elasticsearch and the Amundsen search service don’t have a will of their own about what data to serve. So what you call a hack, re-ingesting/reindexing through Databuilder, is actually the way to update the ES data. For a (hopefully rare) restore scenario I think that’s okay. Hope that clarifies...
Do you have ideas for a different way?
I'd be interested in the helm chart.
@stewartbryson see https://www.amundsen.io/amundsen/k8s_install/ + the Amundsen Slack also has a #kube-helm channel for discussion
Thanks for the clarification, @jornh - if that's the best way to go about a restore scenario, then that works for us. :)
I honestly don't have any other ideas; I barely have the skillset to implement Amundsen, much less understand the inner workings 😅. Again, really appreciate the help and your documentation!
Hi, we have been trying to stand up Amundsen on Kubernetes but can't get the pod for Neo4j to deploy... Did anyone else have this problem?
I'm going to pick this up. I think this will be a nontrivial project, mostly in the form of soliciting feedback from the community. Part of the appeal of Amundsen is its flexibility: there's no one right way to install it. However, for a guide to be broadly useful, I believe it needs to have concrete steps. As a result, we'll need to make some opinionated decisions in order for the guide to be useful.
Here's how I'm planning on structuring this project:
If anyone has thoughts about this process, happy to hear.
There's some question as to which docs should be in the top repo vs service repos. My only strong feeling is that there be a single top-level doc that one can follow and find everything they need. Procedurally, it's much easier to make changes to the docs if they're all in one repo, rather than scattered between them. And given that the individual components aren't super useful when used independently, I default to just putting it into the larger repo. Open to feedback.
@dorianj that sounds like an awesome plan! I'll refrain from giving more feedback until you have passed step 1. 😉
hey -- we've packaged some of the learnings from this thread and other places into a recommended pathway https://medium.com/stemma/amundsen-deployment-best-practices-740a1800518e -- would love anyone who's worked through this stuff to try it out and give feedback, we'd like to eventually get this upstreamed into main repo once it's better battle tested
Hi, we finally decided to start working with Apache Atlas. I guess we'll later consider adopting Amundsen as an alternative front-end.
Does anyone use Ansible roles for deploying and managing Amundsen? I could share mine if that is of any interest (on-premise Compose installation).
@dorianj, could you guide me on installing Amundsen without Docker? Docker being paid for commercial use, or requiring an enterprise license, would take away the benefits of open-source usage for enterprises.
Any suggestions? I'd appreciate support here.
A year has passed, and it's still not clear how to set up auth.
Please add points on what you expect from such a guide in a comment below. I will then try to consolidate input and draft up an outline in this comment.
The guide can end up as /docs/deployment.md, or is /docs/owners_manual.md better?
Initial outline:
[ ] Basic install of services (in different environments)
Docker-compose “vanilla”, but with Gunicorn, data in volumes etc.
Kubernetes (convert from Compose using https://kompose.io?) (upcoming PR, see https://github.com/lyft/amundsen/issues/53#issuecomment-538575978 below)
[ ] Setting up ingest (with or without Airflow, see https://github.com/lyft/amundsen/issues/53#issuecomment-617370073)
[ ] Configuration - custom build of frontend (to not have to maintain a fork we need to get https://github.com/lyft/amundsen/issues/408 transmogrified into proper documentation/tooling)
[ ] Security
[x] Backup - initial WiP in https://github.com/lyft/amundsen/issues/53#issuecomment-516159598 below ... current result in https://github.com/lyft/amundsen/issues/381#issuecomment-614534794 - and restore (on K8s) implemented in https://github.com/lyft/amundsen/pull/394
[ ] Monitoring (statsd etc.?)
[ ] Handling upgrades
[ ] ....