Wonong / ab-metadata-pusher

Amundsen databuilder-based containerized metadata pusher
Apache License 2.0

Interesting project! #8

Open · feng-tao opened this issue 3 years ago

feng-tao commented 3 years ago

hey,

Just ran into your project, great work! A few questions:

  1. How do you read from the message bus (e.g. Kinesis, Kafka) and persist that info into the Amundsen graph? Sorry if this is documented somewhere.
  2. Any interest in upstreaming the Kinesis publisher back to Amundsen? :) Thanks

Wonong commented 3 years ago

Thank you for your interest!

  1. I made a subscriber container (private) that periodically pulls messages from the message queue, scheduled with crontab or a CronJob. The subscriber container's job is almost the same as the other sample jobs in amundsendatabuilder. The only difference is the extractor, which extracts data from the message queue rather than from a DBMS. So it extracts metadata from the message queue and loads it into Neo4j and Elasticsearch.

  2. The publisher I implemented in this repo is an AWS SQS publisher, not a Kinesis one. If you want it, I will gladly upstream it back to Amundsen :)
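Item 1 describes a subscriber that periodically drains a message queue and loads the metadata, where only the extractor differs from the sample jobs. A minimal sketch of that extractor pattern follows, assuming a stdlib `queue.Queue` of JSON strings in place of AWS SQS and an in-memory loader in place of Neo4j/Elasticsearch. `MessageQueueExtractor`, `InMemoryLoader`, `run_once`, and the `"queue"` config key are hypothetical names. The `init`/`extract` shape, where `extract()` returns one record per call and `None` once exhausted, is modeled loosely on databuilder's extractor convention, not copied from it or from the author's private code.

```python
import json
import queue
from typing import Any, Dict, List, Optional


class MessageQueueExtractor:
    """Hypothetical extractor that pulls metadata messages off a queue.

    A stdlib queue.Queue stands in for AWS SQS (assumption); a real version
    would poll SQS instead.
    """

    def init(self, conf: Dict[str, Any]) -> None:
        # "queue" is a hypothetical config key holding the message source.
        self._queue: "queue.Queue[str]" = conf["queue"]

    def extract(self) -> Optional[Dict[str, Any]]:
        # Return one decoded record per call, or None once the queue is empty.
        try:
            body = self._queue.get_nowait()
        except queue.Empty:
            return None
        return json.loads(body)


class InMemoryLoader:
    """Hypothetical loader standing in for the Neo4j/Elasticsearch side."""

    def __init__(self) -> None:
        self.loaded: List[Dict[str, Any]] = []

    def load(self, record: Dict[str, Any]) -> None:
        self.loaded.append(record)


def run_once(extractor: MessageQueueExtractor, loader: InMemoryLoader) -> int:
    # One scheduled run (e.g. triggered by cron): drain whatever is queued now.
    count = 0
    record = extractor.extract()
    while record is not None:
        loader.load(record)
        count += 1
        record = extractor.extract()
    return count


if __name__ == "__main__":
    q: "queue.Queue[str]" = queue.Queue()
    q.put(json.dumps({"database": "hive", "schema": "core", "table": "users"}))
    q.put(json.dumps({"database": "hive", "schema": "core", "table": "orders"}))

    extractor = MessageQueueExtractor()
    extractor.init({"queue": q})
    loader = InMemoryLoader()
    print(run_once(extractor, loader))  # prints 2
```

Because each cron-triggered run only drains what is currently queued and then exits, the container does not need to stay up between runs.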

feng-tao commented 3 years ago

Thanks, I think both the AWS SQS publisher and a message queue extractor would be great additions to amundsendatabuilder upstream, as they add a new push pattern to the current databuilder ETL, like what you describe in the README. Let me know if there is anything I can help with, or ping me for review when you have it. Thanks.

Wonong commented 3 years ago

Thank you :) When the extractor and publisher are ready for a PR, I will create one. It should not take long. I will also ask for your help if needed ;)

feng-tao commented 3 years ago

thanks

feng-tao commented 3 years ago

Also, could you add your company to the Amundsen user list (https://github.com/amundsen-io/amundsen#who-uses-amundsen) if possible?

Wonong commented 3 years ago

Sorry, but I am not currently using Amundsen... This repo is just a personal project driven by my interest in data discovery.

feng-tao commented 3 years ago

No worries, thanks! Looking forward to your PRs.

Wonong commented 3 years ago

@feng-tao Checking my subscriber implementation again, I found that messages from the message queue were being fetched by the publisher, not the extractor. Is it still okay to upstream?

feng-tao commented 3 years ago

@Wonong could you open two separate PRs, one for https://github.com/Wonong/ab-metadata-pusher/blob/main/publisher/aws_sqs_csv_puiblisher.py and one for the subscriber? I would like to take a look first.

Wonong commented 3 years ago

I mean that my answer to question 1 above was wrong. Okay, I will make two separate PRs and mention you :)

feng-tao commented 3 years ago

sg

Wonong commented 3 years ago

Sorry to say this, but I think I need more time...

I just realized that I had been misunderstanding the Job class of amundsendatabuilder. When I ran the subscriber container to consume messages from the message queue (AWS SQS), I used a databuilder modified to run a job without a task. So, until now I had only developed the publisher and run the job without a task.

But, as you know, an amundsendatabuilder job must have a task (I only just realized this), so I need to update my subscriber code. I am also busy with other work this week and next, so I think I can finish this in February. 😢
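The constraint described here, that a databuilder job wraps a required task (extract, then load) and only runs the publisher afterwards, can be sketched in plain Python. This is a minimal sketch under stated assumptions: `Task`, `Job`, and `make_extract` are hypothetical stand-ins, not databuilder's actual `DefaultTask`/`DefaultJob` classes, and the in-memory lists replace SQS, the staging files, and Neo4j. It shows one plausible restructuring in which reading the queue moves into the task's extract step instead of living in the publisher.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Record = Dict[str, Any]


class Task:
    """Hypothetical task: pulls records from an extract step into a load step."""

    def __init__(self, extract: Callable[[], Optional[Record]],
                 load: Callable[[Record], None]) -> None:
        self._extract = extract
        self._load = load

    def run(self) -> None:
        # Keep extracting until the source reports it is exhausted (None).
        record = self._extract()
        while record is not None:
            self._load(record)
            record = self._extract()


class Job:
    """Hypothetical job: the task is mandatory; the publisher runs after it."""

    def __init__(self, task: Optional[Task],
                 publish: Optional[Callable[[], None]] = None) -> None:
        if task is None:
            raise ValueError("a job needs a task")
        self._task = task
        self._publish = publish

    def launch(self) -> None:
        self._task.run()
        if self._publish is not None:
            self._publish()


def make_extract(messages: List[Record]) -> Callable[[], Optional[Record]]:
    # Hypothetical stand-in for reading SQS: hand out messages one by one.
    it: Iterator[Record] = iter(messages)
    return lambda: next(it, None)


if __name__ == "__main__":
    staged: List[Record] = []  # stands in for the loader's staging output
    graph: List[Record] = []   # stands in for the Neo4j graph

    job = Job(task=Task(make_extract([{"table": "users"}, {"table": "orders"}]),
                        staged.append),
              publish=lambda: graph.extend(staged))
    job.launch()
    print(len(graph))  # prints 2
```

Keeping the queue read in the task means the publisher only has to push already-staged records, which matches the extract/load/publish split of the sample jobs.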