Open feng-tao opened 3 years ago
Thank you for your attention
I built a (private) subscriber container that periodically pulls messages from the message queue, scheduled with crontab or a CronJob. Its job is almost the same as the other sample jobs in amundsendatabuilder. The only difference is the extractor, which extracts data from a message queue instead of a DBMS. So it extracts metadata from the message queue and loads it into neo4j and elasticsearch.
The publisher I implemented in this repo is an AWS SQS publisher, not Kinesis. If you want it, I will gladly upstream it to amundsen :)
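For readers following along, a queue-backed extractor of the kind described above might look roughly like the sketch below. It is not the code from this repo. `SqsMessageExtractor`, `queue_url`, `batch_size`, and the JSON message body are assumptions made for illustration. The class only mirrors the `extract()` / `get_scope()` shape of a databuilder extractor and does not subclass it. The `client` is assumed to behave like a boto3 SQS client, exposing `receive_message` and `delete_message`.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Optional


class SqsMessageExtractor:
    """Hypothetical extractor that returns one record per queue message."""

    def __init__(self, client: Any, queue_url: str, batch_size: int = 10) -> None:
        # `client` is assumed to be SQS-like (e.g. boto3.client("sqs")).
        self._client = client
        self._queue_url = queue_url
        self._batch_size = batch_size
        self._buffer: Deque[Dict[str, Any]] = deque()

    def _fill_buffer(self) -> None:
        # Fetch up to `batch_size` messages; the "Messages" key is absent
        # from the response when the queue has nothing to return.
        response = self._client.receive_message(
            QueueUrl=self._queue_url,
            MaxNumberOfMessages=self._batch_size,
        )
        self._buffer.extend(response.get("Messages", []))

    def extract(self) -> Optional[Dict[str, Any]]:
        # Returning None signals that there is nothing left to extract.
        if not self._buffer:
            self._fill_buffer()
        if not self._buffer:
            return None
        message = self._buffer.popleft()
        record = json.loads(message["Body"])  # assumes JSON-encoded bodies
        # Acknowledge the message so it is not delivered again.
        self._client.delete_message(
            QueueUrl=self._queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        return record

    def get_scope(self) -> str:
        return "extractor.sqs_message"
```

One design caveat: this sketch deletes each message as soon as it is parsed, so a failure later in the load step would lose that record. Deferring the delete until after a successful load may be the safer choice.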
Thanks. I think both the AWS SQS publisher and the message queue extractor would be a great addition to amundsendatabuilder upstream, as they add a new push pattern to the current databuilder ETL, like what you describe in the README. Let me know if there is anything I can help with, or ping me for review once you have it. Thanks.
Thank you :) When the extractor and publisher are ready, I will create a PR. It should not take long. I will also ask for your help if needed ;)
thanks
also could you add your company to amundsen user list (https://github.com/amundsen-io/amundsen#who-uses-amundsen) if possible?
Sorry, but I am not currently using amundsen... This repo is just a personal interest of mine in data discovery.
no worry, thanks, looking forward to your prs
@feng-tao Checking my subscriber implementation again, I found that messages from the message queue were being fetched by the publisher, not extractor. Is that okay to upstream???
@Wonong do you think you could have two separate prs, one for https://github.com/Wonong/ab-metadata-pusher/blob/main/publisher/aws_sqs_csv_puiblisher.py , and one for the subscriber? I would like to take a look first.
I mean my answer to 1. above is wrong
Okay. I will make two separate PRs and mention you :)
sg
Sorry to say this, but I think I need more time...
I just realized that I had been misunderstanding the Job class of amundsendatabuilder. When I ran the subscriber container to pull messages from the message queue (AWS SQS), I used a databuilder that I had modified to run a job without a task. So until now I had only developed the publisher and run a job without a task.
But a job in amundsendatabuilder must have a task, as you know (I just realized this), so I need to update my subscriber code. I am also busy with other work this week and next, so I think I can finish these things in February. 😢
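To make the job/task relationship concrete, here is a minimal sketch under stated assumptions. `SketchTask` and `SketchJob` are hypothetical stand-ins written for this illustration, not the databuilder classes. They only show the general shape: a job runs a task (extract records, then load them) and afterwards runs a publisher.

```python
from typing import Any, Callable, Optional


class SketchTask:
    """Stand-in for a task: pull records from an extractor and hand them to a loader."""

    def __init__(
        self,
        extract: Callable[[], Optional[Any]],
        load: Callable[[Any], None],
    ) -> None:
        self._extract = extract
        self._load = load

    def run(self) -> None:
        # Keep extracting until the extractor reports there is nothing left.
        record = self._extract()
        while record is not None:
            self._load(record)
            record = self._extract()


class SketchJob:
    """Stand-in for a job: run the task first, then the publisher."""

    def __init__(self, task: SketchTask, publish: Callable[[], None]) -> None:
        self._task = task
        self._publish = publish

    def launch(self) -> None:
        self._task.run()
        self._publish()
```

In this shape, reading from the queue would belong in the extractor passed to the task, and the publisher would only act on what the loader has already staged.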
hey,
Just ran into your project, great work! A few questions: