davidemalagoli closed this issue 8 years ago
No, I think currently we only use IdomaarSource in the Flume configs, either with an HTTPStream or FileStream reader. Also, S3 as source in Flume seems to be bleeding edge: https://issues.apache.org/jira/browse/FLUME-2437
@andras-sereny I've created a very simple implementation (using the AWS SDK and the HTTP protocol) to integrate the S3 source (see commit). Could you please give me feedback on:
thanks
Hi @davidemalagoli, I've introduced an option in newsreel-test.sh to get data from S3: if the env variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set, the test script will try to fetch the example file from S3.
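To make the condition concrete, here is a minimal sketch of the check in Python. It is not the code in newsreel-test.sh; the function name `use_s3` is made up for illustration, and I'm assuming an empty variable counts as "not set".

```python
import os

# The two variables named in this thread.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def use_s3(env=None):
    """Return True only if both AWS variables are present and non-empty.

    `use_s3` is a hypothetical helper; treating an empty value as unset
    is an assumption, not something the test script is known to do.
    """
    env = os.environ if env is None else env
    return all(env.get(name) for name in REQUIRED_VARS)


if __name__ == "__main__":
    print("S3" if use_s3() else "local file")
```

Requiring both variables avoids a half-configured run that reaches S3 and then fails on authentication.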
The idomaar.sh start script passes the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the original environment to the orchestrator environment, which then passes them on to any process it starts.
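The pass-through can be illustrated with a small Python sketch. This is only an analogy for what the shell script does, not its actual code: `forward_aws_credentials` and `run_child` are hypothetical names, and the child here is just a Python one-liner that echoes the variable it received.

```python
import os
import subprocess
import sys

AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def forward_aws_credentials(source_env, target_env):
    """Return a copy of target_env with the AWS variables taken from source_env.

    Hypothetical helper: variables missing from source_env are simply skipped.
    """
    merged = dict(target_env)
    for name in AWS_VARS:
        if name in source_env:
            merged[name] = source_env[name]
    return merged


def run_child(env):
    """Start a child process with the given environment and return what it sees."""
    code = "import os; print(os.environ.get('AWS_ACCESS_KEY_ID', ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Dummy values only; never hard-code real credentials.
    original = {"AWS_ACCESS_KEY_ID": "dummy-id", "AWS_SECRET_ACCESS_KEY": "dummy-secret"}
    orchestrator_env = forward_aws_credentials(original, os.environ)
    print(run_child(orchestrator_env))
```

Any process started with that environment inherits the variables in turn, which is why nothing further needs to be configured downstream.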
I'll do some work on this to make sure idomaar-demo.sh works with S3.
Perfect, let me know when you finish so I can update the wiki and documentation.
Thanks!
Hi @andras-sereny, did you have time to complete the tests?
Sorry, I was sick and out of the office for quite a while. I'll finish it this week.
Hi @davidemalagoli, as of 88f324def3c04296ae856415654f0a8282721c72 idomaar-demo.sh can read data from S3 (if the env variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set), but on the aws branch it fails at the Spark evaluation part:
INFO [datastream] File "/vagrant/evaluator/eval.py", line 34, in evalRecall
INFO [datastream] GTList = set([k['object']['id'] for k in x['GT']['expected']['evidences']])
ERROR [datastream] TypeError: 'int' object has no attribute '__getitem__'
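The error means that something in the chain x['GT']['expected']['evidences'] (or one of the evidence entries) is an int where a dict was expected. A possible way to narrow it down is a small checker like the sketch below. The record shape is inferred only from the traceback line, and `ground_truth_ids` is a hypothetical helper, not part of eval.py.

```python
def ground_truth_ids(record):
    """Collect evidence object ids, naming the first level with the wrong type.

    Assumed shape, taken from the traceback only:
    record['GT']['expected']['evidences'] is a list of dicts,
    each holding ['object']['id'].
    """
    node = record
    for key in ("GT", "expected", "evidences"):
        if not isinstance(node, dict):
            raise TypeError(
                "expected a dict before key %r, got %s" % (key, type(node).__name__)
            )
        node = node[key]
    if not isinstance(node, (list, tuple)):
        raise TypeError("'evidences' is %s, not a list" % type(node).__name__)
    ids = set()
    for item in node:
        if not isinstance(item, dict):
            raise TypeError("evidence entry is %s, not a dict" % type(item).__name__)
        ids.add(item["object"]["id"])
    return ids


if __name__ == "__main__":
    good = {"GT": {"expected": {"evidences": [{"object": {"id": 7}}]}}}
    print(ground_truth_ids(good))
    try:
        ground_truth_ids({"GT": 3})
    except TypeError as exc:
        print("bad record:", exc)
```

Running something like this over a few records from the S3 input should show whether the data differs in shape from the local example file, or whether the records are being parsed differently on the aws branch.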
I think that this is already implemented, right?