ScaleUnlimited / flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons
Apache License 2.0

Support multi-threading during tests #41

Open kkrugler opened 7 years ago

kkrugler commented 7 years ago

Currently our UrlLogger is static, so tests that use it will interfere with each other's logged results if they are run at the same time.

One option is to get the job name in any function that needs to log, and have it pass that name along with each logging request. The UrlLoggerImpl could then segment results by this key (and provide a clear(key) call).

Each test that relies on logging would need to call UrlLogger.clear(test name), and ensure that the job being run sets the job name to be the test name.
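To make the idea concrete, here's a rough sketch of what keying the log by job name could look like. This is not the existing UrlLogger/UrlLoggerImpl code: the class name `KeyedUrlLogger` and the `record`/`getLog` methods are made up for illustration, and only `clear(key)` comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch, not the actual flink-crawler UrlLogger/UrlLoggerImpl.
final class KeyedUrlLogger {

    // One independent list of logged URLs per job name (the test name in tests).
    // Concurrent collections are used because several jobs may log at once.
    private static final Map<String, List<String>> RESULTS = new ConcurrentHashMap<>();

    private KeyedUrlLogger() {}

    // Called by a function that needs to log; jobName is the segmenting key.
    public static void record(String jobName, String url) {
        RESULTS.computeIfAbsent(jobName, k -> new CopyOnWriteArrayList<>()).add(url);
    }

    // Returns a snapshot of what was logged under jobName (empty if nothing was).
    public static List<String> getLog(String jobName) {
        List<String> entries = RESULTS.get(jobName);
        return entries == null ? Collections.emptyList() : new ArrayList<>(entries);
    }

    // Drops only the entries for jobName, leaving other tests' results alone.
    public static void clear(String jobName) {
        RESULTS.remove(jobName);
    }
}
```

A test would call `KeyedUrlLogger.clear("myTestName")` before running its job and then check `KeyedUrlLogger.getLog("myTestName")` afterwards, so two tests running in parallel never see each other's entries as long as their job names differ.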

Schmed commented 7 years ago

I poked at this a little. The problem is getting at the job name from within a function. We've got it in the CrawlTopology, and it gets passed to the ExecutionEnvironment, but it doesn't seem to be readable back from the ExecutionEnvironment, let alone from anything a function can actually reach.

I was hoping we wouldn't have to add jobName as a constructor parameter to every single function we're using.