Norconex / collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to network locations into various data repositories such as search engines.
http://www.norconex.com/collectors/collector-filesystem/

Using log4j 2.x #61

Status: Open. robul opened this issue 2 years ago.

robul commented 2 years ago

Hello,

This is a question about the log4j version employed by the Filesystem Collector.

I notice that the most recent release of the Filesystem Collector (2.9.1 at time of writing) uses log4j version 1.2.17. I also note that the latest version of the HTTP collector (3.0.0 at time of writing) uses log4j version 2.17.1.

  1. Is there a way to use a log4j 2.x version with Filesystem Collector 2.9.1?
  2. Do you know when there will be a Filesystem Collector release that ships with a log4j 2.x version?

Thank you!

essiembre commented 2 years ago

Since Log4j 2.x is not a drop-in replacement for Log4j v1.x, you can't easily upgrade it.

The Filesystem Collector will use Log4J 2.x once it is upgraded to the more recent stack of shared libraries, as the HTTP Collector recently was. We have not started on that yet.

Is Log4j v1.x causing you any problem?

robul commented 2 years ago

Thank you Pascal for the quick response!

It is not causing any specific problem - it is more just a case of keeping systems up to date.

robul commented 2 years ago

Hello Pascal,

Hope you are well!

Just wanted to ask a follow up question on this.

Do you have a rough timeline for the work of upgrading the Filesystem Collector to use a more recent stack of libraries - including Log4J 2.x - so that it is in line with the HTTP Collector?

Even a very approximate idea (e.g. which year/quarter(s)) of when you'd expect this work to progress would be a huge help to us.

Thank you

essiembre commented 2 years ago

So far the plan is (likely matching the HTTP Collector 3.1 release) to merge all our crawler-related repos into one big project with various modules. They will still be separate product/library binaries, but they will be built together and will all share the same versions going forward. The Filesystem Collector would then be upgraded to the new stack. That will make maintenance easier and ensure all pieces always stay in sync. There is no date for it yet, though it would be nice to have it done this year.

jamessuart commented 1 year ago

Hello there... since it's been many months since that previous post (and its aspirational timeline!), we just wondered whether there is a) any news/update on this thread, or b) any new timeline for when the above might happen? As Robin said previously, it's not causing huge problems per se, but it might (potentially) cause us to "fall foul" of our client's more stringent security policies in future... :)

Many thanks, James.

essiembre commented 1 year ago

No timeline. All I can say is that the work has officially started and is well underway: merging repos and bringing the Filesystem Collector to parity with what we now know will be a "Version 4 stack". Hopefully, you'll start seeing snapshot/milestone releases before this becomes an issue for your client. In the meantime, if you know your Java, you can refactor the existing Filesystem Collector code base to replace the log4j version with something else (and maybe share it via a pull request).

jamessuart commented 1 year ago

Hi Pascal - actually that's a very useful update - thanks much. :)

Best James.

jamessuart commented 1 year ago

Hi Pascal - just checking in again to see if any further news on this from your side (refactor option notwithstanding)...? :)

Many thanks and best regards, James.

essiembre commented 1 year ago

No "news" other than we're working on it. The FS Collector has now been merged into the collector-http repo, along with other projects, under the upcoming-v4 branch. It will take quite some time before we encourage people to try it. That branch is quite unstable for now (as in, it will likely change quite a bit before release).

So if you are expecting a release in the short term to make a decision, don't. :-)

That being said, have you considered "bridging" Log4j 1.x to 2.x? Apache has created a bridge solution that involves swapping some jars so that everything is redirected to Log4J2: https://logging.apache.org/log4j/2.x/log4j-1.2-api.html Have you given it a try?
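If you try the bridge, one way to confirm the jar swap actually took effect is to check which jar supplies the Log4j 1.x API class `org.apache.log4j.Logger` at runtime. The sketch below is a hypothetical diagnostic (`Log4jBridgeCheck` is not part of Norconex or Log4j). It assumes the bridge jar's file name still contains its Maven artifactId, `log4j-1.2-api`, and that the presence of `org.apache.logging.log4j.core.LoggerContext` indicates Log4j 2 core is on the classpath. It uses only JDK reflection, so it compiles without any Log4j jars.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not a Norconex or Log4j API): reports where the
 * Log4j 1.x API classes come from, to help confirm a log4j-1.2-api bridge swap.
 */
public class Log4jBridgeCheck {

    /** Returns the jar/directory a class was loaded from, or null if the class is absent. */
    static String locationOf(String className) {
        try {
            // initialize=false: inspect the class without running its static initializers
            Class<?> c = Class.forName(className, false,
                    Log4jBridgeCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL url = (src == null) ? null : src.getLocation();
            return (url == null) ? "(unknown location)" : url.toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    /** Summarizes whether the Log4j 1.x API appears to be bridged to Log4j 2. */
    static String describe() {
        String v1Api = locationOf("org.apache.log4j.Logger");
        String v2Core = locationOf("org.apache.logging.log4j.core.LoggerContext");
        if (v1Api == null) {
            return "Log4j 1.x API not found on classpath";
        }
        // Assumption: the bridge jar keeps "log4j-1.2-api" in its file name;
        // a renamed jar will defeat this heuristic.
        if (v1Api.contains("log4j-1.2-api") && v2Core != null) {
            return "Bridged: Log4j 1.x API from " + v1Api
                    + ", Log4j 2 core from " + v2Core;
        }
        return "Log4j 1.x API from " + v1Api + " (no bridge detected)";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same classpath the collector uses (for example, its library folder). If the output still points at a `log4j-1.2.x` jar, the original Log4j 1.x jar is probably still being picked up ahead of the bridge.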