Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

SNAPSHOT in stable release 2.2.1? #161

Closed leonardsaers closed 9 years ago

leonardsaers commented 9 years ago

I'm trying to build the norconex-collector-http stable release 2.2.1 with a simple "mvn clean install -DskipTests".

The build fails because it times out while downloading

org.apache.pdfbox:pdfbox:jar:2.0.0-SNAPSHOT
org.apache.pdfbox:fontbox:jar:2.0.0-SNAPSHOT

The Maven log message where it times out:

Downloading: http://www.mygrid.org.uk/maven/repository/org/apache/pdfbox/fontbox/2.0.0-SNAPSHOT/maven-metadata.xml

I can't find those dependencies in the pom file, so I find it strange that Maven tries to download them. The dependency seems to come from norconex-collector-core.

Is there a SNAPSHOT dependency on pdfbox? If so, is there a need to rely on the SNAPSHOT instead of the latest stable version?

essiembre commented 9 years ago

Those are indirect dependencies required by the Importer. On top of timeouts, you may get compile errors with 2.2.1 and PDFBox-SNAPSHOT. The distributed binaries of HTTP Collector ship with dependencies that are 100% working with that release.
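To confirm which module pulls in the PDFBox snapshots, the Maven Dependency Plugin can print the path to an indirect dependency. The sketch below wraps its `dependency:tree` goal with the `includes` filter; the function name `show_dependency_path` is a hypothetical helper, not something from the project. Note that this goal resolves dependencies, so it can hit the same timeout while the remote repository is unreachable.

```shell
# Hypothetical helper: print only the branches of the dependency tree
# that lead to the given groupId (optionally groupId:artifactId).
# Usage: show_dependency_path <groupId[:artifactId]>
show_dependency_path() {
  mvn dependency:tree -Dincludes="$1"
}

# Example, run from the norconex-collector-http checkout:
#   show_dependency_path org.apache.pdfbox
```

The printed tree should show the chain of modules between the project and the PDFBox artifacts.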

You are right though, the code for an official release should not point to a snapshot release of dependencies. This is not ideal, but in the case of PDFBox, an exception was made because its 2.0.0 snapshot branch addresses many defects that were reported with the latest 1.x stable release. It was decided the benefits outweighed the trouble that may arise from trying to compile a stable release if the snapshot changes too much (different issue than your timeout). I am looking forward to a stable PDFBox 2.0.0 though.

The timeout should only be temporary. It has happened to me before, and the repository was back up after a while.

FYI, with the 2.3.0-SNAPSHOT version of HTTP Collector you should not have compile errors.

leonardsaers commented 9 years ago

Thanks for the reply. Yes, sometimes only a SNAPSHOT solves the problem.

I get the same problem with version 2.3.0-SNAPSHOT:

[INFO] ------------------------------------------------------------------------
[INFO] Building Norconex HTTP Collector 2.3.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repository.apache.org/content/groups/snapshots/com/norconex/collectors/norconex-collector-core/1.3.0-SNAPSHOT/maven-metadata.xml
Downloading: https://repository.apache.org/content/groups/snapshots/com/norconex/jef/norconex-jef/4.0.7-SNAPSHOT/maven-metadata.xml
Downloading: https://repository.apache.org/content/groups/snapshots/com/norconex/commons/norconex-commons-lang/1.8.0-SNAPSHOT/maven-metadata.xml
Downloading: https://repository.apache.org/content/groups/snapshots/com/norconex/collectors/norconex-importer/2.4.0-SNAPSHOT/maven-metadata.xml
Downloading: http://www.mygrid.org.uk/maven/repository/org/apache/pdfbox/fontbox/2.0.0-SNAPSHOT/maven-metadata.xml

leonardsaers commented 9 years ago

Now it's working. No more timeout :)

essiembre commented 9 years ago

Great. Hopefully this won't happen too often. If it happens again, you can add the -o or --offline flag to your Maven command to force it to use snapshot releases found in your local Maven repo instead of trying to check and download fresh ones.
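As a rough sketch of that suggestion: an offline build only works if the snapshot was downloaded at least once, so it can help to check the local repository first. The helper names below are hypothetical, and the default location `~/.m2/repository` is an assumption that holds unless your Maven settings override it.

```shell
# Hypothetical helper: succeeds if the local repository already holds
# at least one jar for the given snapshot.
# Usage: has_cached_snapshot <groupId> <artifactId> <version> [repo_dir]
has_cached_snapshot() {
  # Maven stores artifacts under groupId (dots become slashes)/artifactId/version.
  group_path=$(printf '%s' "$1" | tr '.' '/')
  repo_dir=${4:-"$HOME/.m2/repository"}
  ls "$repo_dir/$group_path/$2/$3"/*.jar >/dev/null 2>&1
}

# Hypothetical helper: the build command from this thread, in offline mode.
build_offline() {
  mvn clean install -DskipTests --offline
}

# Example:
#   if has_cached_snapshot org.apache.pdfbox fontbox 2.0.0-SNAPSHOT; then
#     build_offline
#   fi
```

If the check fails, the offline build would most likely stop on the missing artifact, so a normal online build is needed once the remote repository is reachable again.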