amir-jakoby / crawler-commons

Automatically exported from code.google.com/p/crawler-commons

Fix deprecation in Crawler Commons Code #13

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. I switched javac deprecation warnings on by default, which lets us spot 
potential areas for improvement. If you compile the code, you will see that 
there are a few. 

We can work on the instances of deprecation post 0.2 release. 
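
For readers unfamiliar with what the compiler flags here, the following is a minimal sketch of a deprecated method and a caller that would trigger the warning. All class and method names are hypothetical and do not come from crawler-commons. The warning shows up when the file is compiled with deprecation reporting enabled, for example `javac -Xlint:deprecation`.

```java
// Hypothetical names throughout; none of this comes from crawler-commons.
class LegacyFetcher {

    /** @deprecated use {@link #fetch(String, int)} instead. */
    @Deprecated
    static String fetch(String url) {
        // The old entry point simply delegates to its replacement.
        return fetch(url, 10_000);
    }

    // Replacement API with an explicit timeout.
    static String fetch(String url, int timeoutMillis) {
        return url + " (timeout " + timeoutMillis + " ms)";
    }
}

class DeprecationDemo {

    // javac reports this call when deprecation warnings are enabled,
    // because it uses a deprecated method declared in another class.
    static String oldStyle() {
        return LegacyFetcher.fetch("http://example.com/");
    }

    // The migrated call uses the replacement and is not reported.
    static String newStyle() {
        return LegacyFetcher.fetch("http://example.com/", 10_000);
    }
}
```

Both calls return the same value, so a fix of this kind can usually be checked by the existing tests.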

Original issue reported on code.google.com by lewis.mc...@gmail.com on 28 Jan 2013 at 2:52

GoogleCodeExporter commented 8 years ago
Hi Lewis - I've made a pass at fixing issues related to robots.txt 
parsing, and I think the code is in good shape for use in Nutch (is Nutch 
currently using this code?), as we now support patterns, the longest-match 
heuristic, and the allow-before-disallow heuristic.

So a release might be in order, but I'm wondering whether you want to take a 
pass at fixing the deprecations first.
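
To illustrate the two heuristics named above, here is a minimal sketch of longest-match rule selection. It is not the crawler-commons implementation: the class name and methods are hypothetical, patterns (wildcards) are left out, and "allow before disallow" is assumed to mean that an Allow rule wins when an Allow and a Disallow rule match with the same length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the crawler-commons robots.txt parser.
class RobotsRuleSketch {

    // One Allow/Disallow line: a path prefix plus whether it permits access.
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    RobotsRuleSketch add(String prefix, boolean allow) {
        rules.add(new Rule(prefix, allow));
        return this;
    }

    // The longest matching prefix wins. On equal length, Allow beats
    // Disallow (assumed tie-break). A path matching no rule is allowed.
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule rule : rules) {
            if (!path.startsWith(rule.prefix)) {
                continue;
            }
            boolean longer = best == null
                    || rule.prefix.length() > best.prefix.length();
            boolean tieForAllow = best != null
                    && rule.prefix.length() == best.prefix.length()
                    && rule.allow && !best.allow;
            if (longer || tieForAllow) {
                best = rule;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        RobotsRuleSketch rules = new RobotsRuleSketch()
                .add("/private/", false)
                .add("/private/docs/", true);
        System.out.println(rules.isAllowed("/private/docs/a.html")); // true
        System.out.println(rules.isAllowed("/private/secret.html")); // false
        System.out.println(rules.isAllowed("/public/index.html"));   // true
    }
}
```

Because the most specific rule decides, the order in which rules appear in the file does not matter in this sketch.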

Original comment by kkrugler...@transpac.com on 14 Mar 2014 at 12:07

GoogleCodeExporter commented 8 years ago
@Chris, yes I will submit a patch for deprecation and we can push a release.
Thanks for nudging this one.
P.S. Yes, we are using the robots.txt parsing code in Nutch. It works a treat :)

Original comment by lewis.mc...@gmail.com on 16 Mar 2014 at 1:33

GoogleCodeExporter commented 8 years ago
I am working on this patch. It also covers issue CC-8, so I am making the 
upgrade to httpclient as per Fuad's patch.
I have some failing tests locally after the upgrade and after addressing other 
javac warnings, so I'll work on this again tomorrow when I get a chance and 
submit a patch. 

Original comment by lewis.mc...@gmail.com on 16 Mar 2014 at 7:36

GoogleCodeExporter commented 8 years ago
OK folks, here is a patch for this issue.
It is a rather confusing patch as it contains several things. To break it down, 
it comprises the following:
* Slight code reformatting in pom.xml, removal of the unused Hadoop log 
property, and removal of the unused ant-eclipse-jvm1.2 plugin configuration.
* Integration of issue CC-8, which upgrades our httpclient API usage to 
v4.2.6. Having reviewed this patch, I feel that we have retained as much of 
the existing functionality as possible. Using the new API looks like a lot of 
change, but it is not as bad as it initially seems.
* Removed all unused imports across the codebase.
* Suppressed all javac warnings with the appropriate entries across the entire 
codebase.
* Changed access of CrawlerCommons.getVersion() to static.
* Reformatted ALL files mentioned above for better readability in an IDE.
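
As a rough illustration of the warning suppression and the static access change listed above, here is a minimal sketch. `VersionSketch` is a hypothetical stand-in for the real class, reading the version from the JAR manifest is an assumption rather than what the patch does, and `Date.getYear()` is used only because it is a well-known deprecated JDK method.

```java
import java.util.Date;

// Hypothetical stand-in; the actual patch is not shown in this thread.
class VersionSketch {

    // Static access: callers can read the version without an instance.
    static String getVersion() {
        // Assumption: the version comes from the JAR manifest, if present.
        Package pkg = VersionSketch.class.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;
    }

    // Date.getYear() is deprecated in the JDK. The annotation silences the
    // deprecation warning for this method only, not for the whole class.
    @SuppressWarnings("deprecation")
    static int legacyYear(Date date) {
        return date.getYear() + 1900;
    }
}
```

Keeping the annotation on the smallest possible scope means any new deprecated usage elsewhere in the class is still reported.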

The patch attachment can be applied to trunk and passes all tests. 
No new tests have been introduced to the codebase in this patch.

Original comment by lewis.mc...@gmail.com on 16 Mar 2014 at 9:33

GoogleCodeExporter commented 8 years ago
Merged as of r118.

The formatting changes are open for discussion - some people really prefer 4 
spaces over 2 :)

Original comment by kkrugler...@transpac.com on 17 Mar 2014 at 12:38

GoogleCodeExporter commented 8 years ago
Guys, please remember to update CHANGES.txt prior to committing something. This 
will make it easier to track changes from one version to the other.

Original comment by digitalpebble on 19 Mar 2014 at 10:58

GoogleCodeExporter commented 8 years ago
Hi Julien
+1
Update made to CHANGES.txt as of revision 119 in trunk.
Thanks

Original comment by lewis.mc...@gmail.com on 19 Mar 2014 at 7:15