amir-jakoby / crawler-commons

Automatically exported from code.google.com/p/crawler-commons
0 stars 0 forks source link

[Sitemaps] Sitemap Parser changes the processed flag unnecessarily #61

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Our SitemapParser has a parse method with the following signature:
public AbstractSiteMap parseSiteMap(String contentType, byte[] content, 
AbstractSiteMap sitemap);

As you can see, the caller sends a sitemap to this method.

This is the implementation:
AbstractSiteMap asmCopy =  parseSiteMap(contentType, content, sitemap.getUrl());
asmCopy.setLastModified(sitemap.getLastModified());
sitemap.setProcessed(true);        
return asmCopy;

please note the third line:
sitemap.setProcessed(true);

This is just plain wrong, this sitemap object wasn't processed!  why should we 
mark it as processed ?

After some ivestigation I found that this bug was introduced in revision 30.

To see the bug look at:
https://code.google.com/p/crawler-commons/source/diff?spec=svn30&r=30&format=sid
e&path=/trunk/src/main/java/crawlercommons/sitemaps/SiteMapParser.java

Line 61 in the original was wrongly refactored to line 63 in the new code there.

Original issue reported on code.google.com by avrah...@gmail.com on 9 Nov 2014 at 3:11

GoogleCodeExporter commented 8 years ago
I am not sure here, my understanding is that parseSiteMap does the processing. 
This is dependent upon the type of sitemap e.g. text, XML, etc. The 
sitemap.setProcessed method is then called within each of the parse method 
invocations.
I would need to debug if this is absolutely true. I'm working on some CC 
examples which I will upload a patch for soon enough. If you are able to debug 
this yourself then please provide more info Avi. Good catch here. 

Original comment by lewis.mc...@gmail.com on 9 Nov 2014 at 3:52

GoogleCodeExporter commented 8 years ago
Thank you Lewis.

I am quite sure that this line should be removed as the "setProcessed" flag is 
already set as you said.

but I will debug it and attach a patch.

Original comment by avrah...@gmail.com on 9 Nov 2014 at 4:06

GoogleCodeExporter commented 8 years ago
+1 Avi. This is what I think should happen as well. It seems unnecessary to 
introduce it again.

Original comment by lewis.mc...@gmail.com on 9 Nov 2014 at 4:09

GoogleCodeExporter commented 8 years ago
I have checked the parsing and processing methods and all of them set the 
processed flag as they should.

I have removed that line as it is wrong.

I am attaching a patch for code review.

Original comment by avrah...@gmail.com on 10 Nov 2014 at 10:11

Attachments:

GoogleCodeExporter commented 8 years ago
+ 1 Avi. Commit away.

Original comment by lewis.mc...@gmail.com on 24 Nov 2014 at 8:10

GoogleCodeExporter commented 8 years ago
Done on Rev: r154

Original comment by avrah...@gmail.com on 25 Nov 2014 at 12:07