Samita53 / ldspider

Automatically exported from code.google.com/p/ldspider

Weird behaviour with Content-Location header field #16

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Crawl "http://www.w3.org/2002/07/owl"

What is the expected output? What do you see instead?

 - "http://www.w3.org/2002/07/owl" has Content-Location of "owl.rdf"
 - context for quads from this document uses <http://www.w3.org/2002/07/owl>
 - a redirect is output from <http://www.w3.org/2002/07/owl> to <http://www.w3.org/2002/07/owl.rdf>
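
The redirect target in the last bullet is what results from resolving the relative Content-Location value against the request URI. A minimal Python sketch of that resolution, using only the standard library; it is not ldspider's own code, and the variable names are illustrative:

```python
from urllib.parse import urljoin

request_uri = "http://www.w3.org/2002/07/owl"
content_location = "owl.rdf"  # header value as reported in this issue

# A relative reference is resolved against the request URI
# (RFC 3986, section 5): the last path segment "owl" is replaced.
resolved = urljoin(request_uri, content_location)
print(resolved)
```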

Please use labels and text to provide additional information.

 - The behaviour is strange: we now have contexts that are the source of a redirect, which leaves various dangling redirects (redirects whose target never appears as a context).

(Found through problems ranking BTC11 where links are rewritten according to 
redirects, causing mis-alignment with contexts.)
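
The misalignment can be reproduced with a small sketch. The `rewrite` helper and the two data structures are hypothetical stand-ins for whatever the ranking code does, assumed here only to show why a rewritten link no longer matches any context:

```python
def rewrite(uri, redirects):
    """Follow redirects until a URI with no outgoing redirect is reached."""
    seen = set()
    while uri in redirects and uri not in seen:  # guard against redirect cycles
        seen.add(uri)
        uri = redirects[uri]
    return uri

# Crawl output as described in this issue: the context is the redirect source.
contexts = {"http://www.w3.org/2002/07/owl"}
redirects = {"http://www.w3.org/2002/07/owl": "http://www.w3.org/2002/07/owl.rdf"}

link = "http://www.w3.org/2002/07/owl"
target = rewrite(link, redirects)

# The rewritten link points at a URI that is not the context of any quad.
dangling = target not in contexts
```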

Original issue reported on code.google.com by aidan.ho...@deri.org on 1 Nov 2011 at 3:23

GoogleCodeExporter commented 9 years ago
Two possible fixes I guess:

1. Use the Content-Location field to write the context
2. Omit the redirect.

Don't know which one is preferable. I've gone for 2. to clean up the BTC11 
redirects.
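
The two options can be compared side by side in a short sketch. The `crawl_output` function and its `fix` parameter are hypothetical and only model the choice described above; they are not part of ldspider:

```python
from urllib.parse import urljoin

def crawl_output(request_uri, content_location, fix):
    """Return (context, redirects) for one fetched document.

    fix == 1: use the resolved Content-Location as the context, keep the redirect.
    fix == 2: keep the request URI as the context, omit the redirect.
    """
    if content_location is None:
        return request_uri, []
    resolved = urljoin(request_uri, content_location)
    if resolved == request_uri:
        return request_uri, []
    if fix == 1:
        return resolved, [(request_uri, resolved)]
    if fix == 2:
        return request_uri, []
    raise ValueError("fix must be 1 or 2")

owl = "http://www.w3.org/2002/07/owl"
option_1 = crawl_output(owl, "owl.rdf", fix=1)
option_2 = crawl_output(owl, "owl.rdf", fix=2)
```

Under either option, no redirect is emitted whose target is missing from the set of contexts: option 1 makes the target the context, and option 2 emits no redirect at all.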

Original comment by aidan.ho...@deri.org on 1 Nov 2011 at 4:06