PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

web robots can access downloads and example links #190

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Looking at the server access statistics (/log, etc.), specifically the FILE 
category, and the metadata (logentity table), I concluded that robots do 
download all the data files and run the example queries from time to time 
(e.g., one from Mountain View, US, perhaps Google). Also, our robots.txt 
file is too trivial and, more importantly, does not work at all when cpath2 is 
not deployed as the root web app of its domain, since crawlers only read 
robots.txt from the site root. Also, ";jsessionid=..." is sometimes part of 
the log record.

To fix this, I am going to:
- remove robots.txt and the corresponding web controller;
- add either <meta name="robots" content="noindex,nofollow"> (for the /admin 
pages) or <a rel="nofollow" ..> (for file and example links);
- remove ";jsession..." ending from a file name before saving the download 
event to the log db.
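
The last step above could be sketched roughly as follows. This is only an illustration, not the actual cPath2 code: the class name DownloadLogNames, the method stripSessionId, and the sample file name are all hypothetical, and it assumes the session id appears as a ";jsessionid=<token>" path parameter appended to the file name.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of cPath2): cleans a requested file name
// before the download event is written to the log database.
final class DownloadLogNames {
    // Matches ";jsessionid=<token>", where the token runs up to the next
    // ';', '?', '#', or the end of the string. Case-insensitive.
    private static final Pattern JSESSIONID =
        Pattern.compile(";jsessionid=[^;?#]*", Pattern.CASE_INSENSITIVE);

    private DownloadLogNames() {}

    // Returns the name with any ";jsessionid=..." part removed;
    // names without a session id are returned unchanged.
    static String stripSessionId(String fileName) {
        if (fileName == null) {
            return null;
        }
        return JSESSIONID.matcher(fileName).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample (made-up) file name as it might appear in a log record.
        System.out.println(stripSessionId("data.owl.gz;jsessionid=1A2B3C"));
        // prints "data.owl.gz"
    }
}
```

Cleaning the name at logging time, rather than when reading the statistics, keeps one log entry per file regardless of whether the client's session was tracked through the URL.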

Original issue reported on code.google.com by rod...@gmail.com on 5 Nov 2014 at 4:21

GoogleCodeExporter commented 9 years ago
Fixed in the cPath2 sources; it will be deployed on the test server at 
http://pathwaycommons.baderlab.org first.

Original comment by rod...@gmail.com on 5 Nov 2014 at 4:28

GoogleCodeExporter commented 9 years ago

Original comment by rod...@gmail.com on 7 Nov 2014 at 10:23