buda-base / lds-pdi

http://purl.bdrc.io BDRC Linked Data Server
Apache License 2.0

limit big queries / crawling #30

Closed eroux closed 6 years ago

eroux commented 6 years ago

lds-pdi on AWS is barely working at the moment. I'm trying to figure out why, but it may be helpful to:

xristy commented 6 years ago

I just restarted ldspdi on AWS and it is responsive again. Fuseki was fine, so the problem seems to be with ldspdi.

Looking in the logs, I don't see Google or Bing bots, but I do see some accesses that are probing the server:

127.0.0.1 - - [01/Mar/2018:14:20:36 +0000] "GET /query/Work_outline?R_id=bdr:W12827 HTTP/1.0" 200 11220
127.0.0.1 - - [01/Mar/2018:14:21:32 +0000] "GET /query/Item_basicInfo?R_RES=bdr:I29329_I001&jsonOut HTTP/1.0" 200 567
127.0.0.1 - - [01/Mar/2018:14:22:02 +0000] "GET /resource/P3393 HTTP/1.0" 200 1158
127.0.0.1 - - [01/Mar/2018:14:22:05 +0000] "GET /muieblackcat HTTP/1.0" 404 93
127.0.0.1 - - [01/Mar/2018:14:22:05 +0000] "GET /phpMyAdmin/scripts/setup.php HTTP/1.0" 404 93
127.0.0.1 - - [01/Mar/2018:14:22:05 +0000] "GET /phpmyadmin/scripts/setup.php HTTP/1.0" 404 93
127.0.0.1 - - [01/Mar/2018:14:22:06 +0000] "GET /pma/scripts/setup.php HTTP/1.0" 404 93
127.0.0.1 - - [01/Mar/2018:14:22:06 +0000] "GET /myadmin/scripts/setup.php HTTP/1.0" 404 93
127.0.0.1 - - [01/Mar/2018:14:22:06 +0000] "GET /MyAdmin/scripts/setup.php HTTP/1.0" 404 93
127.0.0.1 - - [01/Mar/2018:14:22:20 +0000] "GET /query/Item_basicInfo?R_RES=I22084_I001&jsonOut HTTP/1.0" 200 9385
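For anyone who wants to pull the probing requests out of an access log like the excerpt above, here is a rough sketch in Python. It assumes the log follows the Common Log Format seen in the excerpt, and it treats every 404 as a probe, which is only a heuristic. The names `LOG_ENTRY`, `parse_entries` and `probe_paths` are made up for this illustration and are not part of ldspdi.

```python
import re
from collections import Counter

# Common Log Format: host ident user [timestamp] "method path protocol" status size
LOG_ENTRY = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_entries(text):
    """Return one dict per log entry; entries need not be newline-separated."""
    return [m.groupdict() for m in LOG_ENTRY.finditer(text)]

def probe_paths(entries):
    """Count requested paths that returned 404 (assumed here to be probes)."""
    return Counter(e["path"] for e in entries if e["status"] == "404")

# Three entries copied from the log excerpt in this thread.
sample = (
    '127.0.0.1 - - [01/Mar/2018:14:22:02 +0000] "GET /resource/P3393 HTTP/1.0" 200 1158 '
    '127.0.0.1 - - [01/Mar/2018:14:22:05 +0000] "GET /muieblackcat HTTP/1.0" 404 93 '
    '127.0.0.1 - - [01/Mar/2018:14:22:05 +0000] "GET /phpMyAdmin/scripts/setup.php HTTP/1.0" 404 93'
)

entries = parse_entries(sample)
probes = probe_paths(entries)
for path, count in probes.most_common():
    print(count, path)
```

Counting by path rather than by host is deliberate here: every entry in the excerpt shows 127.0.0.1 as the client, so the host field would not distinguish the scanner from regular traffic.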

I suspect a memory leak or cache misbehavior.

eroux commented 6 years ago

Something else that appears:

01-Mar-2018 14:18:49.080 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@6b8a2626]) and a value of type [org.glassfish.jersey.internal.Errors] (value [org.glassfish.jersey.internal.Errors@6662ac45]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

MarcAgate commented 6 years ago

robots.txt contains the following:

User-Agent: *
Disallow: /
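As a quick sanity check of what these two directives mean for a well-behaved crawler, the rules can be fed to Python's standard `urllib.robotparser`. This is only a sketch: the agent names and paths below are examples (the paths are taken from the log excerpt earlier in this thread), and the helper `allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted above, one per line.
ROBOTS_TXT = "User-Agent: *\nDisallow: /\n"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """True if a crawler identifying as `agent` may fetch `path` under these rules."""
    return parser.can_fetch(agent, path)

for agent in ("Googlebot", "bingbot"):
    for path in ("/resource/P3393", "/query/Work_outline?R_id=bdr:W12827"):
        print(agent, path, allowed(agent, path))
```

Keep in mind that robots.txt is advisory. It should keep compliant crawlers away, but scanners such as the ones requesting the phpMyAdmin setup paths will not honor it.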

MarcAgate commented 6 years ago

All done in commits 3498dd9 (robots.txt) and 39eeb6a (limit to 500 results).
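The contents of commit 39eeb6a are not shown in this thread, so the following is not the actual change. It is a minimal Python sketch of one generic way to cap a result set at 500 rows while still telling the caller whether rows were dropped. `MAX_RESULTS` and `capped` are hypothetical names.

```python
from itertools import islice

MAX_RESULTS = 500  # the cap mentioned in this thread

_END = object()  # sentinel, so a legitimate None row is not mistaken for exhaustion

def capped(rows, limit=MAX_RESULTS):
    """Return (page, truncated): at most `limit` rows, plus whether more were available."""
    it = iter(rows)
    page = list(islice(it, limit))
    # Pull one extra row only to detect truncation; it is discarded.
    truncated = next(it, _END) is not _END
    return page, truncated

page, truncated = capped(range(1200))
print(len(page), truncated)
```

Capping on the consumer side like this still lets the backend compute the full result. Adding a limit to the query itself would avoid that work, at the cost of having to rewrite the query.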