Status: Open. sho-suzuki opened this issue 6 years ago.
I'm not sure that your problem is caused by the proxy, but could you try the following command?
$ curl -H "Authorization: token <token>" "http://localhost:8080/gitbucket/api/v3/repos/<user name>/<repository name>/contents/<file name>?ref=<commit hash>&large_file=true"
The value <token> is the one generated by GitBucket here.
The value <commit hash> is b7d5e8b5fba9a7927ff2b5106066e790ad2ced4e in your case.
It can be obtained with:
$ curl -H "Authorization: token <token>" "http://localhost:8080/gitbucket/api/v3/repos/<user name>/<repository name>/git/refs/heads/master"
If you want to learn more about how Fess fetches files, see GitBucketDataStoreImpl.java.
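The two steps above (resolve the head of master, then request the file at that commit) can be chained in a small shell sketch for manual testing. It assumes the refs endpoint returns a GitHub-style JSON object and simply takes the first 40-character `sha` value it finds; `GITBUCKET_URL`, `TOKEN`, `OWNER`, `REPO` and `FILE` are placeholder variables, and this is not how the Fess data store itself is implemented.

```shell
#!/bin/sh
# Manual check: resolve refs/heads/master, then fetch one file at that commit
# through the contents API with large_file=true.
# GITBUCKET_URL, TOKEN, OWNER, REPO and FILE are placeholders to export first,
# e.g. GITBUCKET_URL=http://localhost:8080/gitbucket

# Print the first 40-hex "sha" value found in JSON read from stdin.
# Assumption: GitHub-style refs response, {"ref":...,"object":{"sha":"..."}}.
first_sha() {
  grep -o '"sha" *: *"[0-9a-f]\{40\}"' | head -n 1 | cut -d'"' -f4
}

fetch_file() {
  api="$GITBUCKET_URL/api/v3/repos/$OWNER/$REPO"
  sha=$(curl -sS -H "Authorization: token $TOKEN" "$api/git/refs/heads/master" | first_sha)
  if [ -z "$sha" ]; then
    echo "could not resolve refs/heads/master" >&2
    return 1
  fi
  # Same request as the first curl command, with the resolved commit hash.
  curl -sS -H "Authorization: token $TOKEN" \
    "$api/contents/$FILE?ref=$sha&large_file=true"
}
```

With the placeholder variables exported, running fetch_file should print the same kind of JSON as the first curl command.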
Thanks @kw-udon. I got a response when I ran the command you suggested:
# curl -H "Authorization: token 284530a64e55176f9ed9*********" "http://gitbucket:8080/gitbucket/api/v3/repos/root/name/contents/hoge?ref=efcd9adbec49f73f762b7b2127153593024e4bea&large_file=true"
{"type":"file","name":"hoge","path":"hoge","sha":"efcd9adbec49f73f762b7b2127153593024e4bea","content":"IyBBcHAgYXJ0aWZhY3RzCi9fYnVpbGQKLLmV4cw==","encoding":"base64","download_url":"http://gitbucket:8080/gitbucket/api/v3/repos/root/name/raw/efcd9adbec49f73f762b7b2127153593024e4bea/hoge"}
So the proxy did not discard or refuse the request.
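The `content` field in that response holds the file body in base64 (`"encoding":"base64"`), so decoding it is a quick way to confirm that the API returned the actual file contents. A minimal sketch, assuming a single-line JSON response, GNU-style `grep -o` and coreutils `base64 -d`; it uses no real JSON parser, so it is only meant for manual checks.

```shell
#!/bin/sh
# Print the decoded file body from a contents API response read from stdin.
# Assumptions: the response has "encoding":"base64" and the "content" string
# contains no escaped quotes; GNU coreutils base64 is available.
decode_content() {
  # Take the first "content" string, drop JSON-escaped "\n" line breaks
  # (if any), then decode the base64 payload.
  grep -o '"content" *: *"[^"]*"' | head -n 1 | cut -d'"' -f4 |
    sed 's/\\n//g' |
    base64 -d
}
```

Piping the curl command above into decode_content should print the file as stored in the repository.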
MultipleCrawlingAccessException occurs in your log file, but I don't know what could raise this exception.
Do you have any idea, @marevol?
Connection time out(Connect to gitbucket:8080 [gitbucket/IP] failed: Connection refused (Connection refused)):
The cause is the error above: it is a network problem. I think the problem is the proxy settings or something similar.
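Since "Connection refused" comes from the TCP connect itself, one way to narrow it down is to test from the Fess host whether GitBucket is reachable directly and whether it is reachable through the proxy. The sketch below wraps curl and classifies the outcome by curl's exit code (7 = failed to connect, 28 = operation timed out); the function name `probe` and the timeout values are arbitrary choices for this sketch.

```shell
#!/bin/sh
# probe URL [PROXY_URL]
# Request URL once, either directly (ignoring any http_proxy variables) or
# through PROXY_URL, and print one of:
#   "http <code>"     an HTTP response arrived (from GitBucket or from the proxy)
#   "connect-failed"  curl exit 7: connection refused or host unreachable
#   "timeout"         curl exit 28
#   "error <rc>"      any other curl exit code
probe() {
  url=$1
  if [ -n "${2:-}" ]; then
    set -- -x "$2"
  else
    set -- --noproxy '*'
  fi
  code=$(curl -sS -o /dev/null -w '%{http_code}' \
    --connect-timeout 5 --max-time 15 "$@" "$url" 2>/dev/null)
  rc=$?
  case $rc in
    0) echo "http $code" ;;
    7) echo "connect-failed" ;;
    28) echo "timeout" ;;
    *) echo "error $rc" ;;
  esac
}
```

Comparing probe "http://gitbucket:8080/gitbucket/api/v3/repos/user/repo/contents/hoge" with the same call plus "http://proxy_IP:proxy_port" as the second argument would show whether the direct connection is refused while the proxied one gets an HTTP response.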
@marevol @kw-udon Only one crawler crawls GitBucket. How can I get detailed logs of the requests (like the curl command above) that the crawler sends when crawling starts?
@marevol Thanks! I changed the crawler log level from info to debug; fess-crawler.log is as follows:
2018-02-15 14:15:37,744 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Accessing http://gitbucket:8080/gitbucket/api/v3/repos/user/repo/contents/hoge?ref=37cce0819cdf0a357e0b5e9bc373030dbfa84cd6&large_file=true
2018-02-15 14:15:37,745 [5DFNjmEBO7Desvq7XhyO-1] DEBUG CookieSpec selected: default
2018-02-15 14:15:37,746 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Connection request: [route: {}->http://gitbucket:8080][total kept alive: 0; route allocated: 0 of 20; total allocated: 0 of 200]
2018-02-15 14:15:37,746 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Connection leased: [id: 1][route: {}->http://gitbucket:8080][total kept alive: 0; route allocated: 1 of 20; total allocated: 1 of 200]
2018-02-15 14:15:37,746 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Opening connection {}->http://gitbucket:8080
2018-02-15 14:15:37,746 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Connecting to gitbucket/IP:8080
2018-02-15 14:15:37,747 [5DFNjmEBO7Desvq7XhyO-1] DEBUG http-outgoing-1: Shutdown connection
2018-02-15 14:15:37,747 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Connection discarded
2018-02-15 14:15:37,748 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Connection released: [id: 1][route: {}->http://gitbucket:8080][total kept alive: 0; route allocated: 0 of 20; total allocated: 0 of 200]
2018-02-15 14:15:37,748 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Cancelling request execution
2018-02-15 14:15:37,748 [5DFNjmEBO7Desvq7XhyO-1] DEBUG Failed to access to http://gitbucket:8080/gitbucket/api/v3/repos/user/repo/contents/hoge?ref=37cce0819cdf0a357e0b5e9bc373030dbfa84cd6&large_file=true
org.codelibs.fess.crawler.exception.CrawlingAccessException: Connection time out(Connect to gitbucket:8080 [gitbucket/IP] failed: Connection refused (Connection refused)): http://gitbucket:8080/gitbucket/api/v3/repos/user/repo/contents/hoge?ref=37cce0819cdf0a357e0b5e9bc373030dbfa84cd6&large_file=true
at org.codelibs.fess.crawler.client.http.HcHttpClient.processHttpMethod(HcHttpClient.java:820) ~[fess-crawler-2.0.1.jar:?]
at org.codelibs.fess.crawler.client.http.HcHttpClient.doHttpMethod(HcHttpClient.java:623) ~[fess-crawler-2.0.1.jar:?]
at org.codelibs.fess.crawler.client.http.HcHttpClient.doGet(HcHttpClient.java:582) ~[fess-crawler-2.0.1.jar:?]
at org.codelibs.fess.crawler.client.AbstractCrawlerClient.execute(AbstractCrawlerClient.java:142) ~[fess-crawler-2.0.1.jar:?]
at org.codelibs.fess.crawler.client.FaultTolerantClient.execute(FaultTolerantClient.java:67) ~[fess-crawler-2.0.1.jar:?]
at org.codelibs.fess.helper.DocumentHelper.processRequest(DocumentHelper.java:148) ~[classes/:?]
at org.codelibs.fess.ds.impl.GitBucketDataStoreImpl.storeFileContent(GitBucketDataStoreImpl.java:291) ~[classes/:?]
at org.codelibs.fess.ds.impl.GitBucketDataStoreImpl.lambda$storeData$4713(GitBucketDataStoreImpl.java:134) ~[classes/:?]
at org.codelibs.fess.ds.impl.GitBucketDataStoreImpl.crawlFileContents(GitBucketDataStoreImpl.java:441) [classes/:?]
at org.codelibs.fess.ds.impl.GitBucketDataStoreImpl.crawlFileContents(GitBucketDataStoreImpl.java:447) [classes/:?]
at org.codelibs.fess.ds.impl.GitBucketDataStoreImpl.storeData(GitBucketDataStoreImpl.java:124) [classes/:?]
at org.codelibs.fess.ds.impl.AbstractDataStoreImpl.store(AbstractDataStoreImpl.java:106) [classes/:?]
at org.codelibs.fess.helper.DataIndexHelper$DataCrawlingThread.process(DataIndexHelper.java:236) [classes/:?]
at org.codelibs.fess.helper.DataIndexHelper$DataCrawlingThread.run(DataIndexHelper.java:222) [classes/:?]
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to gitbucket:8080 [gitbucket/IP] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:159) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.4.jar:4.5.4]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_161]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_161]
at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:75) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.4.jar:4.5.4]
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.4.jar:4.5.4]
at org.codelibs.fess.crawler.client.http.HcHttpClient.executeHttpClient(HcHttpClient.java:852) ~[fess-crawler-2.0.1.jar:?]
at org.codelibs.fess.crawler.client.http.HcHttpClient.processHttpMethod(HcHttpClient.java:660) ~[fess-crawler-2.0.1.jar:?]
... 13 more
...
2018-02-15 14:15:42,103 [CoreLib-TimeoutManager] DEBUG Closing expired connections
2018-02-15 14:15:42,105 [CoreLib-TimeoutManager] DEBUG Closing connections idle longer than 60000 MILLISECONDS
From this log, the connection appears to fail at connect time: the message says "Connection time out", but the underlying cause is "Connection refused". I also changed GitBucket's logback-setting.xml as follows, but no application log was found:
<configuration debug="true" scan="true" scanPeriod="60 seconds">
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <!-- encoders are by default assigned the type
         ch.qos.logback.classic.encoder.PatternLayoutEncoder -->
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
      <level>INFO</level>
    </filter>
    <encoder>
      <pattern> %date %-4relative [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <!-- encoders are by default assigned the type
         ch.qos.logback.classic.encoder.PatternLayoutEncoder -->
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <!-- rollover daily and compress -->
      <fileNamePattern>/gitbucket/log/gitbucket-%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
      <!-- compressed logs are kept for 30 days and then deleted -->
      <maxHistory>30</maxHistory>
      <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
        <maxFileSize>25MB</maxFileSize>
      </timeBasedFileNamingAndTriggeringPolicy>
    </rollingPolicy>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
      <level>INFO</level>
    </filter>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-4relative [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="DEBUG">
    <appender-ref ref="STDOUT"/>
    <appender-ref ref="ROLLING"/>
  </root>
</configuration>
Any ideas?
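In the DEBUG log above, the route is printed as `{}->http://gitbucket:8080` and the next step is `Connecting to gitbucket/IP:8080`. Assuming Apache HttpClient 4.x formats a route as an optional local address, `{flags}`, any proxy hosts, then the target, all joined by `->`, a proxied request would show an extra hop (for example `{}->http://proxy_IP:proxy_port->http://gitbucket:8080`), which would suggest this request did not go through the configured proxy. The sketch below counts proxy hops in such an entry; the route format it relies on is an assumption worth checking against the HttpClient version in use.

```shell
#!/bin/sh
# Read Apache HttpClient DEBUG log lines from stdin and print the number of
# proxy hops in the first "[route: ...]" entry (0 = direct connection).
# Assumed route format: [localAddress->]{flags}->[proxy->...]target
proxy_hops() {
  route=$(sed -n 's/.*\[route: \([^]]*\)\].*/\1/p' | head -n 1)
  # No route entry found: print nothing and fail.
  [ -n "$route" ] || return 1
  # Drop everything up to the "}" closing the flags, leaving "->hop->...->target".
  # Splitting on "->" gives one empty field plus one field per proxy and the
  # target, so NF - 2 is the number of proxies.
  printf '%s\n' "$route" | sed 's/^[^}]*}//' | awk -F'->' '{ print NF - 2 }'
}
```

For example, grep 'route:' fess-crawler.log | proxy_hops prints 0 for the entries shown above.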
Did you configure proxy settings? See https://github.com/codelibs/fess/issues/1066
@marevol Yes, I configured the proxy settings in fess_config.properties:
http.proxy.host=proxy_IP
http.proxy.port=proxy_port
http.proxy.username=
http.proxy.password=
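To repeat the file request by hand through the same proxy the crawler is configured with, the http.proxy.host and http.proxy.port values can be read from fess_config.properties and passed to curl -x. A minimal sketch, assuming plain `key=value` lines (no continuations or escapes); the helper names are hypothetical, and proxy credentials are deliberately left out (curl's --proxy-user can supply them).

```shell
#!/bin/sh
# Build "http://host:port" from the http.proxy.host / http.proxy.port keys of a
# fess_config.properties file, to pass to curl -x.
# Assumptions: plain "key=value" lines; credentials are not included.
prop_value() {
  # $1 = key, $2 = file. Escape dots so they match literally in the regex.
  key=$(printf '%s' "$1" | sed 's/\./\\./g')
  # Print the value of the last matching line, without trailing whitespace or CR.
  sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*//p" "$2" |
    sed 's/[[:space:]]*$//' | tail -n 1
}

proxy_url() {
  host=$(prop_value http.proxy.host "$1")
  port=$(prop_value http.proxy.port "$1")
  # Fail if either value is missing, instead of guessing a default port.
  [ -n "$host" ] && [ -n "$port" ] || return 1
  printf 'http://%s:%s\n' "$host" "$port"
}
```

For example, curl -x "$(proxy_url /path/to/fess_config.properties)" followed by the same -H and URL arguments as the contents request above; the file path is a placeholder that depends on how Fess was installed.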
Plugin version: 1.3.1
GitBucket version: 4.20
What is the matter:
In a proxy environment, I can't get the content of files, but I can get issues and wikis. fess-crawler.log is as follows:
On Linux, both requests seem to return the same result.
I think it may be a problem with the proxy settings (the proxy discards file requests). I would like to know what HTTP request the file crawl API sends.
Thanks.