Heliosearch / heliosearch

The next generation of open source search
http://heliosearch.org
91 stars 19 forks source link

Solr response logs are truncated after ~500 characters #26

Open smerchek opened 10 years ago

smerchek commented 10 years ago

It appears this change was made uniquely for heliosearch as part of the following commit. As we recently upgraded from Solr to Heliosearch, this was a surprise for us. We rely on the full list of parameters to exist in our logs so we get the full context of whats going on. In addition, we use tools like logstash to parse the logs and get a picture of what parameters are being used over time.

https://github.com/Heliosearch/heliosearch/commit/c7ca813b91a3d28c8ee38052f26598ffde368071#diff-06fb971739c227bdf836946417f4da9cR243

Was there a compelling reasons to do this? If so, would it be possible to make this configurable?

yonik commented 10 years ago

On Thu, Oct 16, 2014 at 3:02 PM, Scott Smerchek notifications@github.com wrote:

It appears this change was made uniquely for heliosearch as part of the following commit. As we recently upgraded from Solr to Heliosearch, this was a surprise for us. We rely on the full list of parameters to exist in our logs so we get the full context of whats going on. In addition, we use tools like logstash to parse the logs and get a picture of what parameters are being used over time.

c7ca813#diff-06fb971739c227bdf836946417f4da9cR243 https://github.com/Heliosearch/heliosearch/commit/c7ca813b91a3d28c8ee38052f26598ffde368071#diff-06fb971739c227bdf836946417f4da9cR243

Was there a compelling reasons to do this?

The reason was when I was doing the terms query implementation. For most people it's useless echoing back megabytes of input and takes a lot of processing/IO.

If so, would it be possible to make this configurable?

Certainly. How about a logLimit request parameter (or can you think of a better name)? You could set that as a default parameter on the request handler so it could be varied per-source or per-request even. That seems more flexible than a global config via solrconfig.xml...

-Yonik

smerchek commented 10 years ago

I can see how 1000s of terms being printed in the logs would be overkill. We would even probably ignore those, too.

Yes, I like the idea of a logLimit parameter. On the other hand, it's a little odd to affect logging through a request parameter. But I can definitely see where you would want more flexibility than just the solrconfig.

As for the truncating part, it should be careful to maintain the log format. For example, a non truncated log looks like

&rows=1} hits=184 status=0 QTime=25

but the truncated log looks like

&ro[...921 more chars]status=0 QTime=1

It should finish the } and space between status=, looking like:

&ro[...921 more chars]} status=0 QTime=1

yonik commented 10 years ago

OK, I've committed support for a request parameter called "logLimit" that limits the size of any given request parameter. This way, at least the start of all request parameters are still displayed. This also fixes the parse-ability issues you highlighted.

Example: curl "http://localhost:8983/solr/select?q=*:*&a=hello&b=this+is+a+big+string&logLimit=10" And the log output is: 639632 [qtp65374112-12] INFO org.apache.solr.core.SolrCore – [collection1] webapp=/solr path=/select params={logLimit=10&b=this+is+a+[...10+more+chars]&q=*:*&a=hello} hits=32

I also optimized the logging in a number of ways, including avoiding multiple string creations (just to pre-pend a "{" for example), and avoiding doing any work whatsoever if it won't be logged anyway.

A logLimit of -1 means "don't limit".

cholpas commented 9 years ago

Did you accidentally remove 'status' and 'QTime' from the logs with this commit? Because they are no longer contained in the log output, only 'hits' is present.

yonik commented 9 years ago

Hmmm, you're right... looks like they were accidentally removed.