Open gleitz opened 3 years ago
Hello, can you elaborate on this? I would like to work on this.
It seems pretty difficult to me to determine which code blocks to show with a rule-based approach. Larger doesn't mean better. Sometimes people give a "lengthy and not efficient answer" and "the shorter and cleaner answer" at the same time. Returning the larger code block might result in people getting "bad" code.
If you have new ideas about this topic, I would gladly hear about them.
Thanks for the help @ykskks. I'll explain the current logic and what we might do to improve it.
Today, we take the first answer and try to pull out any `<pre>` or `<code>` blocks. We prefer that order because `pre` is usually for multi-line content (the `class Error` above) and `code` is for inline content (`true` in the example above). If we don't find any of those blocks we just return the entire answer.
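As a rough illustration of the logic described above (not howdoi's actual code), here is a minimal sketch using only the standard library. The names `BlockCollector` and `extract_answer` are hypothetical, and the sketch assumes the answer arrives as an HTML fragment.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect the text of every <pre> and <code> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.blocks = {"pre": [], "code": []}
        self.all_text = []   # every text node, used for the whole-answer fallback
        self._open = []      # stack of (tag, text chunks) for elements still open

    def handle_starttag(self, tag, attrs):
        if tag in self.blocks:
            self._open.append((tag, []))

    def handle_data(self, data):
        self.all_text.append(data)
        # Text inside nested elements (<pre><code>...) belongs to each open one.
        for _tag, chunks in self._open:
            chunks.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _tag, chunks = self._open.pop()
            self.blocks[tag].append("".join(chunks))


def extract_answer(answer_html):
    """Return the first <pre>, else the first <code>, else the entire answer text."""
    parser = BlockCollector()
    parser.feed(answer_html)
    parser.close()
    for tag in ("pre", "code"):  # prefer multi-line blocks over inline code
        if parser.blocks[tag]:
            return parser.blocks[tag][0]
    return "".join(parser.all_text).strip()
```

Fed an answer that contains only an inline `true`, this returns just `true`, which is the failure case discussed in this thread.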
I was thinking we could try to develop some heuristic for when to show the entire answer vs. one (or more) of these `pre` or `code` blocks. The ideas above (how many code blocks the answer contains, the ratio of the length of the block to the overall number of characters in the answer, just dropping inline `code` blocks, etc.) were a starting point for how we might build that logic.
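To make those ideas concrete, here is one possible sketch of such a heuristic. The function name `choose_output` and the `min_ratio` threshold of 0.3 are assumptions for illustration only; nothing in this thread settles on specific values. Inline `code` is dropped simply by not passing it in.

```python
def choose_output(answer_text, pre_blocks, min_ratio=0.3):
    """Pick between a single block and the whole answer (hypothetical heuristic).

    answer_text: full plain text of the answer
    pre_blocks:  text of each multi-line block; inline code is ignored entirely
    min_ratio:   assumed threshold, share of the answer the block must cover
    """
    if not pre_blocks:
        # Nothing but prose (and maybe inline code): show everything.
        return answer_text

    largest = max(pre_blocks, key=len)
    ratio = len(largest) / max(len(answer_text), 1)

    # One block that makes up a good share of the answer is likely "the" answer.
    if len(pre_blocks) == 1 and ratio >= min_ratio:
        return largest

    # Several blocks, or one small block in a long explanation: keep the context.
    return answer_text
```

The threshold would need tuning against real queries, such as the ones in the list of problem queries mentioned in this thread.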
For what it's worth, I don't think showing the whole answer is that bad. It's probably better than extracting some `code` block (`true`) and returning that as the answer, because on its own it makes no sense.
A list was started with some problem queries and some thoughts on how we might build the answer selection logic. This list is a little old, so some of the answers are not the same as what they used to be, but feel free to edit that or use it for inspiration.
Thank you for the detailed answer!
I will collect more data from the current version of howdoi and see how I can improve it.
Should I first share ideas with you on Issues or submit a PR directly?
Thanks! We can use this thread to post ideas. Others can see it and help out too.
But when you're ready to submit a PR that's great too because I haven't really thought through how the "rules" should be applied in a clear and extensible way. So even if we don't have the perfect set of rules we can at least have the scaffolding to apply them as they grow and change.
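One way such scaffolding could look, purely as a sketch: each rule is a small function that either returns the text to show or `None` to pass, and the rules are tried in order. The `Answer` container, the rule names, and `select_output` are all hypothetical and not part of howdoi.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Answer:
    """Hypothetical container for one parsed answer."""
    text: str
    pre_blocks: List[str] = field(default_factory=list)
    code_spans: List[str] = field(default_factory=list)


# A rule returns the text to show, or None if it does not apply.
Rule = Callable[[Answer], Optional[str]]


def single_pre_rule(answer: Answer) -> Optional[str]:
    """If there is exactly one multi-line block, show it."""
    if len(answer.pre_blocks) == 1:
        return answer.pre_blocks[0]
    return None


def whole_answer_rule(answer: Answer) -> Optional[str]:
    """Catch-all: show the entire answer."""
    return answer.text


DEFAULT_RULES: List[Rule] = [single_pre_rule, whole_answer_rule]


def select_output(answer: Answer, rules: Optional[List[Rule]] = None) -> str:
    """Apply rules in order and return the first non-None result."""
    for rule in (DEFAULT_RULES if rules is None else rules):
        result = rule(answer)
        if result is not None:
            return result
    return answer.text  # no rule matched
```

With this shape, adding or reordering rules means editing a list rather than a chain of conditionals, so the rule set can grow and change without touching the selection loop.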
I tried some queries myself and found that only showing the first `pre` is indeed not a great idea.
The rule-based approaches that you described might work, but it is hard to come up with them or to actually evaluate them. Ideally we could collect data and use ML (I am thinking of simple models like Logistic Regression with text and possibly hand-crafted features) to predict which one to return.
However, it needs labeling work, which most people don't want to do. So for now, below is my idea to somewhat mitigate the problem.
- a single `pre` exists → return the `pre`
- several `pre` exist → return all `pre`, each with its preceding sentence or paragraph
- a single `code` exists → return the sentence or paragraph with the `code`
- several `code` exist → return all sentences or paragraphs with `code`
This way, we can cut out unnecessary parts and still return meaningful content.
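The rules above could be sketched roughly as follows. This assumes the answer has already been split into ordered segments; the `Segment` type and `select_segments` are hypothetical names, and falling back to the whole answer when there is no code at all is an assumption carried over from the current behavior.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One piece of an answer, in document order (hypothetical representation)."""
    kind: str                      # "pre" for a code block, "p" for prose
    text: str
    has_inline_code: bool = False  # True if a prose segment contains <code>


def select_segments(segments: List[Segment]) -> List[str]:
    """Return the pieces of the answer to display, following the proposed rules."""
    pre_indexes = [i for i, seg in enumerate(segments) if seg.kind == "pre"]

    # Exactly one block: return the block alone.
    if len(pre_indexes) == 1:
        return [segments[pre_indexes[0]].text]

    # Several blocks: return each one together with the prose just before it.
    if len(pre_indexes) > 1:
        selected = []
        for i in pre_indexes:
            if i > 0 and segments[i - 1].kind == "p":
                selected.append(segments[i - 1].text)
            selected.append(segments[i].text)
        return selected

    # No blocks: return every prose segment that contains inline code.
    with_code = [s.text for s in segments if s.kind == "p" and s.has_inline_code]
    if with_code:
        return with_code

    # No code at all (assumed fallback): return the whole answer.
    return [s.text for s in segments]
```

Note that the single-`code` and several-`code` cases collapse into one branch here, since both return every sentence or paragraph that contains inline code.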
Consider the following:
I think that it is `true` because the variable was set
vs.
We would want to return the larger code block
`<code>` and only go with `pre`?