alexrs / pop

StackOverflow answers in your terminal.
MIT License

Better HTML parsing #2

Open alexrs opened 8 years ago

alexrs commented 8 years ago

PROBLEM: In the current implementation, the answer is obtained as plain text, but some answers contain links of the form:

You can find more info here

where "here" is a hyperlink to another resource, but its URL is not shown.

POSSIBLE SOLUTION: Get the answer as HTML, parse it, and display those links properly, for example:

You can find more info here (http://...)

Virtual-Machine commented 8 years ago

goquery has a utility function named OuterHtml(), which returns the raw HTML of the selected node.

Calling this function for a query such as

pop -l go print string

returns the following raw HTML:

<div class="post-text" itemprop="text">
<p><a href="http://golang.org/pkg/fmt/#Sprintf">Sprintf</a></p>

<p><a href="https://tour.golang.org/methods/19">Here also</a> is a use of it in the tutorial, &#34;A Tour of Go.&#34;</p>

<pre><code>return fmt.Sprintf(&#34;at %v, %s&#34;, e.When, e.What)
</code></pre>
</div>

This helps illustrate the problem you describe above.

As for the hyperlinks, I think goquery could be used to read the href attribute of each <a> tag and append it as text to that tag's inner HTML.

Virtual-Machine commented 8 years ago

Just quickly played around with goquery and came up with something like this...

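// Append each link's URL, in parentheses, after the link text.
doc.Find(".answercell .post-text").First().Find("a").Each(func(i int, s *goquery.Selection) {
    // Attr also reports whether the attribute exists; skip anchors without an href.
    if href, ok := s.Attr("href"); ok {
        s.AppendHtml(" (" + href + ")")
    }
})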

Seems to perform as expected. Let me know what you think.
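
For anyone who wants to try the same idea outside of pop, here is a minimal sketch that needs only the standard library. It is not the code from the PR: it swaps goquery for encoding/xml with the lenient HTML settings, and renderLinks and the example.com URL are made-up names used only for illustration. It assumes the fragment is reasonably well-formed, like the snippet above.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// renderLinks is a hypothetical helper (not part of pop or goquery).
// It converts an HTML fragment to plain text and appends each link's
// URL in parentheses right after the link text.
func renderLinks(fragment string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(fragment))
	// Lenient settings so the XML decoder tolerates common HTML.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var out strings.Builder
	var hrefs []string // href values of the currently open <a> elements
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out.String(), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == "a" {
				href := ""
				for _, attr := range t.Attr {
					if attr.Name.Local == "href" {
						href = attr.Value
					}
				}
				hrefs = append(hrefs, href)
			}
		case xml.CharData:
			out.Write(t)
		case xml.EndElement:
			if t.Name.Local == "a" && len(hrefs) > 0 {
				href := hrefs[len(hrefs)-1]
				hrefs = hrefs[:len(hrefs)-1]
				if href != "" {
					out.WriteString(" (" + href + ")")
				}
			}
		}
	}
}

func main() {
	// The URL is a placeholder; the issue only shows "http://...".
	text, err := renderLinks(`<p>You can find more info <a href="http://example.com/docs">here</a></p>`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(text) // prints: You can find more info here (http://example.com/docs)
}
```

Real StackOverflow markup is not always valid XML, so a proper HTML parser such as goquery is probably still the safer choice inside pop itself.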

alexrs commented 8 years ago

I have merged your PR! If you have any other ideas for improving Pop, I'd be happy to hear them! I have also thought about highlighting headings (h1, h2, h3, ...) and some other HTML tags in some way.
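
As a rough sketch of what heading highlighting might look like (nothing here exists in Pop yet), the snippet below wraps heading text in ANSI bold escape codes. It assumes the terminal understands ANSI escapes, treats h1 through h6 as headings, and uses encoding/xml in its lenient mode instead of goquery so that it runs with the standard library alone. highlightHeadings and isHeading are hypothetical names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

const (
	bold  = "\x1b[1m" // ANSI escape: start bold text
	reset = "\x1b[0m" // ANSI escape: reset text attributes
)

// isHeading reports whether name is one of h1 through h6.
func isHeading(name string) bool {
	return len(name) == 2 && name[0] == 'h' && name[1] >= '1' && name[1] <= '6'
}

// highlightHeadings is a hypothetical helper. It converts an HTML
// fragment to text, printing heading text in bold on its own line.
func highlightHeadings(fragment string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(fragment))
	// Lenient settings so the XML decoder tolerates common HTML.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var out strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out.String(), nil
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if isHeading(t.Name.Local) {
				out.WriteString(bold)
			}
		case xml.CharData:
			out.Write(t)
		case xml.EndElement:
			if isHeading(t.Name.Local) {
				out.WriteString(reset + "\n")
			}
		}
	}
}

func main() {
	text, err := highlightHeadings("<h2>Usage</h2><p>Run it.</p>")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// %q makes the escape codes visible instead of letting the terminal interpret them.
	fmt.Printf("%q\n", text)
}
```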