Closed FM1337 closed 4 years ago
I'll look into this. Thank you.
It's not the `&nbsp;`, it's the `span` tag.
I ran the code myself, and the error logged to the console was `First child not a text node`, which makes sense: the first child of the `a` tag (the `span` tag) is an `ElementNode`, not a `TextNode`, which causes an error to be thrown.
I'll work around this in the `Text()` function so that it returns `TextNode` data even when the text nodes are siblings of `ElementNode`s.
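To illustrate the idea described above, here is a minimal sketch of walking an element's children and returning the first text node even when an element node comes first. The `Node` and `NodeType` types and the `firstText` helper are hypothetical stand-ins written for this example; they are not soup's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeType and Node are simplified, hypothetical stand-ins for a parser's
// node types. They exist only for this illustration.
type NodeType int

const (
	TextNode NodeType = iota
	ElementNode
)

type Node struct {
	Type        NodeType
	Data        string
	FirstChild  *Node
	NextSibling *Node
}

// firstText returns the first non-blank text found among n's direct
// children, skipping element children such as an empty <span>.
func firstText(n *Node) (string, bool) {
	for k := n.FirstChild; k != nil; k = k.NextSibling {
		if k.Type != TextNode {
			continue
		}
		if t := strings.TrimSpace(k.Data); t != "" {
			return t, true
		}
	}
	return "", false
}

func main() {
	// Models: <a><span></span>[Insert FML text here] FML</a>
	text := &Node{Type: TextNode, Data: "\n[Insert FML text here] FML\n"}
	span := &Node{Type: ElementNode, Data: "span", NextSibling: text}
	a := &Node{Type: ElementNode, Data: "a", FirstChild: span}

	got, ok := firstText(a)
	fmt.Println(got, ok)
}
```

Returning a boolean alongside the string lets the caller tell "no text node at all" apart from an empty string.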
After the update, I started getting these errors:
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
2017/06/07 00:11:34 Error occurred in Text() : runtime error: invalid memory address or nil pointer dereference
Not sure why it's happening.
It seems `k`'s `NextSibling` is being returned as `nil`, and then the code tries to access `k.Type`. I've redirected this to a custom panic in the latest commit, though I'll keep this issue open to solve the real bug of traversing between elements in the `Text()` function.
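The failure mode described above can be sketched as follows: if the loop advances to `NextSibling` and then reads a field without checking for `nil`, it dereferences a nil pointer. The `Node` type, `textFrom`, and `errNoText` below are hypothetical names for this example only; the error text mirrors the message seen in the logs in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a simplified, hypothetical stand-in for a parsed HTML node.
type Node struct {
	IsText      bool
	Data        string
	NextSibling *Node
}

// errNoText mirrors the custom error message shown in this thread's logs.
var errNoText = errors.New("No text node found")

// textFrom advances along the sibling chain until it reaches a text node.
// Checking k != nil before reading k.IsText avoids the nil dereference
// that occurs when the chain ends without a text node.
func textFrom(k *Node) (string, error) {
	for k != nil && !k.IsText {
		k = k.NextSibling
	}
	if k == nil {
		return "", errNoText
	}
	return k.Data, nil
}

func main() {
	// An element with no following sibling: the chain ends at nil.
	lone := &Node{Data: "span"}
	_, err := textFrom(lone)
	fmt.Println(err)

	// An element followed by a text node.
	withText := &Node{Data: "span", NextSibling: &Node{IsText: true, Data: "hello"}}
	s, err := textFrom(withText)
	fmt.Println(s, err)
}
```

Returning an error instead of panicking leaves it to the caller to decide whether a missing text node is fatal.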
Applied latest update:
2017/06/07 11:33:10 Error occurred in Text() : No text node found
2017/06/07 11:33:10 Error occurred in Text() : No text node found
2017/06/07 11:33:10 Error occurred in Text() : No text node found
2017/06/07 11:33:10 Error occurred in Text() : No text node found
2017/06/07 11:33:10 Error occurred in Text() : No text node found
2017/06/07 11:33:10 Error occurred in Text() : No text node found
So yeah I'm seeing the custom error.
It works for me. Example code:
package main

import (
	"fmt"

	"github.com/anaskhan96/soup"
)

func main() {
	source := soup.HTMLParse(`<p class="block">
<a href="/article/today-on-the-bus-i-saw-my-ex-girlfriend-get-on-despite-several-seats-being-open-she-specifically_190836.html">
<span class="icon-piment"></span>
[Insert FML text here] FML
</a>
</p>`)
	soup.SetDebug(true)
	block := source.Find("p", "class", "block")
	fmt.Println(block.Find("a").Text())
}
Output:
[Insert FML text here] FML
How can I reproduce the error? I can't reproduce it with the `Get()` method on a real page either.
In my case, the error reproduces when the `<span>` has no text inside, like this: `<span></span>`. My guess is that the error is not related to `&nbsp;`; you can check for `&nbsp;` simply by comparing with "\u00A0".
I have faced a similar issue when any node is empty, e.g. `<span></span>` or `<td></td>`:
package main

import (
	"github.com/anaskhan96/soup"
)

const test = `
<p class="block">
<a href="/article/today-on-the-bus-i-saw-my-ex-girlfriend-get-on-despite-several-seats-being-open-she-specifically_190836.html">
<span class="icon-piment"></span>
[Insert FML text here] FML
</a>
</p>
`

func main() {
	// Find the link inside the block paragraph and print its text.
	actual := soup.HTMLParse(test).Find("p", "class", "block").Find("a").Text()
	print(actual)
}
It returns `[Insert FML text here] FML` as well.
It's been a little over 3 years since this issue was opened and well over 2 since it went stale. Closing this, will reopen if the discussion/issue arises again.
An odd issue I'm having while using soup to parse Fmylife's site for FMLs: when I get an FML that contains the `&nbsp;` entity and I try to get its text, it returns blank text and nothing else.
I usually call it using `.Find("p", "class", "block").Find("a").Text()`, and if the FML doesn't contain the whitespace entity, it returns fine.