atifaziz / Hazz

CSS Selectors (via Fizzler) for HtmlAgilityPack (HAP)
Other
63 stars 7 forks source link

QuerySelectorAll doesn't validate arguments eagerly #2

Closed atifaziz closed 6 years ago

atifaziz commented 6 years ago

From @raiytu4 on June 25, 2018 7:41

I demonstrate this problem here: https://github.com/raiytu4/GetEnumeratorOnFizzler

for short:

        static void Main(string[] args)
        {
            var doc = Browser.GetDoc("http://truyenfull.vn/dai-dao-trieu-thien/").DocumentNode;
            var categoryTextNodes = FindLiStoryDes(doc, "thể loại").QuerySelectorAll("div.cp2 > a");
            if (categoryTextNodes != null)
            {
                // categoryTextNodes is not null
                // it's ok to call .GetEnumerator()
                // but call categoryTextNodes.GetEnumerator().MoveNext() will throws get NullReferenceException
                categoryTextNodes.GetEnumerator();

                // throws NullReferenceException too!
                categoryTextNodes.Count();

            }
            Console.ReadLine();

        }

Copied from original issue: atifaziz/Fizzler#66

atifaziz commented 6 years ago

The problem is that your function FindLiStoryDes returns null and subsequently you call QuerySelectorAll on that null result. However, since QuerySelectorAll is a lazy function and doesn't validate its arguments eagerly, the NullReferenceException doesn't occur until much later, until when you call Count(). Now Count() is an eager function and because it consumes the entire sequence in order produce its result, it forces the full evaulation of QuerySelectorAll and therefore ends up throwing NullReferenceException.

The real issue here is that QuerySelectorAll should validate its arguments otherwise it can be misleading, as what has happened in this case.

Thanks for reporting this and I will try and fix it soon-ish. Meanwhlie, the workaround for you would be:

        static void Main(string[] args)
        {
            var doc = Browser.GetDoc("http://truyenfull.vn/dai-dao-trieu-thien/").DocumentNode;
            if (FindLiStoryDes(doc, "thể loại") is HtmlNode node)
            {
                var categoryTextNodes = node.QuerySelectorAll("div.cp2 > a");
                // categoryTextNodes is not null
                // it's ok to call .GetEnumerator()
                // but call categoryTextNodes.GetEnumerator().MoveNext() will throws get NullReferenceException
                categoryTextNodes.GetEnumerator();

                // throws NullReferenceException too!
                categoryTextNodes.Count();

            }
            Console.ReadLine();

        }
atifaziz commented 6 years ago

From @raiytu4 on June 26, 2018 0:29

Sorry, my bad, I forget to check null before call QuerySelectorAll var categoryTextNodes = FindLiStoryDes(doc, "thể loại").QuerySelectorAll("div.cp2 > a"); to var categoryTextNodes = FindLiStoryDes(doc, "thể loại")?.QuerySelectorAll("div.cp2 > a");

so, does calling QuerySelectorAll on null HtmlNode return null? or something else?

atifaziz commented 6 years ago

so, does calling QuerySelectorAll on null HtmlNode return null? or something else?

That will be disallowed. I intend to change QuerySelectorAll to throw ArgumentNullException.

atifaziz commented 6 years ago

This fix is now available in release 1.2.0.

ghost commented 6 years ago

thank! keep up the good work!