Class selector returns duplicated node collection

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Load http://read.mangashare.com/20th-Century-Boys into an HtmlDocument.
2. Run: doc.DocumentNode.QuerySelectorAll("table.datalist > tr.datarow")
3. Or run: doc.DocumentNode.QuerySelectorAll(".datalist > .datarow")
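For reference, the selector in step 2 asks for `tr` elements carrying the class `datarow` whose direct parent is a `table` carrying the class `datalist`. Below is a minimal sketch of that matching rule in plain Python. It uses the standard library's ElementTree on a small, well-formed sample instead of Fizzler or HtmlAgilityPack, and the helper names and sample markup are hypothetical. Because every element has exactly one parent, a correct child-combinator match can collect each row at most once.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if `name` is one of the whitespace-separated tokens in the class attribute."""
    return name in (element.get("class") or "").split()


def select_datarows(root):
    """Hypothetical sketch of `table.datalist > tr.datarow` over an ElementTree."""
    matches = []
    for table in root.iter("table"):  # every <table>, at any depth, visited once
        if not has_class(table, "datalist"):
            continue
        for child in table:  # direct children only: the '>' combinator
            if child.tag == "tr" and has_class(child, "datarow"):
                matches.append(child)
    return matches


# Hypothetical sample markup, not taken from the real page.
sample = """
<div>
  <table class="datalist">
    <tr class="header"><td>a</td></tr>
    <tr class="datarow"><td>b</td></tr>
    <tr class="datarow odd"><td>c</td></tr>
  </table>
</div>
"""

rows = select_datarows(ET.fromstring(sample))
print(len(rows))  # 2: the header row lacks the datarow class
```

Note that `.datarow` is matched as a class token, so `class="datarow odd"` still qualifies.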

What is the expected output? What do you see instead?
I expect a node collection without any duplicates, because HtmlAgilityPack on 
its own returns exactly that (although the query is more difficult to write). 
Instead, I see many duplicated nodes in the returned collection.

What version of the product are you using? On what operating system?
Fizzler 1.0.0, Win7 x64, VS 2010

Original issue reported on code.google.com by aprilka...@gmail.com on 18 Nov 2010 at 8:03

Attachments:

Untitled.png

GoogleCodeExporter commented 9 years ago

I have not been able to reproduce this issue. A simple initial test in the 
IronPython interpreter shows that HtmlAgilityPack and Fizzler select the same 
number of nodes:

IronPython 2.6 (2.6.10920.0) on .NET 2.0.50727.4952
Type "help", "copyright", "credits" or "license" for more information.
>>> import clr
>>> clr.AddReference('HtmlAgilityPack')
>>> clr.AddReference('Fizzler')
>>> clr.AddReference('Fizzler.Systems.HtmlAgilityPack')
>>> from HtmlAgilityPack import HtmlDocument
>>> from Fizzler.Systems.HtmlAgilityPack.HtmlNodeSelection import *
>>> from System.Net import WebClient
>>> wc = WebClient()
>>> html = wc.DownloadString('http://read.mangashare.com/20th-Century-Boys')
>>> hd = HtmlDocument()
>>> hd.LoadHtml(html)
>>> doc = hd.DocumentNode
>>> rows = QuerySelectorAll(doc, '.datalist > .datarow')
>>> len(list(rows))
249
>>> doc.SelectNodes("//*[@class='datalist']/*[@class='datarow']").Count
249
>>>

If Fizzler were returning duplicates, its count would be higher (for example, 
twice) than the number returned by the XPath-based selection in HtmlAgilityPack. 
Do you see an oversight in the test, or have I misrepresented the problem?
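Comparing counts detects duplicates only indirectly. A more direct check is to look for any node object that appears more than once in the result. The sketch below is plain Python meant to be usable in the IronPython session above; the helper `find_duplicates` and the `Node` stand-in are hypothetical, and it assumes Fizzler yields the same HtmlNode instances that live in the document tree, so a duplicate would show up as a repeated reference.

```python
def find_duplicates(nodes):
    """Return every node that appears more than once, compared by object identity."""
    nodes = list(nodes)  # materialize so all objects stay alive and their ids stay unique
    seen = set()
    duplicates = []
    for node in nodes:
        key = id(node)
        if key in seen:
            duplicates.append(node)
        else:
            seen.add(key)
    return duplicates


class Node(object):
    """Hypothetical stand-in for an HtmlNode in this sketch."""


a, b = Node(), Node()
print(len(find_duplicates([a, b, a])))  # 1
print(len(find_duplicates([a, b])))     # 0
```

In the session above this would be called as `len(find_duplicates(QuerySelectorAll(doc, '.datalist > .datarow')))`, where 0 would mean no node was returned twice.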

Attached is the HTML source of http://read.mangashare.com/20th-Century-Boys as 
it was when the test was run.

Original comment by azizatif on 18 Nov 2010 at 11:56

Attachments:

20th-Century-Boys.html

GoogleCodeExporter commented 9 years ago

Attached is the complete VS 2010 solution I've been working on. Maybe there's 
something wrong with my code, but I would greatly appreciate it if you could 
take a look.

Regards,

pilus

Original comment by aprilka...@gmail.com on 19 Nov 2010 at 5:39

Attachments:

Zeus.rar

GoogleCodeExporter commented 9 years ago

The problem is in how your ParseHTML2 method is written. Here is a proposed fix:

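// Requires: using System.Linq;                      (for ToArray)
//           using Fizzler.Systems.HtmlAgilityPack;  (QuerySelectorAll/QuerySelector)
//           using Agi = HtmlAgilityPack;            (alias assumed from Agi.HtmlDocument)
void ParseHTML2() {
  var doc = new Agi.HtmlDocument();
  doc.LoadHtml(textBox2.Text);
  // Select each data row once, then collect its cells into an array.
  var rows = from row in doc.DocumentNode
                            .QuerySelectorAll("table.datalist > tr.datarow")
             let cells = row.QuerySelectorAll("td").ToArray()
             // Assumes every row has at least four cells and an <a> in the fourth.
             select new object[] {
               /* Date      */ cells[0].InnerText,
               /* Chapter   */ cells[1].InnerText,
               /* Scanlator */ cells[2].InnerText,
               /* Link      */ cells[3].QuerySelector("a")
                                        .GetAttributeValue("href", null),
             };
  // Add exactly one grid row per selected table row.
  foreach(var row in rows)
    dataGridView1.Rows.Add(row);
}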

Closing this issue as invalid because it reports a bug in user code, not in 
Fizzler.

Original comment by azizatif on 19 Nov 2010 at 7:33

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

Oh, OK then. I'll try it later. Thanks for all your replies! :D

Original comment by aprilka...@gmail.com on 19 Nov 2010 at 7:37

amoikevin / fizzler

Class selector return duplicated node collection #41