densitydesign / strumentalia-seealsology

see also section scraping on custom levels of depth
83 stars, 24 forks

Nodes with wrong level attribute when "parent links" option is selected #10

Closed: Mitch90 closed this issue 2 years ago

Mitch90 commented 7 years ago

Whenever a crawl is started with the "parent links" option selected, the level attribute assigned to each node seems not to be always correct. For example, starting from the Wikipedia page https://en.wikipedia.org/wiki/Privacy, the pages linked directly from its "See also" section are correctly labeled level 1:

[Screenshot 2017-01-30 at 18.47.19]

But so are these links, which should not be:

[Screenshot 2017-01-30 at 18.35.57]

If I'm not mistaken, you can also see this in the Gephi file exported from the crawl: only dark blue circles (level 1 links) and orange circles (parent links) should be connected to the seed (Privacy in this case), but others are too:

[Screenshot 2017-01-30 at 18.46.28]

The problem doesn't seem to occur when the "parent links" option is disabled. I hope I explained myself clearly enough.

boogheta commented 5 years ago

Hi, and sorry for answering so late. I'm not sure I properly understand the issue, but I guess what you identified corresponds to nodes that are seen multiple times: first as a target (level 1) of the seed (level 0), and then either as a parent (level 0) of another target of the seed (level 1), or as a parent (level 1) of the target (level 2) of a target (level 1) of the seed. Not sure I'm clear enough either ;)
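To make the explanation above concrete, here is a minimal toy sketch that records every level at which a page is encountered. It is not seealsology's actual code: the `see_also` and `parents` dictionaries, the function name `level_sightings`, and the pages "A" to "D" are hypothetical, and the rule that a parent gets one level less than the page it links to is taken from the comment above.

```python
from collections import defaultdict, deque


def level_sightings(seed, see_also, parents, max_depth):
    """Record every level at which each page is encountered during a crawl.

    Toy model (an assumption, not the tool's implementation):
      - see_also[page]: pages in page's "See also" section, one level deeper;
      - parents[page]: pages whose "See also" links to `page`, one level
        shallower (parent of a level-1 node = level 0, as described above).
    """
    sightings = defaultdict(list)
    sightings[seed].append(0)
    expanded = set()
    queue = deque([(seed, 0)])
    while queue:
        page, level = queue.popleft()
        # Expand each page once, and stop at the requested depth.
        if page in expanded or level >= max_depth:
            continue
        expanded.add(page)
        for child in see_also.get(page, []):
            sightings[child].append(level + 1)
            queue.append((child, level + 1))
        for parent in parents.get(page, []):
            # Parents are recorded but not expanded further in this sketch.
            sightings[parent].append(level - 1)
    return dict(sightings)


see_also = {"Privacy": ["A", "B"], "A": ["C"]}
parents = {"B": ["A"], "C": ["D"]}
print(level_sightings("Privacy", see_also, parents, max_depth=3))
```

In this toy graph, "A" is seen at level 1 (a target of the seed) and at level 0 (a parent of "B"), so a single `level` attribute can only keep one of those values. "D" is labeled level 1 only because it is a parent of the level-2 page "C", even though the seed never links to it, which may be the kind of node the screenshots show.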