LeMoussel closed 4 years ago
Hi! No, it's not possible for now. I tag it for a future release.
Hi Sylvain,
I'm French. It would be simpler to discuss this in French...
Any idea when the next release that includes this will be out?
In the meantime, is a workaround possible? For example, a relationship between each node that has the type attribute
set to the value of each property of the relationship?
Since this site is mostly read by English speakers, I will keep making the effort to write in English so that everyone can follow, even though my English is far from perfect.
The next release is 2.0, available soon, but without this enhancement. Unfortunately, I can't say right now when this feature will be added. Is it critical for you?
Same here, my English is far from perfect too.
Yes. As part of an R&D project, this is essential for me. I think that, given the power of Neo4j, this is an important feature. If need be, I can help with testing.
It is possible to get this behaviour with two processes (or with a single one, at the cost of a less readable configuration).
In the second configuration, you could configure the crawler with the following parts:
<importer>
  <preParseHandlers>
    <splitter class="com.norconex.importer.handler.splitter.impl.DOMSplitter"
        selector="a"
        parser="html"/>
    <tagger class="com.norconex.importer.handler.tagger.impl.DOMTagger">
      <restrictTo caseSensitive="false" field="document.reference">
        .*#.*
      </restrictTo>
      <dom selector="a" toField="link_class" extract="attr(class)"/>
      <dom selector="a" toField="link_url" extract="attr(href)"/>
      <dom selector="a" toField="link_target" extract="attr(target)"/>
      <dom selector="a" toField="link_text" extract="ownText"/>
    </tagger>
    <tagger class="com.norconex.importer.handler.tagger.impl.ConstantTagger"
        onConflict="replace" >
      <restrictTo caseSensitive="false" field="document.reference">
        .*#.*
      </restrictTo>
      <constant name="TYPE">LINK</constant>
    </tagger>
  </preParseHandlers>
  <postParseHandlers>
    <filter class="com.norconex.importer.handler.filter.impl.RegexReferenceFilter" onMatch="include">
      <regex>
        .*#.*
      </regex>
    </filter>
  </postParseHandlers>
</importer>
And for the relationships configuration:
<relationships>
  <relationship type="TO_PAGE" direction="OUTGOING" targetFindSyntax="MERGE">
    <sourcePropertyKey label="LINK">link_url</sourcePropertyKey>
    <targetPropertyKey label="Page">identity</targetPropertyKey>
  </relationship>
  <relationship type="FROM_PAGE" direction="OUTGOING" targetFindSyntax="MERGE">
    <sourcePropertyKey label="LINK">link_url</sourcePropertyKey>
    <targetPropertyKey label="Page">collector.referrer-reference</targetPropertyKey>
  </relationship>
</relationships>
The main idea is to split each document on every matched HTML tag (here, each a element), so that every link becomes its own LINK node.
The result looks like:
(:Page)<-[:FROM_PAGE]-(:LINK)-[:TO_PAGE]->(:Page)
Then, if you want to clean up your graph and remove the LINK nodes, you have to execute the following Cypher query:
MATCH (p1:Page)<-[rFrom:FROM_PAGE]-(link:LINK)-[rTo:TO_PAGE]->(p2:Page)
MERGE (p1)-[r:LINKED_TO]->(p2)
SET r+= link
DETACH DELETE link
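To check the result after running the cleanup query above, something like the following could be used. This is only a sketch: it assumes that Page nodes carry an identity property (as the relationships configuration suggests) and that the link_class and link_text fields from the DOMTagger ended up as properties on the LINK nodes, and were therefore copied onto LINKED_TO by SET r += link.

```cypher
// Sketch only: property names are assumptions taken from the configuration above.
// Lists a few page-to-page links with the properties copied from the LINK nodes.
MATCH (p1:Page)-[r:LINKED_TO]->(p2:Page)
RETURN p1.identity  AS source_page,
       r.link_class AS link_class,
       r.link_text  AS link_text,
       p2.identity  AS target_page
LIMIT 10
```

If link_class comes back as null for every row, the class attribute was probably not extracted on the LINK nodes in the first place, which would point at the importer configuration rather than at the cleanup query.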
Does that help you?
Thanks for your help. Very well explained. I will test.
In HTML, for an anchor tag (relationship), I want to store the class attribute (<a class="content-link" href="http://example.com">). The relationships tag defines relationships between nodes. How can I set the Neo4j relationship property class for relationships?