Open brennane opened 1 year ago
The extraction is incomplete. These errors have no visible impact: the URL is just ignored.
It may be all the URLs scraped from the document that are ignored. Because these errors are hidden from the web application, an analyst will be confused about why the URLs from some source document don't get indexed. The error log showed something like 20 URLs being attempted in a large node-add operation.
storm> [ inet:url=https://www.example.com/hello ]
...........................
inet:url=https://www.example.com/hello
:base = https://www.example.com/hello
:fqdn = www.example.com
:params =
:path = /hello
:port = 443
:proto = https
.created = 2022/12/16 21:34:39.438
complete. 1 nodes in 59 ms (16/sec).
storm> [ inet:url=https://www.example.com/hello2, inet:url=https://www.example.com/hello3,world ]
...https://www.example.com/hello2, inet:url=https://www.exampl...
^
Syntax Error: Unexpected token ',' at line 1, column 43, expecting one of: (, ), *, +, +(, -, -(, ., :$, <(, ], absolute property name, relative property name, universal property
complete. 0 nodes in 8 ms (0/sec).
storm> inet:url
inet:url=https://www.example.com/hello
:base = https://www.example.com/hello
:fqdn = www.example.com
:params =
:path = /hello
:port = 443
:proto = https
.created = 2022/12/16 21:34:39.438
complete. 1 nodes in 1 ms (1000/sec).
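The session above suggests that an unquoted "," ends the URL value and breaks the parse of the whole node-add query. A possible workaround, sketched here in Python, is to quote every scraped URL before the query is built. This assumes Storm accepts double-quoted string literals with backslash escapes and whitespace-separated nodes inside the edit brackets. The helper names `storm_quote` and `build_url_add_query` are hypothetical and are not part of Synapse.

```python
def storm_quote(value: str) -> str:
    """Wrap a value in double quotes for use in a Storm query.

    Assumption: Storm double-quoted strings use backslash escapes,
    so backslashes and double quotes in the value are escaped.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_url_add_query(urls):
    """Build one node-add query for a batch of URLs (hypothetical helper).

    Each URL is quoted, so a "," inside a URL stays part of the value,
    and nodes are separated by spaces rather than commas.
    """
    parts = [f"inet:url={storm_quote(u)}" for u in urls]
    return "[ " + " ".join(parts) + " ]"


query = build_url_add_query([
    "https://www.example.com/hello2",
    "https://www.example.com/hello3,world",
])
print(query)
```

Building the batch this way should keep one awkward URL from discarding the rest of the batch, though it has not been checked against a running cortex.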
This is for the synapse-cortex component. Some URLs scraped from HTML are not valid in the Storm language, here where a "," is in the URI:
needs to be