Closed (mazznoer closed this 2 years ago)
x:replace-nodes() was introduced with xidel-0.9.9.20201125.7684.
With version 0.9.9 it works now, but it fails on some other HTML input.
$ xidel --version
Xidel 0.9.9
(20210818.8090.c8e45f7fe96e)
http://www.videlibri.de/xidel.html
by Benito van der Zander <benito@benibela.de>
$ xidel file.txt -s --html -e 'x:replace-nodes(//script,())' --color never
<!DOCTYPE html>
Error:
err:XQDY0025: Duplicate attribute: title
in TXQTermConstructorComputed
Possible backtrace:
$00000000005421E2: perhaps TXQTermConstructor + 11986 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
$000000000053A0A1: perhaps TXQTermBinaryOp + 3153 ? but unlikely
$000000000053F473: TXQTermConstructor + 355
Call xidel with --trace-stack to get an actual backtrace
Trying to delete nodes from HTML input that does not contain them returns just the doctype. A bug?
<!doctype html>
<html lang="en-US">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>Test</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body {
background: #fff;
color: #333;
}
</style>
</head>
<body>
<div id='main'></div>
</body>
</html>
$ xidel test.html -s --html -e 'x:replace-nodes(//script,())' --color never
<!DOCTYPE html>
$ xidel file.txt -s --html -e 'x:replace-nodes(//script,())' --color never
What's the content of 'file.txt'?
Trying to delete nodes from HTML input that does not contain them returns just the doctype. A bug?
Why would you want to try to remove a non-existing node? Anyway...
https://www.benibela.de/documentation/internettools/xpath-functions.html#x-replace-nodes:
Currently it is implemented trivially by calling x:transform on the document and filtering for $nodes.
I'm not sure how transform() is called (in the background) when doing x:replace-nodes(//script,()), because transform(/,function($x){if (name($x)="script") then () else $x}) has a different outcome. Whether it's a bug or not, Benito would have to answer.
What's the content of 'file.txt'?
Here is minimal code for testing.
<!doctype html>
<html lang="en-US">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>Test</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<img src="cat.jpg" title="cat" title="cat">
<script></script>
</body>
</html>
err:XQDY0025: Duplicate attribute: title
The cause of this error is actually stated in the error message itself.
Why would you want to try to remove a non-existing node?
Just for testing.
<img src="cat.jpg" title="cat" title="cat">
I don't know where this minimal code comes from, but a duplicate title attribute (as the error already mentions) is invalid. Remove one.
That is the unfortunate combination of a lousy HTML parser with a very spec-conformant XQuery processor.
The HTML parser should remove one of the title attributes, but it does not, so the document is invalid.
replace-nodes, written in XQuery, cannot output an invalid document. You could do x:replace-nodes(//@title, ()) to remove them.
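For anyone who wants to find such duplicates before handing a file to xidel, here is a rough sketch using Python's standard html.parser module. It is not part of xidel, and the names DuplicateAttributeFinder and find_duplicate_attributes are made up for this example. It relies on html.parser passing every attribute of a start tag to handle_starttag as a list of (name, value) pairs, repeats included.

```python
from collections import Counter
from html.parser import HTMLParser


class DuplicateAttributeFinder(HTMLParser):
    """Collect (line, tag, attribute) for each attribute repeated in a start tag.

    Hypothetical helper for illustration only; unrelated to xidel's own parser.
    """

    def __init__(self):
        super().__init__()
        self.duplicates = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order; repeats are kept.
        counts = Counter(name for name, _ in attrs)
        line, _ = self.getpos()
        for name, count in counts.items():
            if count > 1:
                self.duplicates.append((line, tag, name))


def find_duplicate_attributes(html_text):
    """Return a list of (line, tag, attribute) tuples for repeated attributes."""
    finder = DuplicateAttributeFinder()
    finder.feed(html_text)
    finder.close()
    return finder.duplicates


if __name__ == "__main__":
    sample = (
        "<body>\n"
        '<img src="cat.jpg" title="cat" title="cat">\n'
        "<script></script>\n"
        "</body>"
    )
    for line, tag, name in find_duplicate_attributes(sample):
        print(f"line {line}: <{tag}> repeats attribute {name!r}")
```

Running it on the minimal test file above should point at the img element, which is the one that triggers err:XQDY0025.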
Trying to delete nodes from HTML input that does not contain them returns just the doctype. A bug?
There is a three-argument version of the function for this case, x:replace-nodes(/, //script, ()), which has been added recently.
I'm not sure how transform() is called (in the background) when doing x:replace-nodes(//script,()), because transform(/,function($x){if (name($x)="script") then () else $x}) has a different outcome. Whether it's a bug or not, Benito would have to answer.
The two-argument version x:replace-nodes(//script, ()) calls the three-argument version, similarly to x:replace-nodes(root((//script)[1]), //script, ()).
That way, when the nodes exist, it returns the correct document even when multiple documents are loaded.
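To make the consequence of that delegation concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not xidel's actual XQuery or Pascal code, and every name in it (Node, descendants, root_of, remove_nodes, remove_nodes_two_arg, to_text) is hypothetical. It handles deletion only and does not model the doctype. The point it illustrates: if the root is derived from the first matched node, an empty match leaves no root to copy, so no document comes back, whereas passing the root explicitly returns the document unchanged.

```python
class Node:
    """Minimal element node with a parent pointer (hypothetical, for illustration)."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def descendants(node, name):
    """Rough stand-in for the path //name evaluated from node."""
    found = []
    for child in node.children:
        if child.name == name:
            found.append(child)
        found.extend(descendants(child, name))
    return found


def root_of(nodes):
    """Stand-in for root(($nodes)[1]): None when the sequence is empty."""
    if not nodes:
        return None
    node = nodes[0]
    while node.parent is not None:
        node = node.parent
    return node


def remove_nodes(root, targets):
    """Three-argument form, deletion only: copy root's tree without the targets."""
    if root is None:
        return None
    kept = [
        remove_nodes(child, targets)
        for child in root.children
        if not any(child is target for target in targets)
    ]
    return Node(root.name, kept)


def remove_nodes_two_arg(targets):
    """Two-argument form: derive the root from the first target node."""
    return remove_nodes(root_of(targets), targets)


def to_text(node):
    """Serialize a tree as nested tags; an absent tree becomes the empty string."""
    if node is None:
        return ""
    inner = "".join(to_text(child) for child in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


if __name__ == "__main__":
    with_script = Node("html", [Node("body", [Node("img"), Node("script")])])
    without_script = Node("html", [Node("body", [Node("div")])])
    for doc in (with_script, without_script):
        scripts = descendants(doc, "script")
        print("two-arg:  ", repr(to_text(remove_nodes_two_arg(scripts))))
        print("three-arg:", repr(to_text(remove_nodes(doc, scripts))))
```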
I have fixed it by porting that function from XQuery to Pascal.
It takes over 200 lines of Pascal to do the same as 13 lines of XQuery, but it is also much faster now.
I want to remove all scripts from HTML but did not succeed.
Thanks for this useful tool.