Closed: danielweck closed this issue 3 years ago.
Note: search would work if the Url was using query params (e.g. https://api.deslibris.ca/api/feed/search?q={searchTerms}).
But since OpenSearch allows URL templates that put {searchTerms} in the path rather than in a query parameter, Thorium's code must be adapted to this case.
Technical notes:
OpenSearch URL Template mechanism: https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md
...which is not an RFC 6570 URI Template (different parsing grammar and semantics):
https://tools.ietf.org/html/rfc6570
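A minimal sketch of OpenSearch 1.1 template substitution, assuming the grammar from the draft linked above: `{name}` is a required parameter, `{name?}` is optional, and a namespace prefix (`{prefix:name}`) is simply ignored here. `expandOpenSearchTemplate` is a hypothetical helper, not Thorium's code. Because the placeholder is replaced wherever it appears, the path form and the query-parameter form are handled the same way.

```javascript
// Hypothetical helper (not Thorium's code): substitute OpenSearch 1.1 template parameters.
// Assumption: namespace prefixes such as "os:" are dropped and parameters are matched by local name.
function expandOpenSearchTemplate(template, params) {
  // Matches {name}, {name?}, {prefix:name} and {prefix:name?}
  const parameterPattern = /\{(?:[^{}:?]+:)?([^{}:?]+)(\?)?\}/g;
  return template.replace(parameterPattern, (match, name, optional) => {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // Percent-encode the value so it is safe in a path segment or in a query value
      return encodeURIComponent(String(params[name]));
    }
    if (optional) {
      // Unknown optional parameters are replaced with an empty string
      return "";
    }
    throw new Error(`Missing required OpenSearch parameter: ${name}`);
  });
}

// desLibris (path form) and Feedbooks (query form) go through the same substitution
console.log(expandOpenSearchTemplate("https://api.deslibris.ca/api/feed/search/{searchTerms}", { searchTerms: "robot" }));
console.log(expandOpenSearchTemplate("https://catalog.feedbooks.com/search.atom?query={searchTerms}", { searchTerms: "robot" }));
```

With this encoding choice, a search term containing `/` becomes `%2F`, so in the desLibris path form it stays inside a single path segment.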
Feedbooks example:
https://catalog.feedbooks.com/opensearch.xml
=>
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Feedbooks</ShortName>
<Description>Search on Feedbooks</Description>
<InputEncoding>UTF-8</InputEncoding>
<OutputEncoding>UTF-8</OutputEncoding>
<Image type="image/x-icon" width="16" height="16">http://www.feedbooks.com/favicon.ico</Image>
<Url type="text/html" template="https://catalog.feedbooks.com/search.html?query={searchTerms}"/>
<Url type="application/atom+xml" template="https://catalog.feedbooks.com/search.atom?query={searchTerms}"/>
<Url type="application/atom+xml;profile=opds-catalog;kind=acquisition" template="https://catalog.feedbooks.com/search.atom?query={searchTerms}"/>
<Query role="example" searchTerms="robot" />
</OpenSearchDescription>
=>
https://catalog.feedbooks.com/search.atom?query={searchTerms}
versus:
https://catalog.feedbooks.com/catalog/index.json
=>
{
"metadata":{"title":"Feedbooks"},
"links":[
{"type":"application/opds+json","rel":"self","href":"https://catalog.feedbooks.com/catalog/index.json"},
{"type":"application/opds+json","rel":"search","href":"https://catalog.feedbooks.com/search.json{?query}","templated":true}
...
=>
https://catalog.feedbooks.com/search.json{?query}
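To make the difference concrete, here is a minimal sketch of RFC 6570 form-style query expansion, covering only the `{?var}` operator used by the OPDS 2 template above. `expandFormQuery` and `encodeUnreserved` are hypothetical helpers; a real client would more likely rely on a complete RFC 6570 implementation.

```javascript
// Hypothetical helpers: RFC 6570 form-style query expansion ("{?var}" / "{?a,b}") only.
// Assumptions: no other operators, no prefix (":n") or explode ("*") modifiers, scalar values.

// The "?" operator only leaves unreserved characters as-is, so also encode !'()*
function encodeUnreserved(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

function expandFormQuery(template, vars) {
  return template.replace(/\{\?([^}]+)\}/g, (match, names) => {
    const pairs = names
      .split(",")
      // Undefined variables are skipped entirely
      .filter((name) => vars[name] !== undefined && vars[name] !== null)
      .map((name) => `${name}=${encodeUnreserved(String(vars[name]))}`);
    // If every variable is undefined, the expression expands to nothing (no "?")
    return pairs.length > 0 ? "?" + pairs.join("&") : "";
  });
}

console.log(expandFormQuery("https://catalog.feedbooks.com/search.json{?query}", { query: "robot" }));
```

Note that a plain search+replace of `{searchTerms}` leaves `https://catalog.feedbooks.com/search.json{?query}` unchanged, because both the variable name and the expression syntax differ.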
Current code is naïve (but somewhat reasonable) search+replace:
Related issue: https://github.com/edrlab/thorium-reader/issues/1382
Note: search would work if the Url was using query params
Are we sure about that? Or is this just conjecture?
I think it is a timeout issue (a Saga race condition): the additional HTTP request for the OpenSearch XML succeeds, but too late.
Actually, not a timeout issue. This fails:
with:
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>desLibris</ShortName>
<Description>Search on desLibris</Description>
<InputEncoding>UTF-8</InputEncoding>
<OutputEncoding>UTF-8</OutputEncoding>
<Image type="image/x-icon" width="16" height="16">https://deslibris.ca/favicon.ico</Image>
<Url type="application/atom+xml" template="https://api.deslibris.ca/api/feed/search/{searchTerms}"/>
<Url type="application/atom+xml;profile=opds-catalog;kind=acquisition" template="https://api.deslibris.ca/api/feed/search/{searchTerms}"/>
<Query role="example" searchTerms="robot" />
</OpenSearchDescription>
...but succeeds with:
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Feedbooks</ShortName>
<Description>Search on Feedbooks</Description>
<InputEncoding>UTF-8</InputEncoding>
<OutputEncoding>UTF-8</OutputEncoding>
<Image type="image/x-icon" width="16" height="16">http://www.feedbooks.com/favicon.ico</Image>
<Url type="text/html" template="https://catalog.feedbooks.com/search.html?query={searchTerms}"/>
<Url type="application/atom+xml" template="https://catalog.feedbooks.com/search.atom?query={searchTerms}"/>
<Url type="application/atom+xml;profile=opds-catalog;kind=acquisition" template="https://catalog.feedbooks.com/search.atom?query={searchTerms}"/>
<Query role="example" searchTerms="robot" />
</OpenSearchDescription>
I ran an Electron Fiddle (https://www.electronjs.org/fiddle) with this renderer-process code:
const xmlSrc1 = `<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>desLibris</ShortName>
<Description>Search on desLibris</Description>
<InputEncoding>UTF-8</InputEncoding>
<OutputEncoding>UTF-8</OutputEncoding>
<Image type="image/x-icon" width="16" height="16">https://deslibris.ca/favicon.ico</Image>
<Url type="application/atom+xml" template="https://api.deslibris.ca/api/feed/search/{searchTerms}"/>
<Url type="application/atom+xml;profile=opds-catalog;kind=acquisition" template="https://api.deslibris.ca/api/feed/search/{searchTerms}"/>
<Query role="example" searchTerms="robot" />
</OpenSearchDescription>`;
const xmlSrc2 = `<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Feedbooks</ShortName>
<Description>Search on Feedbooks</Description>
<InputEncoding>UTF-8</InputEncoding>
<OutputEncoding>UTF-8</OutputEncoding>
<Image type="image/x-icon" width="16" height="16">http://www.feedbooks.com/favicon.ico</Image>
<Url type="text/html" template="https://catalog.feedbooks.com/search.html?query={searchTerms}"/>
<Url type="application/atom+xml" template="https://catalog.feedbooks.com/search.atom?query={searchTerms}"/>
<Url type="application/atom+xml;profile=opds-catalog;kind=acquisition" template="https://catalog.feedbooks.com/search.atom?query={searchTerms}"/>
<Query role="example" searchTerms="robot" />
</OpenSearchDescription>`;
const xmlDom1 = (new DOMParser()).parseFromString(xmlSrc1, "application/xml");
console.log(xmlDom1);
const urls1 = xmlDom1.documentElement.querySelectorAll("Url");
// JSON.stringify() on a NodeList only prints empty objects, so log each template attribute instead
console.log(Array.from(urls1, (url) => url.getAttribute("template")));
const xmlDom2 = (new DOMParser()).parseFromString(xmlSrc2, "application/xml");
console.log(xmlDom2);
const urls2 = xmlDom2.documentElement.querySelectorAll("Url");
console.log(Array.from(urls2, (url) => url.getAttribute("template")));
...and everything works fine.
?!
Ah, got it! (classic silent XML parsing error with DOMParser)
error on line 1 at column 6: XML declaration allowed only at the start of the document
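Because DOMParser does not throw on malformed input, a check along these lines could surface the error instead of silently yielding a document with no `Url` elements. This is a minimal sketch: `parseXmlStrict` is a hypothetical helper, and the `parser` argument is injectable only so the check can be exercised outside a browser. It relies on browsers reporting XML syntax errors as a `<parsererror>` element in the returned document.

```javascript
// Hypothetical helper: parse XML with DOMParser, but fail loudly on syntax errors.
// DOMParser does not throw: on malformed input it returns a document containing
// a <parsererror> element (its exact position differs between browsers).
function parseXmlStrict(src, parser = new DOMParser()) {
  const doc = parser.parseFromString(src, "application/xml");
  const errors = doc.getElementsByTagName("parsererror");
  if (errors.length > 0) {
    throw new Error(`XML parsing failed: ${errors[0].textContent}`);
  }
  return doc;
}
```

In a renderer process, `parseXmlStrict(xmlSrc1)` would then throw with the parser's message rather than returning a document whose `querySelectorAll("Url")` is empty.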
UTF-8 BOM issue, or bad encoding, I think.
YES :(
Buffer.from(searchRaw).toString("hex")
=>
3c3f786d6c2076657273696f6e3d22312e302220656e636f64696e673d225554462d38223f3e0a3c4f70656e5365617263684465736372697074696f6e20786d6
...but desLibris has an efbbbf prefix (the UTF-8 byte order mark), whereas Feedbooks does not.
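EF BB BF is the UTF-8 encoding of U+FEFF, the byte order mark. A minimal sketch of detecting and removing it, assuming the body is available either as raw bytes or as an already-decoded string; the helper names are hypothetical and this is not necessarily what the eventual fix does.

```javascript
// Hypothetical helpers for dealing with a leading UTF-8 BOM (EF BB BF).
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// True if the byte array starts with the UTF-8 BOM
function hasUtf8Bom(bytes) {
  return UTF8_BOM.every((b, i) => bytes[i] === b);
}

// For a body already decoded to a string, the BOM shows up as the character U+FEFF
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// For raw bytes, TextDecoder drops a leading BOM by default (ignoreBOM: false)
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}
```

In Node.js, `Buffer#toString("utf8")` keeps the BOM as U+FEFF, so a string produced that way still needs `stripBom` before it is handed to DOMParser.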
I'm fixing this now.
The main feed also has a BOM, but we use xmldom to parse it in the main process, not DOMParser (Chromium), so that's fine.
curl -s https://api.deslibris.ca/api/feed | hexdump | head
=>
0000000 ef bb bf 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e
curl -s https://api.deslibris.ca/opensearch-feed.xml | hexdump | head
=>
0000000 ef bb bf 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e
Compare with Feedbooks:
curl -s https://catalog.feedbooks.com/catalog/index.atom | hexdump | head
=>
0000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31
curl -s https://catalog.feedbooks.com/opensearch.xml | hexdump | head
=>
0000000 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31
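The same byte-level check can be done from Node.js instead of curl and hexdump. A minimal sketch, assuming Node.js 18 or later for the global `fetch`; `formatHex` and `firstBytesHex` are hypothetical helper names.

```javascript
// Hypothetical helpers: show the first bytes of a response body as hex,
// similar to `curl -s URL | hexdump | head` (assumes Node.js 18+ for global fetch).
function formatHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

async function firstBytesHex(url, count = 16) {
  const response = await fetch(url);
  // Read raw bytes: response.text() would decode the body and drop a leading BOM
  const bytes = new Uint8Array(await response.arrayBuffer());
  return formatHex(bytes.subarray(0, count));
}
```

For example, `firstBytesHex("https://api.deslibris.ca/opensearch-feed.xml").then(console.log)`; a body that starts with a BOM prints a line beginning with `ef bb bf`.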
Will be fixed by https://github.com/edrlab/thorium-reader/pull/1385
https://api.deslibris.ca/api/feed
=>
...
https://api.deslibris.ca/opensearch-feed.xml
=>
...
https://api.deslibris.ca/api/feed/search/{searchTerms}
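The chain above ends at the Atom/OPDS `Url` template. A minimal sketch of choosing that template among the `Url` entries of an OpenSearch description, assuming the entries have already been extracted as `{ type, template }` objects. The preference order (OPDS acquisition feed, then any OPDS catalog, then plain Atom) and the name `pickSearchTemplate` are assumptions, not Thorium's documented behavior.

```javascript
// Hypothetical helper: choose a search URL template from OpenSearch <Url> entries.
// Assumed preference: OPDS acquisition feed, then any OPDS catalog, then plain Atom.
function pickSearchTemplate(urls) {
  // Compare MIME types case-insensitively and ignore whitespace around ";" parameters
  const normalize = (type) => type.toLowerCase().replace(/\s*;\s*/g, ";");
  const isOpds = (type) => type.startsWith("application/atom+xml;") && type.includes("profile=opds-catalog");
  const preferences = [
    (type) => isOpds(type) && type.includes("kind=acquisition"),
    (type) => isOpds(type),
    (type) => type === "application/atom+xml",
  ];
  for (const matches of preferences) {
    const found = urls.find((url) => matches(normalize(url.type)));
    if (found) {
      return found.template;
    }
  }
  // No Atom or OPDS template (e.g. only text/html)
  return undefined;
}
```

The selected template can then be expanded with the OpenSearch rules rather than with RFC 6570.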