Open nmueller18 opened 3 years ago
I would like to see the Zenon-Database added to the possible sources. Taking especially the pubmed- and CrossRef-importers as template, this should not be too difficult. Each Zenon-entry is identified by a unique identifier, and this entry is accessible via a BibTeX-entry.
I agree, this should not be too difficult to achieve.
[...]
Because it is possible that more than one Zenon id is included in a reference (if this is part of another referenced item), the querySelector needs to cater for this possibility. Would something like that work:
const zenonid = el.querySelector('input[<a href="/Record/" class="]').value
?
The querySelector needs to receive a valid CSS selector. Briefly looking at the source code here, it looks like there are three links in any record:
<a href="/Record/000644412">...</a>
<a href=" /Author/Home?author=Rassmann%2C+Knut.">...</a>
<a href="/Record/001219271" class="title getFull" data-view="full">...</a>
It is the last link we want, right? In that case it is simple, because it can be distinguished by its class attribute like this:
const zenonLink = el.querySelector('a.getFull')
or, even better, to get the id from the link:
const zenonid = parseInt(el.querySelector('a.getFull').getAttribute('href').split('/').pop())
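One caveat with parseInt: the ids in the links above are zero-padded, and converting them to a number drops the leading zeros. Assuming those zeros are part of the identifier that Zenon expects in its record URLs (an assumption based only on the links quoted here), a minimal sketch that keeps the id as a string could look like this. The helper name zenonIdFromHref is hypothetical and not part of the existing importers.

```javascript
// Sketch: pull the record id out of a Zenon link href, keeping it as a string.
// zenonIdFromHref is a hypothetical helper name used only for illustration.
function zenonIdFromHref(href) {
    // Match "/Record/" followed by digits; tolerate a missing href.
    const match = /\/Record\/(\d+)/.exec(href || '')
    return match ? match[1] : null
}

console.log(zenonIdFromHref('/Record/001219271')) // prints 001219271
console.log(parseInt('001219271', 10))            // prints 1219271, leading zeros lost
console.log(zenonIdFromHref('/Author/Home?author=Rassmann%2C+Knut.')) // prints null
```

In the browser the argument would be el.querySelector('a.getFull').getAttribute('href'), as in the line above.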
Then the rest of the record needs to be parsed to get the three components Author, Title and Published. This should be possible as there are lots of <br/>s and <a>s. But I do not know how to modify the code snippet
const descriptionParts = el.innerHTML.split('<br>\n')[1].split(/ <b>\(|\)<\/b>\. /g)
Why, for example, is the string split twice?
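To see what the two splits in that snippet do, it can help to run them on a small string. The markup below is invented purely for illustration and is not taken from any of the existing importers or from Zenon: the first split cuts at the line break and keeps what follows it, and the second split cuts at the bold parentheses around the year.

```javascript
// Hypothetical markup, invented only to show what each split does.
const innerHTML = 'Some heading<br>\nSmith J <b>(2001)</b>. An example title'

// First split: cut at the line break and keep the part after it.
const description = innerHTML.split('<br>\n')[1]

// Second split: cut at " <b>(" and at ")</b>. ", the markup around the year.
const descriptionParts = description.split(/ <b>\(|\)<\/b>\. /g)

console.log(descriptionParts) // [ 'Smith J', '2001', 'An example title' ]
```

A pattern like this only works for one particular page layout, so it would need to be replaced rather than adapted for Zenon.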
This has simply to do with the structure of the HTML used by one of the other citation database sites. The text wrangling will be very specific to every site (and will need to be updated once the website changes). In this case, I am guessing we need to fetch the author from the links leading to author pages. Those links have no special class, so instead we just need to filter through all included links in the entry, for example like this:
const authors = Array.from(el.querySelectorAll('a')).filter(a => a.getAttribute('href').includes('?author=')).map(a => a.innerText)
which will return:
["Rassmann, Knut."]
If we also want the period gone at the end, we could modify it like this:
Array.from(el.querySelectorAll('a')).filter(a => a.getAttribute('href').includes('?author=')).map(a => a.innerText.replace(/\.$/g,''))
which returns:
["Rassmann, Knut"]
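The one-liner above throws if an entry contains an <a> without an href, because getAttribute then returns null. A slightly more defensive sketch, wrapped in a function so it can be tried outside the browser, might look like the following. The names extractAuthors and fakeLink are hypothetical, and textContent is swapped in for innerText here only because it is easier to imitate with plain objects.

```javascript
// Sketch: collect author names from the links of one Zenon result entry.
// `links` is any iterable of <a>-like objects; in the browser this would be
// el.querySelectorAll('a'). The function name is hypothetical.
function extractAuthors(links) {
    return Array.from(links)
        // getAttribute returns null when there is no href, so fall back to ''.
        .filter(a => (a.getAttribute('href') || '').includes('?author='))
        // Trim whitespace and strip a single trailing period.
        .map(a => a.textContent.trim().replace(/\.$/, ''))
}

// Stand-ins for links like those quoted above, so the sketch runs in Node.
const fakeLink = (href, text) => ({
    getAttribute: name => (name === 'href' ? href : null),
    textContent: text
})
const links = [
    fakeLink('/Record/000644412', '...'),
    fakeLink('/Author/Home?author=Rassmann%2C+Knut.', 'Rassmann, Knut.'),
    fakeLink(null, 'anchor without href')
]

console.log(extractAuthors(links)) // [ 'Rassmann, Knut' ]
```

Note that stripping the trailing period would also shorten names that legitimately end in an initial, such as "Smith, J.", so whether to keep that step depends on how Zenon formats its author names.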
I have modified the files citation_api_import/index.js and citation_api_import/templates.js accordingly and generated an additional file citation_api_import/zenon.js. However, at the moment I am struggling with how to parse the records.