abalter opened this issue 2 months ago
I wrote citeproc-js, so maybe I can help. First off, by "in-text" and "full," do you mean something like APA in the first case, and something like Chicago Manual footnote style in the second? Or does the first mean "in the document" and the second "in the bibliography"?
Hi @fbennett. Thanks for offering to help!
By "in-text" I mean inline citations, i.e. what Cite.format('citation', ...) produces in citation.js. This is probably close to a "citation cluster", for example (Loomes, 2017, pp. 23-27).
By "full" I mean what would go into a bibliography or references list. I'm not sure whether citeproc.makeBibliography(filter) returns single full citations or only a full bibliography (the entire library), and I don't know what the filter variable is.
I'm wondering whether a "citation cluster" is an in-text citation with one or more references?
I do like the objects returned by makeCitationCluster (here) and makeBibliography (here).
Now that I'm looking over the docs again, I might be grokking it better. This is what I think I'm seeing:
The citeproc instance is initialized with a library of sources (CSL-JSON) and can format citations and references in ANY style or locale specified. This is mediated by the sys object, which the user has to create.
I think this creates an enormous overhead for my needs. If I know my style and locale ahead of time and know I'm going to use those, I would like to be able to instantiate a library that can directly create citations without having to know how to go fetch styles and locales.
Maybe we could consider making the code more modular? Something I would be willing to help with.
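For what it's worth, the sys object does not have to fetch anything. As far as I can tell from the citeproc-js documentation, the engine requires only two functions on sys, retrieveItem(id) and retrieveLocale(lang), and calls them synchronously, so plain in-memory lookups are enough once the data is loaded. A minimal sketch, assuming the library is a CSL-JSON array and the locale XML is already available as a string; makeInMemorySys is a hypothetical helper name:

```javascript
// Hypothetical helper: builds a citeproc-js `sys` object from data already in memory.
// Assumes `items` is a CSL-JSON array and `locales` maps a language tag
// (e.g. "en-US") to the text of the matching locales-xx-XX.xml file.
function makeInMemorySys(items, locales) {
  // Index items by id once; ids are compared as strings so 2 and "2" match.
  const byId = new Map(items.map(item => [String(item.id), item]));
  return {
    // Called synchronously by the engine; returns a CSL-JSON item (or undefined).
    retrieveItem(id) {
      return byId.get(String(id));
    },
    // Called synchronously by the engine; returns locale XML as a string (or undefined).
    retrieveLocale(lang) {
      return locales[lang];
    }
  };
}

// Usage with citeproc-js (not executed here):
//   const sys = makeInMemorySys(items, { "en-US": localeXml });
//   const citeproc = new CSL.Engine(sys, styleXml, "en-US");
```

Fetching style and locale files then becomes a separate, one-time loading step that happens before the engine is constructed.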
Thanks. To answer the initial questions: once sys helper functions are provided for your environment, makeCitationCluster, processCitationCluster, and makeBibliography should be able to do their business. Whether to use makeCitationCluster or processCitationCluster depends on your requirements. The former will work if you have no need for back-references and you are batch-processing the document (i.e. there's no need for dynamic editing as in a word processor).
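To make the difference concrete: as I understand the citeproc-js documentation, makeCitationCluster takes a plain array of cite objects and returns a formatted string, while processCitationCluster(citation, citationsPre, citationsPost) takes a citation object plus [citationID, noteIndex] pairs for the citations before and after it, so the processor can update back-references elsewhere in the document. The sketch below only builds those input shapes; the helper names, the "CITATION-n" IDs, and the use of label "page" for every locator are assumptions for illustration, and the engine calls themselves are left as comments.

```javascript
// Hypothetical helpers that build inputs for the two citeproc-js calls.

// One cite, e.g. (Loomes, 2017, pp. 23-27) -> { id, locator: "23-27", label: "page" }.
function makeCite(id, locator) {
  const cite = { id };
  if (locator) {
    cite.locator = locator;
    cite.label = "page"; // assumption: every locator is a page range
  }
  return cite;
}

// Batch use: citeproc.makeCitationCluster(cites) returns a formatted string.
function clusterForBatch(pairs) {
  return pairs.map(([id, locator]) => makeCite(id, locator));
}

// Interactive use: citeproc.processCitationCluster(citation, citationsPre, citationsPost),
// where citationsPre/citationsPost are arrays of [citationID, noteIndex] pairs.
function citationForEditor(citationID, pairs, noteIndex = 0) {
  return {
    citationID,
    citationItems: pairs.map(([id, locator]) => makeCite(id, locator)),
    properties: { noteIndex } // 0 when the citation is not in a footnote
  };
}
```

For example, citationsPre = [["CITATION-1", 0]] would tell the processor that one citation precedes the new one in document order.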
First let me say that I understand if this all sounds very critical. Just having CSL, Citeproc-JS, and Citation-JS is an amazing thing! I can see an immense amount of work went into creating the specs and writing the code. It's a huge boon to the academic world.
I do find both of the JS libraries quite difficult to use, and the code looks like it could be a lot simpler if it were modularized. For example, it would be fantastic if there were a single function that received a single source, style, and locale, all as JSON or JavaScript objects, and returned an inline citation. But that functionality appears to be entangled with other operations, although I could be wrong about that.
Maybe what I'm actually suggesting is a feature request:
var citeproc = new CSL.DeclarativeEngine(style, lang);
where style and lang are the actual CSL style and locale, as strings or URLs.
And why not just default to the Citation Style Language styles and locales repositories?
Alternatively, a default sys object that would work from strings or file URLs.
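Until something like that exists, a thin wrapper can approximate the requested interface. A sketch only: createDeclarativeEngine is a hypothetical name, not part of citeproc-js; the CSL namespace is passed in rather than imported so the sketch does not assume a particular module setup; and style and locale are assumed to be XML strings that are already loaded (URLs would have to be fetched beforehand, because the engine reads sys synchronously).

```javascript
// Hypothetical wrapper approximating the proposed
// `new CSL.DeclarativeEngine(style, lang)`. Style and locale arrive as XML
// strings and the library as a CSL-JSON array, so nothing is fetched.
function createDeclarativeEngine(CSL, { styleXml, localeXml, lang = "en-US", items = [] }) {
  const library = new Map(items.map(item => [String(item.id), item]));
  const sys = {
    retrieveItem: id => library.get(String(id)),
    // Only the locale we were given is available; other requests get undefined.
    retrieveLocale: requested => (requested === lang ? localeXml : undefined)
  };
  const engine = new CSL.Engine(sys, styleXml, lang);
  // Register every item once so later calls can cite them.
  engine.updateItems(items.map(item => item.id));
  return {
    engine,
    // One in-text citation for one item, optionally with a page locator.
    cite(id, locator) {
      const cite = locator ? { id, locator, label: "page" } : { id };
      return engine.makeCitationCluster([cite]);
    }
  };
}
```

Usage would look like createDeclarativeEngine(CSL, { styleXml, localeXml, items }).cite("ITEM-1", "23-27"). Keeping construction and formatting separate means the style is parsed once and reused for every citation.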
I guess I just need to write my own like this:
const sys = {
  library: [],
  locales: {},
  fetchFile: async function(url) {
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Network response was not ok: ${response.status}`);
    }
    return await response.text(); // or response.json(), response.blob() etc.
  },
  loadLibrary: async function(url) {
    const data = JSON.parse(await this.fetchFile(url));
    // CSL-JSON is normally a plain array of items; also accept { items: [...] }
    this.library = Array.isArray(data) ? data : data.items;
  },
  loadLocale: async function(locale) {
    const url = `https://raw.githubusercontent.com/citation-style-language/locales/master/locales-${locale}.xml`;
    this.locales[locale] = await this.fetchFile(url);
  },
  // Not called by citeproc-js: the style is passed to new CSL.Engine(sys, style, lang)
  retrieveStyle: async function(style) {
    // const url = `https://raw.githubusercontent.com/citation-style-language/styles/master/${style}.csl`;
    const url = `https://www.zotero.org/styles/${style}`;
    return await this.fetchFile(url);
  },
  // citeproc-js calls these two synchronously, so they only read preloaded data
  retrieveItem: function(itemId) {
    return this.library.find(x => x.id == itemId);
  },
  retrieveLocale: function(locale) {
    return this.locales[locale];
  }
};
I'd be happy to advise on a fork that aims to simplify or otherwise improve the code.
Ok. Before I jump in, just how rugged is the processing in this pipeline?

source -----> citation
          ^
         CSL

The XML has a lot of logic in it. Does ALL of that need to be parsed and re-coded in JavaScript?
When the processor ingests a style, and the style is in XML format, it will convert it to JSON on the fly and read that into a JS object for processing. If you plan to convert styles to JSON externally, would you like me to identify the processor function(s) that perform the conversion? (I should know, but it's been a couple of years since I looked at the code, so I'd need to take a peek.)
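Roughly what that kind of conversion involves, as a toy sketch: the XML becomes a tree of plain objects that code can walk. This is not the processor's own converter; cslXmlToJson and the { name, attrs, children } node shape are my own illustration, and it assumes well-formed CSL with double-quoted attributes, no DOCTYPE, and no CDATA.

```javascript
// Toy converter from CSL XML to a JSON-style tree of { name, attrs, children }.
// XML declarations and comments are skipped; whitespace-only text is dropped.
function decodeEntities(s) {
  return s.replace(/&lt;/g, "<").replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"').replace(/&apos;/g, "'").replace(/&amp;/g, "&");
}

function cslXmlToJson(xml) {
  // Alternatives: declaration | comment | closing tag | opening/self-closing tag | text.
  const tokenRe = /<\?[\s\S]*?\?>|<!--[\s\S]*?-->|<\/([\w:-]+)\s*>|<([\w:-]+)((?:\s+[\w:-]+\s*=\s*"[^"]*")*)\s*(\/?)>|([^<]+)/g;
  const attrRe = /([\w:-]+)\s*=\s*"([^"]*)"/g;
  const root = { name: "#root", attrs: {}, children: [] };
  const stack = [root];
  for (const m of xml.matchAll(tokenRe)) {
    const [, closeName, openName, attrText, selfClosing, text] = m;
    const parent = stack[stack.length - 1];
    if (openName) {
      const attrs = {};
      for (const [, key, value] of attrText.matchAll(attrRe)) {
        attrs[key] = decodeEntities(value);
      }
      const node = { name: openName, attrs, children: [] };
      parent.children.push(node);
      if (!selfClosing) stack.push(node); // only open elements can have children
    } else if (closeName) {
      if (parent.name !== closeName) throw new Error(`Unexpected </${closeName}>`);
      stack.pop();
    } else if (text !== undefined && text.trim() !== "") {
      parent.children.push(decodeEntities(text)); // meaningful text kept as a string
    }
  }
  if (stack.length !== 1) throw new Error("Unclosed element");
  return root.children[0];
}
```

Once styles are in this form, evaluating them is a tree walk, which is where the real work (and the edge cases) lives.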
Since there is so much logic in the stylesheets (so much thought and work has gone into them), it seems to me that this is a perfect job for AI. I just handed ChatGPT the "title" macro from chicago-author-date.csl and asked it to write it in JavaScript:
title macro
<macro name="title">
<choose>
<if variable="title" match="none">
<choose>
<if type="personal_communication speech thesis" match="none">
<text variable="genre" text-case="capitalize-first"/>
</if>
</choose>
</if>
<else-if type="bill book graphic legislation motion_picture song" match="any">
<text variable="title" text-case="title" font-style="italic"/>
<group prefix=" (" suffix=")" delimiter=" ">
<text term="version"/>
<text variable="version"/>
</group>
</else-if>
<else-if variable="reviewed-author">
<choose>
<if variable="reviewed-title">
<group delimiter=". ">
<text variable="title" text-case="title" quotes="true"/>
<group delimiter=", ">
<text variable="reviewed-title" text-case="title" font-style="italic" prefix="Review of "/>
<names variable="reviewed-author">
<label form="verb-short" text-case="lowercase" suffix=" "/>
<name and="text" delimiter=", "/>
</names>
</group>
</group>
</if>
<else>
<group delimiter=", ">
<text variable="title" text-case="title" font-style="italic" prefix="Review of "/>
<names variable="reviewed-author">
<label form="verb-short" text-case="lowercase" suffix=" "/>
<name and="text" delimiter=", "/>
</names>
</group>
</else>
</choose>
</else-if>
<else-if type="legal_case interview patent" match="any">
<text variable="title"/>
</else-if>
<else>
<text variable="title" text-case="title" quotes="true"/>
</else>
</choose>
</macro>
title.js
function formatTitle(citation) {
// Helper functions for text formatting
const capitalizeFirst = text => text.charAt(0).toUpperCase() + text.slice(1);
const toTitleCase = text => text.replace(/\w\S*/g, txt => txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase());
const italicize = text => `<i>${text}</i>`;
const quote = text => `"${text}"`;
let result = "";
// Handling different conditions
if (!citation.title) {
if (!["personal_communication", "speech", "thesis"].includes(citation.type)) {
result = citation.genre ? capitalizeFirst(citation.genre) : "";
}
} else if (["bill", "book", "graphic", "legislation", "motion_picture", "song"].includes(citation.type)) {
result = italicize(toTitleCase(citation.title));
if (citation.version) {
result += ` (${citation.term_version || ''} ${citation.version})`;
}
} else if (citation.reviewed_author) {
if (citation.reviewed_title) {
result = `${quote(toTitleCase(citation.title))}. Review of ${italicize(toTitleCase(citation.reviewed_title))}, ${citation.reviewed_author.map(author => author.name).join(", ")}`;
} else {
result = `Review of ${italicize(toTitleCase(citation.title))}, ${citation.reviewed_author.map(author => `${author.label || ''} ${author.name}`).join(", ")}`;
}
} else if (["legal_case", "interview", "patent"].includes(citation.type)) {
result = citation.title;
} else {
result = quote(toTitleCase(citation.title));
}
return result;
}
No. Just no. If you're going to apply an LLM to this task, you're on your own.
That's not my goal. I just thought I'd give it a try and see what happens. It's not a good approach, though, because then each style gets its own JavaScript, which needs to be maintained.
I guess the goal is to write JavaScript that knows how to interpret and act on the logic in the macros.
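A heavily simplified sketch of that idea: one generic function walks any macro that has already been parsed into { name, attrs, children } nodes and evaluates it against an item, so no per-style JavaScript is needed. It covers only <choose>/<if>/<else-if>/<else>, <group>, and <text>; terms, names, dates, text-case, and locale-aware punctuation are ignored, the curly quotes assume en-US, and the group rule is a simplification of the specification's suppression rule.

```javascript
// Minimal CSL-like interpreter over { name, attrs, children } nodes.

// Evaluate the conditions on an <if> or <else-if> element for one item.
function testCondition(attrs, item) {
  const results = [];
  for (const v of (attrs.variable || "").split(" ").filter(Boolean)) {
    results.push(item[v] !== undefined && item[v] !== null && item[v] !== "");
  }
  for (const t of (attrs.type || "").split(" ").filter(Boolean)) {
    results.push(item.type === t);
  }
  const match = attrs.match || "all";
  if (match === "any") return results.some(Boolean);
  if (match === "none") return !results.some(Boolean);
  return results.every(Boolean);
}

// Render every child of a node, returning an array of strings.
function renderChildren(node, item) {
  return node.children.map(child => render(child, item));
}

function render(node, item) {
  const a = node.attrs || {};
  // Affixes are only added around non-empty output.
  const affix = s => (s ? (a.prefix || "") + s + (a.suffix || "") : "");
  switch (node.name) {
    case "macro":
      return renderChildren(node, item).join("");
    case "choose":
      // The first branch whose condition holds wins; <else> always holds.
      for (const branch of node.children) {
        if (branch.name === "else" || testCondition(branch.attrs || {}, item)) {
          return renderChildren(branch, item).join("");
        }
      }
      return "";
    case "group":
      // Simplification: the group disappears when no child produced output.
      return affix(renderChildren(node, item).filter(Boolean).join(a.delimiter || ""));
    case "text": {
      let s = a.variable ? String(item[a.variable] ?? "") : a.value || "";
      if (s && a.quotes === "true") s = `\u201c${s}\u201d`; // en-US curly quotes
      return affix(s);
    }
    default:
      return ""; // unsupported elements render nothing
  }
}
```

The same function serves every style; what changes between styles is only the tree it is given.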
Before I say this, I just want to denounce LLMs as well. However, I've also been thinking about "compiling" CSL into JS or other imperative languages, programmatically of course. You'd need the appropriate helper functions, but it might lend itself to some interesting optimizations. Is that what you're after, @abalter, or do you mean a single function that initializes citeproc to simplify the API?
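A sketch of what programmatic compilation could look like at the level of a single <if>/<else-if> test: the attributes are analysed once per style and turned into a predicate closure, instead of being re-read for every item. compileCondition is a hypothetical name, and only variable and type tests are covered.

```javascript
// Compile the attributes of a CSL <if>/<else-if> element into a predicate.
// Covers `variable` and `type` tests with match="all" (default), "any", "none".
function compileCondition(attrs) {
  const isPresent = value => value !== undefined && value !== null && value !== "";
  // One check per listed variable or type, built once.
  const checks = [
    ...(attrs.variable || "").split(" ").filter(Boolean).map(v => item => isPresent(item[v])),
    ...(attrs.type || "").split(" ").filter(Boolean).map(t => item => item.type === t)
  ];
  switch (attrs.match || "all") {
    case "any":
      return item => checks.some(check => check(item));
    case "none":
      return item => !checks.some(check => check(item));
    default:
      return item => checks.every(check => check(item));
  }
}
```

For instance, compileCondition({ type: "bill book", match: "any" }) yields a function that can be applied to every item without touching the style again. Extending the same idea to <text>, <group>, and <choose> would produce one closure tree per style.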
I wasn't thinking about using LLMs the way I think you might be, anyway. I use them to help write code and do some of the dirty work, and they're actually quite good at that. Of course, they're just helpers, so I double-check everything.
That's all. I wasn't thinking: "hand this over to an AI".
I did a little exploring to understand the limits of XML and XSLT. In a perfect world, each bit of logic in the stylesheet would translate directly to a logical statement in another computer language. Thus, something like xsltproc should be able to apply the logic in any well-formed stylesheet to any well-formed data. I guess it doesn't work like that.
My impression of the codebase is that interpreting and applying the logic in the stylesheets was pretty hellish. I see a lot of stuff that looks like trying to handle edge case after edge case. Is that just the way it is? Or would a fresh approach find common patterns and shortcuts?
I haven't studied a lot of CSL stylesheets yet to see if there are commonalities. I'm assuming each one has a few macros for handling authors, a few for titles, a few for publishers, etc. Maybe there is an ontology somewhere.
> That's all. I wasn't thinking: "hand this over to an AI".
Sorry for my misinterpretation.
> My impression of the codebase is that interpreting and applying the logic in the stylesheets was pretty hellish. I see a lot of stuff that looks like trying to handle edge case after edge case. Is that just the way it is? Or would a fresh approach find common patterns and shortcuts?
Just my perspective: I tried such a fresh approach a while back to get to know CSL a bit better and found that (1) the specification covers a lot of edge cases, so the actual behavior is sometimes a lot more complex than the XML itself suggests (e.g. handling of names, punctuation, indentation, suppression), and (2) citeproc-js has a lot of heuristics to be able to properly follow the specification in the first place, and covers plenty more edge cases that didn't make it into the specification. You can't easily get rid of those and still get good results unless you stick to the most basic references.
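One small illustration of point (1), using the <name and="text" delimiter=", "/> element from the title macro above. Under the CSL specification's default delimiter-precedes-last="contextual", the delimiter appears before the last name only when there are three or more names, which a plain join(", ") (as in the generated title.js) does not capture. A simplified sketch, assuming en-US (where the "and" term is "and"), given-name-first display order, and no et-al truncation, particles, or suffixes; formatNameList is a hypothetical name.

```javascript
// Simplified rendering of a CSL name list with and="text" and the default
// delimiter-precedes-last="contextual".
function formatNameList(names, { delimiter = ", ", andTerm = "and" } = {}) {
  // Institutional names use `literal`; personal names are shown given-name first.
  const display = names.map(n =>
    n.literal ? n.literal : [n.given, n.family].filter(Boolean).join(" ")
  );
  if (display.length <= 1) return display.join("");
  if (display.length === 2) {
    // Two names: no delimiter before the last name, only the "and" term.
    return `${display[0]} ${andTerm} ${display[1]}`;
  }
  // Three or more: delimiter between all names, plus "and" before the last.
  const head = display.slice(0, -1).join(delimiter);
  return `${head}${delimiter}${andTerm} ${display[display.length - 1]}`;
}
```

Even this one attribute needs a branch the XML does not spell out, and the real name rules add initials, sort order, particles, et-al, and disambiguation on top.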
> I haven't studied a lot of CSL stylesheets yet to see if there are commonalities. I'm assuming each one has a few macros for handling authors, a few for titles, a few for publishers, etc. Maybe there is an ontology somewhere.
The macros can differ between styles, and as far as I know there are no guidelines.
@abalter: There are a couple of projects that might be of interest, given your objectives (apologies if you already know of these):
… citeproc-js with a tool superior in speed and code composition. The repo hasn't seen major code contributions in three years, but a recent pull request aims to get it working with more recent releases of Rust.
This is somewhat unfair to ask, but if someone can help me, it would mean a huge amount. The citeproc-js library is pretty complex, and using it requires creating other functions (retrieveItem, retrieveLocale) that don't fully make sense to me. The package is designed to do a large number of things across a large number of use cases.
All I want to do is generate formatted citations given a CSL-JSON library, CSL stylesheet, and locale spec like this:
Could someone guide me to the pertinent methods that I could use to build this simple application?
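A minimal sketch of the pertinent calls, assuming an engine already constructed as new CSL.Engine(sys, styleXml, lang) with a sys object whose retrieveItem and retrieveLocale serve the library and locale: updateItems registers the item IDs, makeCitationCluster returns one formatted in-text citation, and makeBibliography returns the reference list. formatLibrary is a hypothetical helper name; the engine is passed in so the sketch does not assume a particular module setup.

```javascript
// Hypothetical pipeline over an existing citeproc-js engine:
// one in-text citation per item, plus the formatted reference list.
function formatLibrary(engine, ids) {
  // Tell the engine which items are in play (this also defines the bibliography).
  engine.updateItems(ids);
  // makeCitationCluster returns each formatted citation as a string.
  const inText = ids.map(id => engine.makeCitationCluster([{ id }]));
  // makeBibliography returns [formattingParams, entries]; entries are strings
  // (HTML by default). It can return false if the style has no bibliography.
  const bib = engine.makeBibliography();
  return { inText, bibliography: bib ? bib[1] : [] };
}
```

If plain text is wanted instead of HTML, I believe engine.setOutputFormat("text") can be called before formatting; check the citeproc-js documentation for the supported formats.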