BeelGroup / Docear4Word

Source code of Docear4Word. See http://www.docear.org/software/add-ons/docear4word/overview/ for more details.

'Chicago' citation style does not work properly #17

Open Joeran opened 10 years ago

Joeran commented 10 years ago

There are a few issues with the popular Chicago citation style.

  1. We included the wrong variant in the default list of citation styles that is installed with Docear4Word. Hence, the current style "Chicago Manual of Style (note, annotated bibliography)" should be removed from the default list of styles that Docear4Word installs, and the style "Chicago Manual of Style 16th edition (author-date)" added instead: https://www.zotero.org/styles/chicago-author-date
  2. With the Chicago style, and also some other styles (e.g. "American Institute of Physics"), spaces and words like "of" and "and" are removed from titles and journal names in the bibliography. This is not how it should be.

hudcap commented 9 years ago

I believe the spaces are being removed due to an incorrect processing of the CSL attribute text-case="title"

I ran into this bug while customizing my own style and removing this attribute rendered the spaces correctly.
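For anyone who wants to try the same workaround on a style file, here is a minimal sketch of stripping the attribute from CSL text. The function name `stripTitleCase` and the sample element are made up for illustration, and the approach is a plain regular-expression replace on the raw string rather than real XML parsing.

```javascript
// Hypothetical helper: removes every text-case="title" attribute from CSL text.
// It edits the raw string and does not parse the XML, so treat it as a sketch.
function stripTitleCase(cslText) {
    return cslText.replace(/\s+text-case=(["'])title\1/g, "");
}

// Made-up sample element, not taken from any particular style file.
var sample = '<text variable="title" text-case="title" font-style="italic"/>';
var stripped = stripTitleCase(sample);
// stripped is '<text variable="title" font-style="italic"/>'
```

Other text-case values (for example "lowercase") are left alone, since only the value "title" is matched.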

DaveLaur commented 9 years ago

I also came across this problem for a CSL file for modern-humanities-research-association.csl, so had a look at the source code. The bug fix is very simple...

On lines 12362 and 12364 of citeproc.js, a space is missing in the join argument string. They should be:

    lst[i] = words.join(" ");

    doppel.string = lst.join(" ");

respectively. This file is found in the JavaScript directory under the Docear4Word programme directory. Editing the JavaScript file fixes the problem without recompiling any code, so it can be done by an end user (though obviously it would be better to get the source corrected; I'm new to GitHub so don't know how to do this)!
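To illustrate why the join separator matters, here is a small sketch with a made-up journal name. The array stands in for a word list in which the whitespace between words has already been lost; it is not taken from citeproc.js.

```javascript
// Made-up word list standing in for a title whose whitespace was dropped.
var words = ["Journal", "of", "Applied", "Physics"];

// Joining with an empty string runs the words together.
var broken = words.join("");   // "JournalofAppliedPhysics"

// Joining with a single space restores readable output.
var fixed = words.join(" ");   // "Journal of Applied Physics"
```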

Hopefully someone who knows what they are doing can take this and (a) check that the correction is right, and (b) update the source files.

PLEASE NOTE - this correction is incomplete as it doesn't deal with the missing words problem! See below...

Joeran commented 9 years ago

hello,

we are not the authors of the citeproc.js. the citeproc project can be found here https://bitbucket.org/fbennett/citeproc-js/wiki/Home hence any changes should be suggested there (and i guess they might have done the changes already).

could you download the latest citeproc.js https://bitbucket.org/fbennett/citeproc-js/src/f73704ce6679dc03117b5c893a4fdbb1ea066e98/citeproc.js?at=default and replace the old one in the Docear directory with the new one? and let me know if this fixes the problems.

DaveLaur commented 9 years ago

I’m afraid not – it just breaks the whole thing – the Docear4Word add-in no longer works. (Is citeproc.js part of a package, and the other files would also be required?)

Also, quickly looking through the script, it has the same join lines in the “title” case formatting, so I suspect will do exactly the same thing on the formatting…

Dave.

Joeran commented 9 years ago

> Is citeproc.js part of a package, and the other files would also be required

Not really. I don't know why the current citeproc.js causes such problems. Anyway, I created a FAQ entry with your proposed solution http://www.docear.org/faqs/why-are-spaces-removed-in-the-bibliography/

DaveLaur commented 9 years ago

The code makes assumptions about how JavaScript splits strings with regular expressions. I have discussed this with Frank Bennett, the author of citeproc, and it turns out that IE doesn't implement split() as per the standard. Whilst the standard says that the tokens captured by the regular expression used for the split are retained in the result, IE throws them away. This means that words like "of" are stripped out, as are all spaces. Getting the code to work as written would require rewriting the split function, which is not something I propose to do. An alternative is to simplify the code drastically as follows...

I’d suggest that users comment out lines 12336 to 12364 inclusive by inserting /* at the start of line 12336, and */ at the end of 12364. Then either before, or after this, insert the following:

    // Code replaced by D. Laurenson to process with docear4word
    //
    // In this code, no attempt is made to ensure that sentence beginnings
    // are made uppercase.  Without searching for all possible abbreviations
    // such a check would be only partial at best.  The faulty code
    // attempted to capitalise all words following :, ? or !, but not .
    //
    // Start by splitting the string according to spaces only
    var words = str.split(/\s/);

    // For each word, check against the non-capitalise list, and if not in the list, then capitalise it
    for (i=0,ilen=words.length;i<ilen;i+=1) {
        if (!words[i].match(state.locale[state.opt.lang].opts["skip-words-regexp"])) {

            // Exclude any words with digits, e.g. v1.0
            if (words[i].match(/[0-9]/)) {
                continue;
            }

            // Transform word only if all lowercase, to ignore unusual acronyms, e.g. aCGP
            var lowerCaseVariant = words[i].toLowerCase();
            if (words[i] === lowerCaseVariant) {
                words[i] = capitalise(words[i]);
            }
        }
    }
        }
    }

    // Reconstruct the string with the words in title case
    doppel.string = words.join(" ");

As you can see, it is relatively straightforward code – nothing fancy, but that appears to be in its favour! I’ve also taken the liberty of putting in comments – something that is stripped out when citeproc.js is created.

This time I’ve checked it with a number of very specific examples to stretch its capabilities, using escaped characters, etc., to make sure that it works.
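For readers who want to try the same logic outside citeproc.js, here is a minimal standalone sketch. The `capitalise` function and the `skipWords` expression are hypothetical stand-ins: the real helper and the real "skip-words-regexp" come from citeproc.js and its locale data, and may behave differently.

```javascript
// Hypothetical stand-in for the capitalise helper used in citeproc.js.
function capitalise(word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
}

// Hypothetical stand-in for the locale's "skip-words-regexp" (assumed word list).
var skipWords = /^(?:a|an|and|as|at|but|by|for|in|of|on|or|the|to)$/;

function titleCase(str) {
    // Split on single whitespace characters only
    var words = str.split(/\s/);
    for (var i = 0, ilen = words.length; i < ilen; i += 1) {
        var word = words[i];
        // Leave skip words and words containing digits (e.g. v1.0) untouched
        if (word.match(skipWords) || word.match(/[0-9]/)) {
            continue;
        }
        // Capitalise only all-lowercase words, so acronyms such as aCGP survive
        if (word === word.toLowerCase()) {
            words[i] = capitalise(word);
        }
    }
    return words.join(" ");
}

var plain = titleCase("journal of applied physics");
// plain is "Journal of Applied Physics"
var mixed = titleCase("release notes for v1.0 of aCGP");
// mixed is "Release Notes for v1.0 of aCGP"
```

Note that a skip word at the very start of a title stays lowercase in this sketch, which matches the stated decision not to detect sentence beginnings.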

All the best,

Dave.

faph commented 9 years ago

It might be more effective if an issue is raised with citeproc.js here: https://bitbucket.org/fbennett/citeproc-js/issues

Docear4Word can then simply make a new release with an updated citeproc library.

DaveLaur commented 9 years ago

I have raised this with the author of citeproc (Frank Bennett). It turns out that this issue is the result of JavaScript incompatibilities in the function used to split strings, and he is looking to put a fix in for this. I've edited my comment above to include the code that I have proposed as a simple alternative that will work for IE.
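The split() incompatibility can be illustrated with a short sketch. The separator pattern and the `tokenise` function are illustrative assumptions, not the code in citeproc.js or the fix its author has planned; the point is only that a tokeniser built on match() does not depend on split() keeping captured separators.

```javascript
// In a standards-conforming engine, text matched by a capturing group in the
// separator is kept in the array returned by split().
var viaSplit = "war and peace".split(/(\s+)/);
// Conforming result: ["war", " ", "and", " ", "peace"]

// Hypothetical alternative: collect runs of whitespace and runs of
// non-whitespace with match(), so nothing relies on split() retaining captures.
function tokenise(str) {
    return str.match(/\s+|\S+/g) || [];
}

var viaMatch = tokenise("war and peace");
// ["war", " ", "and", " ", "peace"]
```

Joining either array with an empty string gives back the original text, which is why losing the separator tokens also loses the spaces.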

Docear4Word doesn’t ship with the latest citeproc.js, and when I tried the latest version, the add-in didn’t seem to like it. I think that is a Docear4Word problem, so taking the most recent version of citeproc could be more complicated than a simple drop-in replacement.

Dave.

DmitriyC commented 8 years ago

As a potential workaround for this issue, the current beta version of JabRef (2.11b4) has support for converting entries to title-case capitalization (although the function can only be applied to one entry at a time, unfortunately). To avoid the citeproc.js bug, users can then simply remove title text-case formatting from the citation style.

Dave, thank you for the suggestions. There is a small copy-paste error from the original code (i vs. k), but it works well. I've extended your code to cover a few punctuation marks (hyphenation, slashes, etc.), but obviously there are still a lot of limitations. I can't attach files, but the code is located at http://pastie.org/10541668. My programming skill is still fairly basic and I'm entirely new to JavaScript, but the code seems to work fine. The goal was to minimize the amount of final editing the end user needs to make to their bibliography.
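As a rough idea of what handling hyphens and slashes can look like, here is a minimal sketch. It is not the code at the pastie link, which is not reproduced here; the names `capitalise` and `capitaliseCompound` are hypothetical, and skip words inside compounds are not handled.

```javascript
// Hypothetical stand-in for the capitalise helper used in citeproc.js.
function capitalise(word) {
    return word.charAt(0).toUpperCase() + word.slice(1);
}

// Hypothetical helper: capitalises each part of a compound word separated by
// hyphens or slashes, leaving parts with digits or mixed case untouched.
function capitaliseCompound(word) {
    // Tokens are either a single separator or a run of non-separator characters
    var parts = word.match(/[-\/]|[^-\/]+/g) || [];
    for (var i = 0, ilen = parts.length; i < ilen; i += 1) {
        if (parts[i].match(/[0-9]/)) {
            continue;
        }
        if (parts[i] === parts[i].toLowerCase()) {
            parts[i] = capitalise(parts[i]);
        }
    }
    return parts.join("");
}

var hyphenated = capitaliseCompound("time-frequency");
// hyphenated is "Time-Frequency"
var slashed = capitaliseCompound("input/output");
// slashed is "Input/Output"
```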