citation-style-language / schema

Citation Style Language schema
https://citationstyles.org/
MIT License

Expanded -short, original-, reviewed- variables #167

Open bwiernik opened 5 years ago

bwiernik commented 5 years ago

Recently, in response to user needs for the SBL style, citeproc-js added support for providing -short versions of all CSL variables, to be rendered with form="short". (https://forums.zotero.org/discussion/comment/324592/#Comment_324592)

In my work on apa.csl, I'm finding that it wants a lot more detailed information for reviews (e.g., medium of item being reviewed, date of item being reviewed), as well as for original publication information (e.g., original medium, original container title, original pages, original editor) than is currently possible with existing CSL variables. As far as I am aware, MLA and Chicago have similar requirements.

I suggest that -short, original-, reviewed- should be expanded so that they can be applied to any CSL variable. This would allow maximum flexibility without having to individually specify each possible variable of this kind.

denismaier commented 4 years ago

Sounds good to me. Any potential drawbacks when this is implemented?

bwiernik commented 4 years ago

These could be implemented in CSL-JSON as arrays for short, original, reviewed. cf. https://github.com/citation-style-language/schema/issues/169

denismaier commented 4 years ago

Hmm. @dstillman Looks like Zotero devs are not big fans of arrays and objects. Suggestions concerning the data structure here?

What about special affixes that can be used with any other variable? E.g. for -short

In the long run we were talking about a hierarchical data model. At least for reviewed- and original- that would probably be the most flexible solution.

dstillman commented 4 years ago

These could be implemented in CSL-JSON as arrays for short, original, reviewed.

I'm not clear what the suggestion is here. Can you give an example?

bwiernik commented 4 years ago

@dstillman I should have said objects, not arrays. Example:

"reviewed": {
  "type": "motion_picture",
  "medium": "DVD",
  "title": "Title of reviewed movie"
}

vs listing these as individual reviewed- variables.

reviewed-type: motion_picture
reviewed-medium: DVD
reviewed-title: Title of reviewed movie

bdarcus commented 4 years ago

So you're wanting to change a mostly flat (aside from contributors and dates) data model to a more structured one.

denismaier commented 4 years ago

Ultimately, this could lead to a data model as outlined here.

dstillman commented 4 years ago

I think this is more a discussion for the CSL list, but in general I would strongly advocate for key-value pairs over objects, except where the fields don't make sense independently and the app would need special handling of all associated variables for proper processing anyway. If it's something where there could be a direct mapping between a field and a variable, it's vastly simpler to stick to key-value pairs, and it also allows for hacks like Extra. Reducing implementation complexity is much more important in my view than reducing verbosity in CSL-JSON.
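
As a rough illustration of the key-value approach argued for here, a nested object like the "reviewed" example earlier in the thread can be flattened mechanically into prefixed keys. This is only a sketch: the helper name `flatten_related` and the sample item are hypothetical, and nothing here is part of CSL-JSON or any processor.

```python
def flatten_related(item, prefixes=("reviewed", "original")):
    """Return a copy of item in which nested objects stored under the
    given keys are replaced by flat, prefixed key-value pairs.
    Hypothetical helper, for illustration only."""
    flat = {}
    for key, value in item.items():
        if key in prefixes and isinstance(value, dict):
            # "reviewed": {"title": ...} becomes "reviewed-title": ...
            for sub_key, sub_value in value.items():
                flat[f"{key}-{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


nested = {
    "type": "review",
    "title": "Title of the review",
    "reviewed": {
        "type": "motion_picture",
        "medium": "DVD",
        "title": "Title of reviewed movie",
    },
}

flat = flatten_related(nested)
```

Each resulting key maps one-to-one onto a single field, which is the property that makes the flat form simpler for an app to handle.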

bdarcus commented 4 years ago

Yeah, it's easy to add these variable strings ("reviewed-title" and such), so let's just do that. We could have defined "container" as an object, for example, but we didn't.

denismaier commented 4 years ago

I understand. But what about special handling of prefixes and suffixes on variables? Is there a way to define affixes that could be used on other variables? Like allowing -short as a general modifying suffix and reviewed- as a general modifying prefix? Would that be somehow possible?

dstillman commented 4 years ago

I think that would affect the processor more than the app. If we support a given -short or reviewed- field, the mapping would be hard-coded. It's the processor that would need to know how to handle those.

(I don't totally get it, though. Wouldn't there be nonsensical possibilities? What does issued-short mean?)

bwiernik commented 4 years ago

Okay, so let's stick with key-value pairs.

Dan makes a good point on -short. It should apply only to standard variables (string, number, title), not name or date variables.

denismaier commented 4 years ago

So 2 questions:

  1. Is it possible to have such prefix/suffix rules in JSON to prevent unnecessary verbosity? (and yes, we will need to restrict -short to certain variables)

  2. If yes, should we do this?

Or should we simply add possible variables to reviewed-, original-, container-, collection- ?

bwiernik commented 4 years ago

The three relevant affixes are -short, original- and reviewed-. Could we define valid combinations of these in the style and data schemas using string concatenation?

So, something like variables.short = variables.standard + '-short' and variables.original = 'original-' + variables.all?
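
In Python terms, the concatenation idea might look like the sketch below. The list names and their contents are illustrative subsets chosen for this example, not the actual schema lists.

```python
# Illustrative subsets only; the real CSL variable lists are much longer.
STANDARD = ["title", "container-title", "publisher", "medium"]
NAMES = ["author", "editor"]
DATES = ["issued"]

ALL = STANDARD + NAMES + DATES

# -short is derived from the standard variables only,
# while the two prefixes are applied to every variable.
SHORT = [name + "-short" for name in STANDARD]
ORIGINAL = ["original-" + name for name in ALL]
REVIEWED = ["reviewed-" + name for name in ALL]
```

With lists derived this way, adding a new base variable updates every derived list automatically.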

denismaier commented 4 years ago

The three relevant affixes are -short, original- and reviewed-.

Addendum: With container-, collection- I was not suggesting we should add that now. But perhaps in the medium run?

So, something like variables.short = variables.standard + '-short' and variables.original = 'original-' + variables.all?

That looks good. Would make schema updates easier, wouldn't it? (But I'm a bit pessimistic that it will work so easily: https://stackoverflow.com/questions/9708192/use-a-concatenated-dynamic-string-as-javascript-object-key yes that's old, but perhaps still relevant? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer says it's possible with recent JS, but not with JSON.)

Edit: looks like I misrepresented the problem here. You were not concatenating the key, so it might be easier. (But I don't know.)

bwiernik commented 4 years ago

I think container and collection are a much bigger can of worms than the others, so let's set those aside for now.

bwiernik commented 4 years ago

For the data schema, an option might be to split out the schemas into separate files that match the RNC type and variable structure and use a build script to compile them at commit time.

denismaier commented 4 years ago

What would that look like, and how would that solve the problem with schema verbosity? Would it?

denismaier commented 4 years ago

Idea: if we have all variables available with original- or reviewed-, what about a mechanism like alternative in csl-m where you can render all variables prefixed with alt- with a single alternative variable? Could make style coding easier.

https://citeproc-js.readthedocs.io/en/latest/csl-m/#id15

bwiernik commented 4 years ago

In addition to or instead of making them available as regular variables? It would need to be in addition to if anything. Most styles only want a portion of such information or have different formatting requirements (e.g., APA wants original medium, original type, original title, and original author, not a full reference).

bwiernik commented 4 years ago

What would that look like, and how would that solve the problem with schema verbosity? Would it?

A Python script in GitHub Actions could compile the csl-data.json at commit time. It would have a list of all types and variables (separated by category) and dynamically construct the JSON. The main benefit would be ease of maintenance and updating: we would not need to keep four nearly identical lists aligned manually.
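
A minimal sketch of what such a compile step could do is shown below. The category names, the sample variables, and the mapping from category to JSON type are assumptions made for this example; the GitHub Actions wiring and writing the file to disk are left out.

```python
import json

# Hypothetical category lists; a real script would hold the full lists.
VARIABLES = {
    "string": ["title", "medium", "publisher"],
    "number": ["volume", "issue"],
}

# Assumed mapping from variable category to JSON Schema type.
JSON_TYPES = {"string": "string", "number": ["string", "number"]}


def build_properties(variables):
    """Build a JSON Schema 'properties' mapping from categorized names."""
    properties = {}
    for category, names in variables.items():
        for name in names:
            properties[name] = {"type": JSON_TYPES[category]}
    return properties


schema = {
    "type": "object",
    "properties": build_properties(VARIABLES),
}

# Sorted keys keep the generated text stable from one run to the next.
output = json.dumps(schema, indent=2, sort_keys=True)
```

Stable, sorted output matters here because the generated file is committed, so unrelated reordering would otherwise show up in diffs.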

bdarcus commented 4 years ago

For the data schema, an option might be to split out the schemas into separate files that match the RNC type and variable structure and use a build script to compile them at commit time.

Yes, I was wondering about something like this.

denismaier commented 4 years ago

In addition to or instead of making them available as regular variables? It would need to be in addition to if anything.

Sure, in addition to the regular variables. For reviews you will most likely want a full reference, right? (And that reference should also be rendered according to the current style---so giving these details in the regular title is actually not ideal.)

Edit: well, at least Chicago does not request this.

denismaier commented 4 years ago

So, what shall we do about this now? Should I draft a PR for original-, reviewed-, and -short? Or should we go the automated route instead?

bdarcus commented 4 years ago

So, what shall we do about this now? Should I draft a PR for original-, reviewed-, and -short? Or should we go the automated route instead?

Depends who's going to write the python script and when.

I have basic python skills, but am not knowledgeable about parsing text as we need (see comment).

denismaier commented 4 years ago

I have basic python skills, but am not knowledgeable about parsing text as we need

The question is: What will our input look like? Will we just use the json? Or could we even work with native python structures? If so, we don't have to parse anything.

bdarcus commented 4 years ago

I don't understand. I was assuming input is the rnc file(s), output is csl-data.json.

What were you thinking? A single, say python, file, whose contents is the data representation, output to both rnc and json?

denismaier commented 4 years ago

I was thinking we could use a common source for both rnc and json.


# Common source: one entry per variable, listing the affixes it may take.
variables = [
    {
        "name": "title",
        "type": "string",
        "variants": ["original-", "reviewed-", "-short"],
    },
    {
        "name": "author",
        "type": "name",
        "variants": ["original-", "reviewed-", "container-"],
    },
]


def create_rnc(variables):
    # this creates the rnc schema variable list
    rnc = ""  # placeholder: generation logic not written yet
    return rnc


def create_json(variables):
    # this creates the json schema
    json = {}  # placeholder: generation logic not written yet
    return json


rnc = create_rnc(variables)
json = create_json(variables)

denismaier commented 4 years ago

A single, say python, file, whose contents is the data representation, output to both rnc and json?

Exactly, see above. (Or instead of python, we could also use some other common source that is easy to write and parse, say yaml or toml.)

bdarcus commented 4 years ago

IC.

I'm agnostic; whatever gets us to the best and easiest result, which is consistent schemas, and clean git histories, including diffs.

I'm not sure on the details of CI in GitHub; how it would work.

On your example, though, maybe better to have separate dicts for datatypes; like "variables-string."
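
One way the datatype-grouped layout could look is sketched below, with one group rendered as an RNC-style choice of string literals. The group names, the pattern names, and the output layout are assumptions for illustration and are not taken from the actual schema files.

```python
# Hypothetical common source, grouped by datatype.
SOURCE = {
    "variables-string": {
        "title": ["original-", "reviewed-", "-short"],
    },
    "variables-name": {
        "author": ["original-", "reviewed-"],
    },
}


def expand(name, affixes):
    """Return the base name followed by each affixed variant."""
    names = [name]
    for affix in affixes:
        # A leading hyphen marks a suffix; a trailing hyphen marks a prefix.
        names.append(name + affix if affix.startswith("-") else affix + name)
    return names


def to_rnc(group, entries):
    """Render one group as an RNC-style choice of quoted names."""
    names = [n for name, affixes in entries.items() for n in expand(name, affixes)]
    pattern_name = group.replace("variables-", "variables.")
    return pattern_name + " =\n  " + "\n  | ".join(f'"{n}"' for n in names)


rnc = "\n\n".join(to_rnc(group, entries) for group, entries in SOURCE.items())
```

Keeping each datatype in its own group means the allowed affixes can differ per datatype without any per-variable flags.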

bdarcus commented 4 years ago

You want to test a minimal toml or yaml approach in your test repo and report?

;-)

bdarcus commented 4 years ago

PS - let's move this conversation to the linked issue?

denismaier commented 4 years ago

Maybe we're making things too complicated regarding the json here?

Can't we just add a note to the specs that would basically say, "Any non-number variable can also be supplied in a short form with the suffix -short. Every variable also has variants prefixed with original- and reviewed-."? I mean, the json is not used for validation, right? Do we need to have those in the json schema at all?

(Found this idea somewhere in the Zotero forums discussion linked above: https://forums.zotero.org/discussion/75366/accommodating-both-full-series-names-and-series-abbreviations)

Regarding -short: There might be some variables we will want to exclude from that list, but I'm not sure it will hurt allowing them anyway.
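
To make the proposed note concrete, the rule can be written as a small check. This sketch follows the wording of the note literally, so it does not exclude name or date variables from -short, and it assumes that a prefix and the -short suffix may be combined. The variable sets are illustrative subsets and `is_allowed` is a hypothetical name.

```python
# Illustrative subsets; the real variable lists are longer.
NUMBER_VARIABLES = {"volume", "issue", "edition"}
OTHER_VARIABLES = {"title", "container-title", "publisher", "author", "issued"}
BASE_VARIABLES = NUMBER_VARIABLES | OTHER_VARIABLES


def is_allowed(name):
    """Check a variable name against the rule in the proposed note."""
    # Any variable may carry one of the two prefixes.
    for prefix in ("original-", "reviewed-"):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    if name in BASE_VARIABLES:
        return True
    # Only non-number variables may carry the -short suffix.
    if name.endswith("-short"):
        base = name[: -len("-short")]
        return base in BASE_VARIABLES and base not in NUMBER_VARIABLES
    return False
```

A rule like this keeps the documentation short, but it cannot drive auto-completion the way an explicit list can.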

denismaier commented 4 years ago

And regarding the rnc, is there any way to use patterns without using a script here? @bdarcus

bdarcus commented 4 years ago

I think you can use patterns to validate, but then you lose auto-completion in validating editors.

denismaier commented 4 years ago

I think you can use patterns to validate, but then you lose auto-completion in validating editors.

Ok, so we should have all usable variables in the rnc schema. But what about the json? Would we get away with a note in the specs?

bwiernik commented 4 years ago

I don't really see a problem with listing the variables out in both the rnc and the json schemas if we have the scripts that can generate them automatically when a new variable is added. Put all of the "reviewed", "short", and "original" values together and label as such.

denismaier commented 4 years ago

Ok. I can continue working on the script tomorrow, but perhaps I'll just add the variables manually so we can close this one. The question is just: -short on everything except? Numbers, names? What about DOI etc? Or should we be liberal here and really allow everything with short, whether it makes sense or not?

And reviewed and original on everything, right?

bdarcus commented 4 years ago

No, put them where they obviously belong.

We can always add later.

Variable lists are already getting pretty long.

bwiernik commented 4 years ago

Definitely not any names or dates. Other places where they make sense.

denismaier commented 4 years ago

Reviewed and original should make sense everywhere right?

bwiernik commented 4 years ago

Yes.

denismaier commented 4 years ago

Ok. So it's basically short forms for all titles and many strings. Then everything with reviewed and original variants. Will do tomorrow.

dhimmel commented 4 years ago

Reading through this thread, I'm still not clear on the meaning of -short, original-, and reviewed-? Short makes the most sense to me... styles sometimes want to use a shortened title or container.

But what are original and reviewed? Does this just mean a curator has reviewed the CSL JSON and changed "original" to "reviewed"? If this is the case, I don't see a need for these fields to be part of the spec. Users could still include them in CSL JSON. Are styles ever going to render "original" or "reviewed" fields as part of a reference list?

bdarcus commented 4 years ago

I think they address cases like these @dhimmel; denis and brenton can correct me if I'm off:

  1. a translated title, where you also need the original (untranslated) title
  2. a review article, which has its own title, but where one also needs to include the title the article is reviewing (think review of XYZ).

bwiernik commented 4 years ago

@dhimmel Yes, for example, APA has different formats for reviews of books versus films versus articles. So, at minimum APA would require reviewed-director, reviewed-editor, and one of reviewed-type/reviewed-genre/reviewed-medium. Other styles additionally require reviewed-publisher and reviewed-issued.

For original-, there are already original- fields for books, but not sufficient fields for other types. For example, if an article is reprinted from another source, APA style wants original-title and original-date (both already exist), but also original-container-title, original-volume, original-issue, and original-pages.

At this point, the list of reviewed- and original- variables becomes long enough that we may as well just say "any variable can be supplied with original- or reviewed- prefixes to refer to the original version of the item or the item being reviewed by the current item."

denismaier commented 4 years ago

We might actually also need original type and reviewed type right?

bwiernik commented 4 years ago

I think we can avoid it for now--reviewed-genre should probably be sufficient.

dhimmel commented 4 years ago

Thanks @bwiernik for the explanation of reviewed- and original-. It seems that they actually both refer to a different work that could have its own CSL Data Item.

It seems to me like the best data model for CSL JSON here would therefore allow the following keys:

reviews:
  CSL_Item
original:
  CSL_Item
translation:
  CSL_Item
  language: en-US

CSL_Item is a variable here... so it becomes a bit recursive, but gets rid of the repetition.

Sounds like there's a desire to keep the CSL JSON spec as flat as possible. But other fields like date-parts already violate this design. @bwiernik, what are the downsides to the model where fields like reviews and original point to CSL Item objects themselves?
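
The recursive shape described above can be sketched with a small Python dataclass. `CSLItem` here is a deliberately minimal, hypothetical stand-in with two fields plus the related-item slots; it is not a proposal for the actual schema.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class CSLItem:
    """Minimal stand-in for a CSL item; real items have many more fields."""
    type: str
    title: str
    reviews: Optional["CSLItem"] = None   # the work this item reviews
    original: Optional["CSLItem"] = None  # the original version of this item


def related_titles(item: CSLItem) -> Iterator[str]:
    """Yield the item's own title, then titles of nested related items."""
    yield item.title
    for related in (item.reviews, item.original):
        if related is not None:
            yield from related_titles(related)


review = CSLItem(
    type="review",
    title="Title of the review",
    reviews=CSLItem(type="motion_picture", title="Title of reviewed movie"),
)
titles = list(related_titles(review))
```

The recursion is what removes the repetition, and it is also the part that every consumer of the data would have to implement.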

denismaier commented 4 years ago

To be clear, that's something I'd be very much in favour of! However, this is a very major change. I don't know how popular the idea is here. Also, I have the impression that some implementers have concerns about this.

Perhaps @jgm @dstillman @fbennett have some input for us here...