Open bwiernik opened 5 years ago
Sounds good to me. Any potential drawbacks when this is implemented?
These could be implemented in CSL-JSON as arrays for short, original, reviewed. Cf. https://github.com/citation-style-language/schema/issues/169
Hmm. @dstillman Looks like Zotero devs are not big fans of arrays and objects. Suggestions concerning the data structure here?
What about special affixes that can be used with any other variable? E.g. for -short.
In the long run we were talking about a hierarchical data model. At least for reviewed- and original- that would probably be the most flexible solution.
These could be implemented in CSL-JSON as arrays for short, original, reviewed.
I'm not clear what the suggestion is here. Can you give an example?
@dstillman I should have said objects, not arrays. Example:
"reviewed": {
  "type": "motion_picture",
  "medium": "DVD",
  "title": "Title of reviewed movie"
}
vs listing these as individual reviewed- variables.
reviewed-type: motion_picture
reviewed-medium: DVD
reviewed-title: Title of reviewed movie
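To make the comparison concrete: the nested form and the flat form carry the same information and can be converted mechanically. The following is only a sketch under assumptions; the helper name `flatten_related`, the default prefix list, and the sample item are hypothetical and not part of any CSL schema or processor.

```python
def flatten_related(item, prefixes=("reviewed", "original")):
    """Return a copy of `item` with nested objects expanded into prefixed keys.

    Hypothetical helper for illustration only; not part of CSL or Zotero.
    """
    flat = {}
    for key, value in item.items():
        if key in prefixes and isinstance(value, dict):
            # "reviewed": {"title": ...} becomes "reviewed-title": ...
            for sub_key, sub_value in value.items():
                flat[f"{key}-{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


# Sample item (made up for this sketch).
nested = {
    "type": "review",
    "title": "A review",
    "reviewed": {
        "type": "motion_picture",
        "medium": "DVD",
        "title": "Title of reviewed movie",
    },
}
flat = flatten_related(nested)
```

Going from flat keys back to nested objects would need a fixed list of known prefixes, which is one reason the choice between the two models matters for implementers.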
So you're wanting to change a mostly flat (aside from contributors and dates) data model to a more structured one.
Ultimately, this could lead to a data model as outlined here.
I think this is more a discussion for the CSL list, but in general I would strongly advocate for key-value pairs over objects, except where the fields don't make sense independently and the app would need special handling of all associated variables for proper processing anyway. If it's something where there could be a direct mapping between a field and a variable, it's vastly simpler to stick to key-value pairs, and it also allows for hacks like Extra. Reducing implementation complexity is much more important in my view than reducing verbosity in CSL-JSON.
Yeah, it's easy to add these variable strings ("reviewed-title" and such), so let's just do that. We could have defined "container" as an object, for example, but we didn't.
I understand. But what about special handling of prefixes and suffixes to variables? Is there a way to define affixes that could be used on other variables? Like allowing -short as a general modifying suffix and reviewed- as a general modifying prefix? Would that be somehow possible?
I think that would affect the processor more than the app. If we support a given -short or reviewed- field, the mapping would be hard-coded. It's the processor that would need to know how to handle those.
(I don't totally get it, though. Wouldn't there be nonsensical possibilities? What does issued-short mean?)
Okay, so let's stick with key-value pairs.
Dan makes a good point on -short. It should apply only to standard variables (string, number, title), not name or date variables.
So 2 questions:
Is it possible to have such prefix/suffix rules in JSON to prevent unnecessary verbosity? (and yes, we will need to restrict -short to certain variables)
If yes, should we do this?
Or should we simply add possible variables to reviewed-, original-, container-, collection-?
The three relevant affixes are -short, original- and reviewed-. Could we define valid combinations of these in the style and data schemas using string concatenation?
So, something like variables.short = variables.standard + '-short' and variables.original = 'original-' + variables.all?
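As a rough illustration of the concatenation idea, here is how the derived lists could be computed in a script. This is a sketch only: the base lists are small hypothetical samples, and the dictionary keys (`standard`, `all`, `short`, ...) just mirror the names used in the comment above.

```python
# Hypothetical base lists; the real schemas contain many more variables.
variables = {
    "standard": ["title", "container-title", "publisher"],
    "name": ["author", "editor"],
    "date": ["issued"],
}
variables["all"] = variables["standard"] + variables["name"] + variables["date"]

# -short applies only to standard variables; the prefixes apply to everything.
variables["short"] = [v + "-short" for v in variables["standard"]]
variables["original"] = ["original-" + v for v in variables["all"]]
variables["reviewed"] = ["reviewed-" + v for v in variables["all"]]
```

Restricting `-short` to the `standard` list avoids names such as `issued-short`, which was the concern raised earlier.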
The three relevant affixes are -short, original- and reviewed-.
Addendum: With container-, collection- I was not suggesting we should add that now. But perhaps in the medium run?
So, something like variables.short = variables.standard + '-short' and variables.original = 'original-' + variables.all?
That looks good. Would make schema updates easier, wouldn't it? (But I'm a bit pessimistic that it will work so easily: https://stackoverflow.com/questions/9708192/use-a-concatenated-dynamic-string-as-javascript-object-key. Yes, that's old, but perhaps still relevant? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer says it's possible with recent JS, but not with JSON.)
Edit: looks like I misrepresented the problem here. You were not concatenating the key, so it might be easier. (But I don't know.)
I think container and collection are a much bigger can of worms than the others, so let's set those aside for now.
For the data schema, an option might be to split out the schemas into separate files that match the RNC type and variable structure and use a build script to compile them at commit time.
What would that look like, and how would that solve the problem with schema verbosity? Would it?
Idea: if we have all variables available with original- or reviewed-, what about a mechanism like alternative in csl-m where you can render all variables prefixed with alt- with a single alternative variable? Could make style coding easier.
In addition to or instead of making them available as regular variables? It would need to be in addition to if anything. Most styles only want a portion of such information or have different formatting requirements (e.g., APA wants original medium, original type, original title, and original author, not a full reference).
What would that look like, and how would that solve the problem with schema verbosity? Would it?
A Python script in GitHub Actions could compile the csl-data.json at commit time. It would have a list of all types and variables (separated by category) and dynamically construct the JSON. The main benefit would be ease of maintenance and updating: no need to manually keep four nearly identical lists aligned.
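A minimal sketch of what such a build step might look like, assuming category lists kept in one place. Everything here is hypothetical: the sample variables, the per-category schema fragments (including the `#/definitions/...` references), and the function names are placeholders, not the actual csl-data.json contents.

```python
import json

# Hypothetical category lists; the real lists would live in a shared source file.
CATEGORIES = {
    "string": ["title", "publisher"],
    "name": ["author"],
    "date": ["issued"],
}

# Simplified stand-in schema fragments per category (assumptions).
CATEGORY_SCHEMAS = {
    "string": {"type": "string"},
    "name": {"type": "array", "items": {"$ref": "#/definitions/name-variable"}},
    "date": {"$ref": "#/definitions/date-variable"},
}


def build_properties():
    """Expand every base variable into its original-/reviewed-/-short variants."""
    properties = {}
    for category, names in CATEGORIES.items():
        fragment = CATEGORY_SCHEMAS[category]
        for name in names:
            for variant in (name, "original-" + name, "reviewed-" + name):
                properties[variant] = fragment
            if category == "string":
                # Only string variables get a short form in this sketch.
                properties[name + "-short"] = fragment
    return properties


def build_schema_text():
    schema = {"type": "object", "properties": build_properties()}
    # sort_keys keeps the output deterministic, so generated diffs stay small.
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


schema_text = build_schema_text()
```

A CI job could write `schema_text` to the schema file and fail if the committed file differs, but how that would be wired into GitHub Actions is left open here.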
For the data schema, an option might be to split out the schemas into separate files that match the RNC type and variable structure and use a build script to compile them at commit time.
Yes, I was wondering about something like this.
In addition to or instead of making them available as regular variables? It would need to be in addition to if anything.
Sure, in addition to the regular variables. For reviews you will most likely want a full reference, right? (And that reference should also be rendered according to the current style---so giving these details in the regular title is actually not ideal.)
Edit: well, at least Chicago does not request this.
So, what shall we do about this now? Should I draft a PR for original-, reviewed-, and -short? Or should we go the automated route instead?
So, what shall we do about this now? Should I draft a PR for original-, reviewed-, and -short? Or should we go the automated route instead?
Depends who's going to write the python script and when.
I have basic python skills, but am not knowledgeable about parsing text as we need (see comment).
I have basic python skills, but am not knowledgeable about parsing text as we need
The question is: What will our input look like? Will we just use the json? Or could we even work with native python structures? If so, we don't have to parse anything.
I don't understand. I was assuming input is the rnc file(s), output is csl-data.json.
What were you thinking? A single, say python, file, whose contents is the data representation, output to both rnc and json?
I was thinking we could use a common source for both rnc and json.
variables = [
    {
        "name": "title",
        "type": "string",
        "variants": ["original-", "reviewed-", "-short"],
    },
    {
        "name": "author",
        "type": "name",
        "variants": ["original-", "reviewed-", "container-"],
    },
]

def create_rnc(variables):
    # this creates the rnc schema variable list
    rnc = ""  # placeholder: the generated RNC text would be built here
    return rnc

def create_json(variables):
    # this creates the json schema
    json_schema = {}  # placeholder: the generated JSON schema would be built here
    return json_schema

rnc = create_rnc(variables)
json_schema = create_json(variables)
A single, say python, file, whose contents is the data representation, output to both rnc and json?
Exactly, see above. (Or instead of python, we could also use some other common source that is easy to write and parse, say yaml or toml.)
IC.
I'm agnostic; whatever gets us to the best and easiest result, which is consistent schemas, and clean git histories, including diffs.
I'm not sure on the details of CI in GitHub; how it would work.
On your example, though, maybe better to have separate dicts for datatypes; like "variables-string."
You want to test a minimal toml or yaml approach in your test repo and report?
;-)
PS - let's move this conversation to the linked issue?
Maybe we're making things too complicated regarding the json here?
Can't we just add a note to the specs that would basically say, "Any non-number variable can also be supplied in the short form with the suffix -short. For every variable, variants prefixed with original- and reviewed- exist."? I mean, the json is not used for validation, right? Do we need to have those in the json schema at all?
(Found this idea somewhere in the Zotero forums discussion linked above: https://forums.zotero.org/discussion/75366/accommodating-both-full-series-names-and-series-abbreviations)
Regarding -short: There might be some variables we will want to exclude from that list, but I'm not sure it will hurt allowing them anyway.
And regarding the rnc, is there any way to use patterns without using a script here? @bdarcus
I think you can use patterns to validate, but then you lose auto-completion in validating editors.
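To show the trade-off, here is roughly what pattern-based validation of variable names could look like, written as a plain Python regular expression rather than in RNC. This is a sketch under assumptions: the base lists are hypothetical samples, and whether a prefix may be combined with `-short` (e.g. `reviewed-title-short`) has not been decided in this thread; the sketch simply allows it.

```python
import re

# Hypothetical sample lists, not the real schema contents.
BASE_VARIABLES = ["title", "container-title", "publisher", "author", "issued"]
SHORTENABLE = ["title", "container-title"]


def build_pattern(base, shortenable):
    """Build one regex: optional prefix, then a base name or a shortenable name + '-short'."""
    base_alt = "|".join(re.escape(v) for v in base)
    short_alt = "|".join(re.escape(v) for v in shortenable)
    return re.compile(
        rf"(?:original-|reviewed-)?(?:(?:{short_alt})-short|(?:{base_alt}))"
    )


VARIABLE_PATTERN = build_pattern(BASE_VARIABLES, SHORTENABLE)


def is_valid_variable(name):
    # fullmatch ensures the whole name fits the pattern, not just a prefix of it.
    return VARIABLE_PATTERN.fullmatch(name) is not None
```

A single pattern like this keeps the schema short, but an editor cannot enumerate the allowed names from it, which is the auto-completion loss mentioned above.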
I think you can use patterns to validate, but then you lose auto-completion in validating editors. …
Ok, so we should have all usable variables in the rnc schema. But what about the json? Would we get away with a note in the specs?
I don't really see a problem with listing the variables out in both the rnc and the json schemas if we have the scripts that can generate them automatically when a new variable is added. Put all of the "reviewed", "short", and "original" values together and label as such.
Ok. I can continue working on the script tomorrow, but perhaps just add the variables manually so we can close this one. The question is just: -short on everything except? Numbers, names? What about DOI etc? Or should we be liberal here and really allow everything with short, whether it makes sense or not?
And reviewed and original on everything, right?
No, put them where they obviously belong.
We can always add later.
Variable lists are already getting pretty long.
Definitely not any names or dates. Other places where they make sense.
Reviewed and original should make sense everywhere right?
Yes.
Ok. So it's basically short forms for all titles and many strings. Then everything with reviewed and original variants. Will do tomorrow.
Reading through this thread, I'm still not clear on the meaning of -short, original-, and reviewed-? Short makes the most sense to me... styles sometimes want to use a shortened title or container.
But what are original and reviewed? Does this just mean a curator has reviewed the CSL JSON and changed "original" to "reviewed"? If this is the case, I don't see a need for these fields to be part of the spec. Users could still include them in CSL JSON. Are styles ever going to render "original" or "reviewed" fields as part of a reference list?
I think they address cases like these @dhimmel; denis and brenton can correct me if I'm off:
review of XYZ).
@dhimmel Yes, for example, APA has different formats for reviews of books versus films versus articles. So, at minimum APA would require reviewed-director, reviewed-editor, and one of reviewed-type/reviewed-genre/reviewed-medium. Other styles additionally require reviewed-publisher and reviewed-issued.
For original-, there are already original- fields for books, but not sufficient fields for other types. For example, if an article is reprinted from another source, APA style wants original-title and original-date (both already exist), but also original-container-title, original-volume, original-issue, and original-pages.
At this point, the list of reviewed- and original- variables becomes long enough that we may as well just say "any variable can be supplied with original- or reviewed- prefixes to refer to the original version of the item or the item being reviewed by the current item."
We might actually also need original type and reviewed type right?
I think we can avoid it for now--reviewed-genre should probably be sufficient.
Thanks @bwiernik for the explanation of reviewed- and original-. It seems that they actually both refer to a different work that could have its own CSL Data Item.
It seems to me like the best data model for CSL JSON here would therefore allow the following keys:
reviews:
CSL_Item
original:
CSL_Item
translation:
CSL_Item
language: en-US
CSL_Item is a variable here... so it becomes a bit recursive, but gets rid of the repetition.
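For illustration, the recursive shape could be sketched in Python like this. It is only an assumption-laden sketch: the class name `CslItem`, the chosen fields, the top-level placement of `language`, and the sample data are hypothetical and do not describe the current CSL JSON schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CslItem:
    """Hypothetical recursive item: related works are full items themselves."""
    type: str
    title: str = ""
    language: Optional[str] = None
    reviews: Optional["CslItem"] = None
    original: Optional["CslItem"] = None
    translation: Optional["CslItem"] = None


def walk(item, path="item"):
    """Yield (path, item) for an item and every related item nested inside it."""
    yield path, item
    for key in ("reviews", "original", "translation"):
        child = getattr(item, key)
        if child is not None:
            yield from walk(child, f"{path}.{key}")


# Sample data made up for this sketch.
review = CslItem(
    type="review",
    title="A review",
    reviews=CslItem(type="motion_picture", title="Title of reviewed movie"),
)
paths = [path for path, _ in walk(review)]
```

Because each related work is a complete item, a processor would have to recurse into it, which is where the extra implementation complexity discussed above comes from.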
Sounds like there's a desire to keep the CSL JSON spec as flat as possible. But other fields like date-parts already violate this design. @bwiernik, what are the downsides to the model where fields like reviews and original point to CSL Item objects themselves?
To be clear, that's something I'd be very much in favour of! However, this is a very major change. I don't know how popular the idea is here. Also, I have the impression that some implementers have concerns about this.
Perhaps @jgm @dstillman @fbennett have some input for us here...
Recently, in response to user needs for the SBL style, citeproc-js added support for providing -short versions of all CSL variables, to be rendered with form="short". (https://forums.zotero.org/discussion/comment/324592/#Comment_324592)
In my work on apa.csl, I'm finding that it wants a lot more detailed information for reviews (e.g., medium of item being reviewed, date of item being reviewed), as well as for original publication information (e.g., original medium, original container title, original pages, original editor) than is currently possible with existing CSL variables. As far as I am aware, MLA and Chicago have similar requirements.
I suggest that -short, original-, reviewed- should be expanded so that they can be applied to any CSL variable. This would allow maximum flexibility without having to individually specify each possible variable of this kind.