Closed by hsluytergaethje.
@hsluytergaethje, thanks for this impressive list of oddities. It will be a great help in improving the overall consistency of the DraCor API. I created a few tickets to take care of the issues raised. In some cases, however, what looks like an inconsistency actually has some motivation behind it. Please see the comments below.
API call /corpora/{corpusname}
In 'dramas':
Redundancies
- yearPrinted == printYear
- yearWritten == writtenYear
- yearPremiered == premiereYear --> in /play/{playname}: yearWritten, yearPremiered, yearPrinted, yearNormalized
The *Year properties are there for backwards compatibility and should be removed at some point; there is now #188.
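Until #188 lands, client code that has to cope with both naming schemes can read the new key and fall back to the old one. A minimal sketch, assuming the play metadata has already been parsed from JSON into a dict; the helper get_year and its alias table are hypothetical. Only the key names come from this thread.

```python
from typing import Any, Mapping, Optional

# Hypothetical alias table: preferred key -> deprecated *Year key,
# using the pairs listed under "Redundancies" above.
YEAR_ALIASES = {
    "yearWritten": "writtenYear",
    "yearPremiered": "premiereYear",
    "yearPrinted": "printYear",
}


def get_year(play: Mapping[str, Any], key: str) -> Optional[Any]:
    """Return play[key], falling back to its deprecated alias if it is missing."""
    if play.get(key) is not None:
        return play[key]
    alias = YEAR_ALIASES.get(key)
    return play.get(alias) if alias else None


# Made-up metadata that only carries the old-style key.
legacy_play = {"printYear": 1787}
print(get_year(legacy_play, "yearPrinted"))  # 1787
```

Once the *Year keys are removed, the fallback simply never triggers, so callers don't need to change.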
- all information in 'author' is also in 'authors', but there is no deprecation warning as there is for /play/{playname}
I created #187 to take care of it.
Capitalization
- 'wikidataID' but 'fullname' and 'shortname' (in 'authors')
While it's true that this naming does not look overly consistent, I would suggest not changing these, in order to avoid introducing breaking changes. In the case of fullname and shortname, I would argue that these are perfectly legible (in contrast, for instance, to wikidataid) and can in fact sometimes be found spelled as one word (very much like "filename"). So introducing an inner capital would not improve things all that much. I'm aware that this is also a matter of personal taste, so if there is a strong urge to rename these properties, feel free to open an issue.
Needs clarification
- difference between 'name' and 'fullname' is only the format - not obvious
It is not so much a matter of format as of function. The name property can be used for sorting authors alphabetically. For instance, in HunDraCor name and fullname would be the same in most if not all cases. I think we refrained from renaming name to something like sortname when we revised the authors property, for backwards compatibility reasons. I suggest sticking to this decision, but I agree that it should be explained wherever we document the JSON output of the API in the future.
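To illustrate the difference in function, a client would sort on name and display fullname. A minimal sketch, assuming author entries shaped like the 'authors' objects discussed above. The values are made up, and treating name as a surname-first sort form is an assumption based on the explanation above, not on a documented schema.

```python
from typing import Dict, List

# Made-up author entries (not real API output):
# "name" serves as the sort key, "fullname" is meant for display.
authors: List[Dict[str, str]] = [
    {"name": "Zander, Anna", "fullname": "Anna Zander"},
    {"name": "Berg, Karl", "fullname": "Karl Berg"},
]


def display_sorted(entries: List[Dict[str, str]]) -> List[str]:
    """Sort entries by 'name' and return their 'fullname' values."""
    return [a["fullname"] for a in sorted(entries, key=lambda a: a["name"])]


print(display_sorted(authors))  # ['Karl Berg', 'Anna Zander']
```

Sorting on fullname instead would put Anna Zander first. That difference is the reason for keeping a separate sort key.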
API call /corpora/{corpusname}/metadata
Redundancies
- playName == name
These are indeed redundant. I don't remember why both were introduced in 30bf0314bc609095dd701ceb39491b6982022e3f, but playName does not seem to be used anywhere else in dracor-api or dracor-frontend and could be removed, I guess.
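Before playName is dropped, it is cheap to confirm that it really duplicates name across a whole corpus. A minimal sketch, assuming the /corpora/{corpusname}/metadata response has been parsed into a list of dicts with one dict per play. The function find_divergent and the sample records are hypothetical.

```python
from typing import Any, Dict, List


def find_divergent(records: List[Dict[str, Any]], a: str, b: str) -> List[Dict[str, Any]]:
    """Return records where keys a and b are both present but hold different values."""
    return [r for r in records if a in r and b in r and r[a] != r[b]]


# Made-up records standing in for parsed metadata entries.
records = [
    {"name": "play-one", "playName": "play-one"},
    {"name": "play-two", "playName": "play-two"},
]
print(find_divergent(records, "name", "playName"))  # []
```

An empty result across all corpora would support removing one of the two keys.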
Abbreviations
Publisher vs. Pub (Publication)?
- originalSourcePublisher BUT originalSourcePubPlace
Sometimes I think brevity trumps verbosity, and I find originalSourcePubPlace easier to read and mentally parse than originalSourcePublicationPlace. That's personal taste, though, and I wouldn't object to changing it, considering that it was introduced only recently and breakage could be limited.
Speaker vs. Sp
- numOfSpeakers BUT wordCountSp
"Speakers" here actually refers to entries in the particDesc, which can be either person or personGrp elements (which, by the way, should be explained somewhere too). "Sp", on the other hand, refers to the actual sp element. So I would consider this not an inconsistency but two different things named accordingly.
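The distinction is easier to see on a toy TEI fragment. A minimal sketch using Python's standard xml.etree.ElementTree. The fragment is invented, and counting every person/personGrp in particDesc and every sp in the text is a simplification of whatever dracor-api actually computes. wordCountSp would additionally count the words inside each sp.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented TEI fragment: two speakers declared, three sp elements in the text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="anna"/>
    <personGrp xml:id="chorus"/>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body><div type="act">
    <sp who="#anna"><p>Hello.</p></sp>
    <sp who="#chorus"><l>We answer</l></sp>
    <sp who="#anna"><p>Again.</p></sp>
  </div></body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
# "Speakers": entries declared in particDesc, either person or personGrp.
speakers = root.findall(".//tei:particDesc//tei:person", TEI_NS) + root.findall(
    ".//tei:particDesc//tei:personGrp", TEI_NS
)
# "Sp": the speech elements actually encoded in the text.
sp_elements = root.findall(".//tei:text//tei:sp", TEI_NS)
print(len(speakers), len(sp_elements))  # 2 3
```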
Acts vs. L(ines) and P(aragraphs):
- numOfActs BUT numOfP and numOfL
Same here: acts are encoded as div elements with a certain type attribute, while p and l are the actual TEI elements that numOfP and numOfL refer to. This difference may not be obvious to the casual API user, but using numOfLines or numOfParagraphs would be even more ambiguous.
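A second toy example shows why numOfActs and numOfP/numOfL are counted differently. Acts are identified by an attribute value, while paragraphs and lines are identified by element names. The fragment is again invented. Counting every p and l in the document is a simplifying assumption for this sketch and not necessarily dracor-api's exact rule.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented fragment: two acts, two prose paragraphs, three verse lines.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="act">
    <div type="scene">
      <sp><p>Prose line.</p></sp>
      <sp><l>Verse one</l><l>Verse two</l><l>Verse three</l></sp>
    </div>
  </div>
  <div type="act">
    <div type="scene"><sp><p>More prose.</p></sp></div>
  </div>
</body></text></TEI>"""

root = ET.fromstring(SAMPLE)
# Acts: div elements distinguished only by their type attribute.
num_acts = sum(1 for d in root.iter(f"{TEI}div") if d.get("type") == "act")
# Paragraphs and lines: TEI elements counted by name.
num_p = sum(1 for _ in root.iter(f"{TEI}p"))
num_l = sum(1 for _ in root.iter(f"{TEI}l"))
print(num_acts, num_p, num_l)  # 2 2 3
```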
num (in networkdata) vs. number (in metadata)
- numPersonGroups BUT originalSourceNumberOfPages
Here I don't see any reason for the inconsistency other than that we didn't think about it. So I guess we should change it.
average vs. max
- averageDegree BUT maxdegree
In my opinion, max and min are so widely used as abbreviations across programming languages that I would find it almost irritating to see maximum or minimum used instead. The same is not true for avrg, though.
Patterns
numOf vs. num
- numEdges
- numConnectedComponents
- numOfPersonGroups
- numOfSpeakers
Are there any suggestions as to which direction we should align this in, @lehkost or @peertrilcke? I don't have a strong preference, but the numOf* form already seems to be used more often, so maybe that should be the one.
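If the numOf* form wins, the rename itself is mechanical. A minimal sketch of a hypothetical client-side shim that rewrites num* keys to numOf* and leaves existing numOf* keys alone. It assumes that every key made of "num" plus a capitalised word is a count. That holds for the keys listed here but is not guaranteed for the whole API.

```python
import re
from typing import Any, Dict

# "num" followed by a capitalised word, excluding keys already in numOf* form.
_NUM_KEY = re.compile(r"^num(?!Of[A-Z])([A-Z]\w*)$")


def to_num_of(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of data with num* keys renamed to numOf*."""
    return {_NUM_KEY.sub(r"numOf\1", k): v for k, v in data.items()}


print(to_num_of({"numEdges": 12, "numOfSpeakers": 5}))
# {'numOfEdges': 12, 'numOfSpeakers': 5}
```

If a record ever contained both numX and numOfX, the later key would silently win. A real migration would want to flag that case.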
adjective or noun first
- normalizedGenre vs. yearNormalized
For me the difference in name construction carries a subtle meaning: there is only one genre per play, and it just happens to be normalised (what that means needs explanation, though). There are, however, different kinds of years, which is what I usually assume when the noun appears before some kind of adjective.
API call /corpora/{corpusname}/metadata/csv
Difference to JSON
- numPersonGroups (csv) BUT numOfPersonGroups (json)
This is actually a bug. I would expect the column numPersonGroups in the CSV never to show any values. It should be an easy fix.
Otherwise, the CSV output has the same naming problems as the JSON.
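A bug like this can be caught by comparing the CSV header with the JSON keys and flagging columns that never hold a value. A minimal sketch with Python's standard csv and json modules. The inline CSV and JSON are made up. In practice they would be fetched from /corpora/{corpusname}/metadata/csv and /corpora/{corpusname}/metadata.

```python
import csv
import io
import json

# Made-up stand-ins for the two responses.
csv_text = "name,numPersonGroups\nplay-one,\nplay-two,\n"
json_text = (
    '[{"name": "play-one", "numOfPersonGroups": 1},'
    ' {"name": "play-two", "numOfPersonGroups": 0}]'
)

reader = csv.DictReader(io.StringIO(csv_text))
csv_columns = set(reader.fieldnames or [])
rows = list(reader)
records = json.loads(json_text)
json_keys = set().union(*(r.keys() for r in records))

# Names present in only one format point to naming drift.
only_csv = sorted(csv_columns - json_keys)
only_json = sorted(json_keys - csv_columns)
# Columns that are empty in every CSV row.
always_empty = sorted(c for c in csv_columns if all(not row[c] for row in rows))

print(only_csv, only_json, always_empty)
# ['numPersonGroups'] ['numOfPersonGroups'] ['numPersonGroups']
```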
API call /corpora/{corpusname}/play/{playname}
Difference to corpus metadata
- genre BUT normalizedGenre
normalizedGenre was introduced in the context of #130. Apparently we forgot to rename other occurrences of the property. There is now #189.
@hsluytergaethje @cmil: Thanks so much for taking the time to document and comment on this. Like @cmil, I have no strong preference for naming schemes in any of the cases mentioned; it should just be consistent.
I created an API consolidation project to keep track of related issues.
All the follow-up issues to this one have been resolved. Feel free to open new issues where inconsistencies continue to be obtrusive.
The names of columns or dictionary keys are sometimes inconsistent or redundant. Below I have listed these cases for the output formats of the different API calls:
API call /corpora/{corpusname}
In 'dramas':
Redundancies
Capitalization
Needs clarification
API call /corpora/{corpusname}/metadata
Redundancies
Abbreviations
Patterns
API call /corpora/{corpusname}/metadata/csv
Difference to JSON
API call /corpora/{corpusname}/play/{playname}
Difference to corpus metadata