Add "triage" subdir, change status of a few tests

bdarcus commented 1 year ago

@jgm earlier posted this over at the Discourse, noting some tests he found problematic.

https://discourse.citationstyles.org/t/upcoming-csl-meetup-context/1767/12

Any suggestions on what we do with them?

The questions and commentary below are John's.

Here are some examples of failures from citeproc-hs. In each case I may have simply missed a relative part of the spec.

Does the spec specify these behaviors for quotes?

[FAILED]   test/csl/flipflop_OrphanQuote.txt
--- expected
+++ actual
@@
-Nation of "Positive Obligations " of State under the European Convention on Human Rights (1)
+Nation of “Positive Obligations ” of State Under the European Convention on Human Rights (1)

[FAILED]   test/csl/quotes_QuotesUnderQuotesFalse.txt
--- expected
+++ actual
@@
 <div class="csl-bib-body">
-  <div class="csl-entry"> 'Title with ‘quotes’ in it',.</div>
+  <div class="csl-entry"> ’Title with ‘quotes’ in it’,.</div>
 </div>

Does the spec say to do this? (As noted in my last mail, it’s not always desirable.)

[FAILED]   test/csl/magic_CapitalizeFirstOccurringTerm.txt
--- expected
+++ actual
@@
-And
+and

Does the spec specify these things?

[FAILED]   test/csl/sort_OmittedBibRefMixedNumericStyle.txt
--- expected
+++ actual
@@
   <div class="csl-entry">1. Anderson, Book One</div>
-  <div class="csl-entry">2. [CSL STYLE ERROR: reference with no printed form.]</div>
+  <div class="csl-entry">[CSL STYLE ERROR: reference with no printed form.]</div>
   <div class="csl-entry">3. Crane, Book Two</div>

[FAILED]   test/csl/sort_OmittedBibRefNonNumericStyle.txt
--- expected
+++ actual
@@
   <div class="csl-entry">Anderson, Book One</div>
+  <div class="csl-entry">[CSL STYLE ERROR: reference with no printed form.]</div>
   <div class="csl-entry">Crane, Book Two</div>

Here I wonder if the absence of space between the Latin and Chinese names is really intended?

[FAILED]   test/csl/name_EtAlWithCombined.txt
--- expected
+++ actual
@@
   <div class="csl-entry">John Doe。</div>
-  <div class="csl-entry">——著，Ziggy Zither等點校。</div>
+  <div class="csl-entry">——著，Ziggy Zither 等點校。</div>
[Edit: added later] In this case I don’t understand why the space before Frinkle (which is present in the reference database) disappears:

[FAILED]   test/csl/sort_LeadingApostropheOnNameParticle.txt
--- expected
+++ actual
@@
   <div class="csl-entry">d’Wander, W</div>
-  <div class="csl-entry">de’ Frinkle, B</div>
+  <div class="csl-entry">de’Frinkle, B</div>
   <div class="csl-entry">in ’t Horvath, P A B</div>

zepinglee commented 1 year ago

Here I wonder if the absence of space between the Latin and Chinese names is really intended?


[FAILED]   test/csl/name_EtAlWithCombined.txt
--- expected
+++ actual
@@
   <div class="csl-entry">John Doe。</div>
-  <div class="csl-entry">——著，Ziggy Zither等點校。</div>
+  <div class="csl-entry">——著，Ziggy Zither 等點校。</div>

There is no universal standard about the space character between Han and western characters but omitting it is one of the accepting styles.

In typesetting, there is usually a 1/6 to 1/4 em space (width) between Han and western glyphs. Many softwares like Adobe Indesign, Microsoft Office, and TeX with appropriate packages can handle this automatically. However people do not form a consistent style of mixed Han and western in the input text characters. As far as I know, companies like Apple and Google tend to insert these space character in Chinese but omit them in Japanese (compare https://www.apple.com.cn/ and https://www.apple.com/jp/).

bdarcus commented 1 year ago

So what do you suggest on that one, @zepinglee? Leave as is?

Aside: the others ones are all also, or maybe more, about the spec.

zepinglee commented 1 year ago

So what do you suggest on that one, @zepinglee? Leave as is?

Aside: the others ones are all also, or maybe more, about the spec.

Yes. If I remember correctly, the output of citeproc-js is in this compact style.

jgm commented 1 year ago

Wouldn't it be sensible to have a set of conformance tests that only test behavior that is required by the spec? (And not also behavior that is not spec but happens to have been implemented by citeproc.js?) What's not controversial is that this issue is not in the spec.

bdarcus commented 1 year ago

Right, so really this is more a spec issue that anything? E.g. I should move this issue there?

bdarcus commented 1 year ago

To update after the issue transfer, the above tests are sensible probably, but the behavior not mentioned in the spec.

So the spec should be amended accordingly.

And/or, tests removed from the test suite.

jgm commented 1 year ago

If there's not a standard way of doing this, then I'd suggest leaving it out of the spec (and therefore the tests).

bdarcus commented 1 year ago

So reading between the lines, are you recommending dropping all four from the tests, and be done?

jgm commented 1 year ago

Hard to say for sure without knowing more about what these tests are intended to test for. But the general principle I'd favor would be not to test things that aren't part of the spec -- or to separate out these tests into a separate test suite for citeproc.js.

adam3smith commented 1 year ago

I think that's a good idea. @Jason-Abbott also ran into a bunch of these for his swift citeproc implementation (see recent posts over on discourse).

There are 3-4 different categories of tests, though, that we need to treat differently

Tests that address spec compliance. Nothing to do
Tests that don't explicitly target the spec, but are necessary to reliably produce expected outcome, e.g. avoiding double spaces or periods. Short term, we could tag/file these as required or so. Long-term, they should become part of the spec, possibly in a technical appendix so that the spec remains legible for style authors
Tests that we think are incorrect, i.e., contradict the spec or produce undesirable outcome not required by the spec. Short term, we could file these as deprecated. Long term, we should look at these and modify them towards desired behavior
(Not sure about this one) Tests that aren't supported by the spec and don't fall under category 2, but do produce reasonable output. The name_EtAlWithCombined.txt would seem to fall in that category as do, possibly, some of the others above. I'd tag/file those as "recommended" perhaps? Following these isn't necessary to produce correct citation output, but they do raise relevant considerations and provide a reasonable default.

bdarcus commented 1 year ago

I did add ability to include metadata in the tests, which could be useful in categorizing and explaining.

https://github.com/citation-style-language/test-suite/issues/51

bdarcus commented 1 year ago

What if we just have a "triage" subdir of the test-suite, and so mark tests like these in need of some resolution, and move these tests there? Can also tag them.

adam3smith commented 1 year ago

What if we just have a "triage" subdir of the test-suite, and so mark tests like these in need of some resolution, and move these tests there? Can also tag them.

I'm fine with that -- always in favor of immediate improvements -- but I do think we should have a plan for how the triage categories look. If my above suggestions are fine, we can go with those, but obviously I've never written a processor, so make no claims that these are right.

@fbennett does that sound OK to you?

Jason-Abbott commented 1 year ago

A triage folder or other indicator seems sensible to me as well. At the moment, the Swift CSL implementation I’ve been working on is set up to chug through every test in the suite, which may give me reason for some PRs to add whatever indicator is chosen here.

(My test progress slowed a bit as I had some day-job travel and refactoring but about to re-engage.)

bdarcus commented 1 year ago

I'll plan to follow my suggestion, unless I hear otherwise. It's easy enough to adjust these either way though.

Is your implementation open source @Jason-Abbott?

Jason-Abbott commented 1 year ago

Is your implementation open source @Jason-Abbott?

It isn’t currently. I will open-source some or all of it as I finalize some product ideas.

Jason-Abbott commented 1 year ago

@adam3smith , @bdarcus following up here after my learnings on citation-number — I would guess those tests that depend on unique and undocumented behaviors of <key variable="citation-number"/> belong in this triage or similar sub-directory.

I would even argue that these behaviors should never be added to the specification since, although I understand the need for and relative simplicity of the original code change, it breaks the meaning of sort keys. So perhaps an off-spec or citeproc-js sub-directory …

No strong feeling from me. I’d only like to assist future processor writers or debuggers 🙂

bdarcus commented 1 year ago

My POV is anything that isn't in the spec doesn't belong in the test suite, and if there's any uncertainty about what to do with such a test, it should go in "triage."

I like your idea of an "off-spec" subdir for ones we agree shouldn't be in the spec (and your explanation makes sense to me on these). Perhaps we could just add that, and when moving tests there, include an explanation of why?

Care to PR that too?

Assuming @adam3smith agrees ...

Jason-Abbott commented 1 year ago

Yeah, I’d be happy to do a PR. I’ll need to review those citation-number tests again since a few don’t require the off-spec behavior … so, some time this week.

zepinglee commented 1 year ago

I would even argue that these behaviors should never be added to the specification since, although I understand the need for and relative simplicity of the original code change, it breaks the meaning of sort keys. So perhaps an off-spec or citeproc-js sub-directory …

I agree. Those sort_ tests are confusing to me and I skipped them in CI.

The citepoc-js has a local directory (https://github.com/Juris-M/citeproc-js/tree/master/fixtures/local) of tests specific to itself. Perhaps we can move them there?

adam3smith commented 1 year ago

I'm still struggling with how to think about things were the test behavior may be desirable but probably shouldn't be spec'd. To take this test/behavior as an example: Frank added the behavior to citeproc-js (and then added a test) based on a user request and to allow for something that would otherwise not be possible, re-interpreting a syntax that is valid according to the specs (sorting by citation-number descending) but (literally) paradoxical. As I mention to Jason above -- a processor would almost certainly be fine without meeting this test. I don't necessarily want to burden the spec with micro-managing behavior like this. At the same time, it seems wasteful to throw out the collected experience represented by tests in this category. That's why I'm suggesting a recommended category above. Not sure if that's the right way to go about this, but I also worry that both alternatives I see -- writing every edge case into the specs in great detail or completely deprecating tests of this nature -- have significant downsides. I understand Jason as expressing similar ambivalence.

To move forward without getting bogged down on this, I think we either

can table the idea of keeping some off-spec tests as recommended, which would mean just creating a single off-spec directory now and then at a later point revisit a) the idea of recommended, off-spec tests and b) the status of individual off-spec tests or
agree on the idea of recommended off-spec tests now and create separate subdirectories

I guess wrt moving them to citeproc-js local, I'm not sure what that implies in terms of advice to citeproc developers?

Jason-Abbott commented 1 year ago

I hadn't noticed, @zepinglee, that citeproc-js has its own local tests built in this same way. I see sections I don't recognize in some of the tests there so apparently the test harness is also extended there.

Even so, I tend to agree with @adam3smith insofar as the impact of moving tests from here to there is unclear, an intermediate step of moving them into a subfolder here is safer, until that can be better evaluated.

Personally, my goal is to deliver a processor that behaves as Zotero and other users expect, which could well mean it incorporates citeproc-js idiosyncrasies. That's why I've so far adapted my code to all the tests here.

Seeing all those other local citeproc-js tests adds a wrinkle I'll need to think about. I'm fairly new to this CSL space. I imagine there's a conceptual distinction between off-spec tests here and local tests in citeproc-js, though I sure can't say what it is yet.

fbennett commented 1 year ago

If it's at all helpful, documentation on CSL-M idiosyncracies is linked from the Jurism website.

https://juris-m.github.io/cslm-docs/

Frank

On Tue, Jun 13, 2023, 13:14 Jason Abbott @.***> wrote:

I hadn't noticed, @zepinglee https://github.com/zepinglee, that citeproc-js has its own local tests built in this same way. I see sections I don't recognize in some of the tests there so apparently the test harness is also extended there.

Even so, I tend to agree with @adam3smith https://github.com/adam3smith insofar as the impact of moving tests from here to there is unclear, an intermediate step of moving them into a subfolder here is safer, until that can be better evaluated.

Personally, my goal is to deliver a processor that behaves as Zotero and other users expect, which could well mean it incorporates citeproc-js idiosyncrasies. That's why I've so far adapted my code to all the tests here.

Seeing all those other local citeproc-js tests adds a wrinkle I'll need to think about. I'm fairly new to this CSL space. I imagine there's a conceptual distinction between off-spec tests here and local tests in citeproc-js, though I sure can't say what it is yet.

— Reply to this email directly, view it on GitHub https://github.com/citation-style-language/test-suite/issues/53#issuecomment-1588498999, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAASMSWNAI3SJ7OA5XE62ELXK7SL3ANCNFSM6AAAAAAWY4WY2M . You are receiving this because you were mentioned.Message ID: @.***>

citation-style-language / test-suite

Add "triage" subdir, change status of a few tests #53