Closed rsjoyner closed 1 year ago
The author_list attribute contains a semi-colon-separated list of names of people to be cited as authors of the associated product. The general format for individual names is: SURNAME, GIVEN NAME(s). Initials may be used in lieu of given name(s). If the name contains a suffix ("Jr.", "Sr.", "III", etc.) it should be placed before the comma (,). Do not include the word "and" before the final author. All authors should be listed explicitly - do not elide the list using "et al.".
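To make the rule above concrete, here is a minimal sketch of a strict reading of it. The function name `parse_author_list_strict` is hypothetical and is not taken from the service's code; it only illustrates how a parser that insists on "SURNAME, GIVEN NAME(s)" treats each semicolon-separated entry.

```python
def parse_author_list_strict(author_list: str) -> list[tuple[str, str]]:
    """Split a semicolon-separated author_list into (surname, given_names) pairs.

    Hypothetical helper: assumes every entry must contain a comma,
    which is the strict reading of the documented format.
    """
    authors = []
    for raw in author_list.split(";"):
        name = raw.strip()
        if not name:
            continue  # tolerate a trailing semicolon
        # Split on the first comma, so a suffix placed before the comma
        # ("Suffixed Jr., James") stays attached to the surname.
        surname, sep, given = name.partition(",")
        if not sep:
            raise ValueError(f"expected 'SURNAME, GIVEN NAME(s)', got {name!r}")
        authors.append((surname.strip(), given.strip()))
    return authors
```

Under this reading an entry with no comma is rejected outright, and an organisation written with a comma is silently split into a "surname" and a "given name".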
Current information model doesn't appear to allow for the possibility of organisation/mononym authors. @jordanpadams, that seems like an oversight worth fixing.
Provided value "smith, john; jones, tom, NASA; Google, Inc." does not fail, and produces expected results.
Value "smith, john; jones, tom; NASA; Google, Inc." does fail, due to the "NASA" mononym.
Will fix, such that mononym values are written to the author last-name field, with a blank first-name field (if that doesn't cause validation problems - need to check).
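A rough sketch of that fallback, assuming hypothetical field names `last_name` and `first_name` (the actual field names in the service may differ):

```python
def split_name(name: str) -> dict:
    """Return last/first name fields; a value with no comma fills last_name only."""
    cleaned = name.strip()
    last, sep, first = cleaned.partition(",")
    if not sep:
        # No comma: treat the whole value as a mononym (e.g. "NASA").
        return {"last_name": cleaned, "first_name": ""}
    return {"last_name": last.strip(), "first_name": first.strip()}
```

Whether a blank first-name field passes downstream validation is still the open question noted above; the sketch does not address it.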
Suggest that authors like "Google, Inc." should be given like "Google Inc.", without any commas.
~@jordanpadams is there a good reason the name parsing logic considers "." to be a valid separator? Does this support a known use-case?~
Ah, I see now - names like "R. Deen".
I've done the best I can untangling the name parsing logic, but to go any further with it I'll need a comprehensive list of name strings which the parser is expected to support. My tests support:
"Dunn, Alex",
"NASA",
"SomeCorp Inc.",
"Some, MiddleNamed, Gal",
"Suffixed Jr., James",
"R. Deen",
but the first/middle-name ordering is broken for, for example,
"J. R. Bader"
because detection of first/middle-name ordering isn't currently well-defined.
@jordanpadams existing tests suggest a need to support the format "A.Dunn". Is this correct? Seems like we should be able to expect people to input valid values, which I'd argue that isn't, but maybe there's something preventing us from being that opinionated?
@jordanpadams I found these cases
Examples of cases:
Case 1 --> Should be parsed by semi-colon
pds4_fields_authors = "Lemmon, M."
Case 2 --> Should be parsed by comma
pds4_fields_authors = "R. Deen, H. Abarca, P. Zamani, J.Maki"
Case 3 --> Should be parsed by semi-colon
pds4_fields_authors = "Davies, A.; Veeder, G."
Case 4 --> Should be parsed by semi-colon
pds4_fields_authors = "VanBommel, S. J., Guinness, E., Stein, T., and the MER Science Team"
Case 5 --> Should be parsed by semi-colon
pds4_fields_authors = "MER Science Team"
Most of the work is done, but issue is blocked pending confirmation of exactly which cases must be supported.
Final list of supported formats:
[
"A. Dunn",
"Dunn, Alex",
"Dunn, A.",
"Dunn, A. E.",
"Dunn, A. E. F. G.",
"Dunn, Alexander E.",
"Dunn, Alexander E. F. G.",
"Jet Propulsion Laboratory",
"JPL",
"Google Inc.",
"Suffixed Jr., James",
]
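For illustration only, one way to cover every format in the list above is sketched below. This is not the parser that was merged; `ParsedName`, `parse_name`, and the field names are hypothetical. It assumes that an "initial" is a single letter followed by a period, and that any comma-free value not made up of leading initials plus a surname is an organisation or mononym.

```python
import re
from typing import NamedTuple

# Assumption: an initial is exactly one letter followed by a period.
INITIAL = re.compile(r"^[A-Za-z]\.$")


class ParsedName(NamedTuple):
    last_name: str
    first_name: str = ""
    middle_names: str = ""


def parse_name(raw: str) -> ParsedName:
    name = " ".join(raw.split())  # collapse stray whitespace
    if "," in name:
        # "Dunn, Alexander E. F. G." or "Suffixed Jr., James":
        # everything before the first comma is the surname (plus suffix).
        last, _, given = name.partition(",")
        tokens = given.split()
        first = tokens[0] if tokens else ""
        return ParsedName(last.strip(), first, " ".join(tokens[1:]))
    tokens = name.split()
    if len(tokens) > 1 and all(INITIAL.match(t) for t in tokens[:-1]):
        # "A. Dunn": leading initials followed by a surname.
        return ParsedName(tokens[-1], tokens[0], " ".join(tokens[1:-1]))
    # "JPL", "Google Inc.", "Jet Propulsion Laboratory": keep the whole
    # value in last_name and leave the other fields blank.
    return ParsedName(name)
```

Because "Inc." is longer than one letter it is not mistaken for an initial, so "Google Inc." stays whole. A value such as "A.Dunn" (no space) is not in the supported list and would be treated as a mononym by this sketch.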
Describe the bug
When parsing name lists such as author_list in PDS4 XML labels, a wobbly is thrown if the value doesn't follow the formation rules for using commas and semicolons. For instance, this series of values will fail:
(1) NASA will throw a wobbly because there is no comma (2) Google will be parsed inaccurately as,
To Reproduce
See example above
Expected behavior
(1) Allow a value (within the set of values) to not require a comma. (2) A more difficult fix will be to "interpret" the Google example.
Version of Software Used
N/A
Test Data / Additional context
Screenshots
System Info
Related requirements
Engineering Details
Per @jordanpadams let's try to better handle case (1), but I don't think we will ever be able to handle case (2) until the PDS4 Information Model is improved.