eclipse-pass / main

Catch-all repository against which issues on general, cross-cutting topics are logged.
Apache License 2.0

Fix formatting of abstract in DSpace metadata #1049

Open markpatton opened 3 weeks ago

markpatton commented 3 weeks ago

What?

If you check JScholarship, you will see that some deposits have abstracts containing raw XML markup. See, for example: https://jscholarship.library.jhu.edu/items/3470e8b8-38c6-45a7-9610-3a357e2fc0d6.

For DSpace, the abstract should be displayed as readable text rather than raw markup.

Why?

This is a bad experience for DSpace users.

How?

The abstract value probably comes from the DOI.

Acceptance Criteria

Demonstrate that deposits to DSpace have reasonable formatting.

Related Issues

markpatton commented 4 days ago

DSpace supports Markdown for abstracts if configured to do so.

The DOI service returns a JSON object from Crossref with the metadata. In that object, the abstract value is XML serialized to a string like:

<jats:title>Abstract</jats:title>\n               <jats:sec>\n                  <jats:title>Objective</jats:title>\n                  <jats:p>The purpose of this paper is to determine a claims-based definition of frontloaded home health physical therapy (HHPT) and examine the effect of frontloaded HHPT visits on all-cause 30-day hospital readmissions.</jats:p>\n               </jats:sec>\n               <jats:sec>\n                  <jats:title>Methods</jats:title>\n                  <jats:p>This study used a retrospective analysis of Medicare fee-for-service claims from older adults (≥65 years) in the National Health and Aging Trends Study (NHATS; 2011–2017) with ≥1 HHPT visit within 30 days of a hospitalization (n = 1344 hospitalizations; weighted n = 7,727,384). An exploratory analysis of home health claim distribution was conducted to determine definitions of frontloaded HHPT. Generalized linear models were then used to examine the relationship between hospital readmission and each definition of frontloading.</jats:p>\n               </jats:sec>\n               <jats:sec>\n                  <jats:title>Results</jats:title>\n                  <jats:p>Four definitions of frontloaded HHPT were identified: ≥2 HHPT visits in the first week after discharge; ≥3 visits in the first week; ≥4 visits in the first 2 weeks; and ≥ 5 visits in the first 2 weeks. The adjusted risk of readmission was lower for older adults receiving frontloaded HHPT in the first week: (risk ratio [RR] for ≥2 vs &amp;lt;2 visits = 0.57; 95% CI = 0.41–0.79; RR for ≥3 vs &amp;lt;3 visits = 0.39; 95% CI = 0.22–0.72). The reduction in risk of readmission was even greater for older adults receiving ≥4 versus &amp;lt;4 HHPT visits (RR = 0.32; 95% CI = 0.21–0.48) and ≥ 5 versus &amp;lt;5 HHPT visits (RR = 0.27; 95% CI = 0.14–0.50) within the first 2 weeks. 
The effect of HHPT frontloading was greater for patients hospitalized with surgical versus medical diagnoses and for patients with diagnoses targeted by the Hospital Readmissions Reduction Program.</jats:p>\n               </jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion</jats:title>\n                  <jats:p>Frontloaded HHPT reduces 30-day hospital readmissions among Medicare beneficiaries. Additional research is needed to determine the optimal number of visits and those most likely to benefit from frontloaded HHPT.</jats:p>\n               </jats:sec>\n               <jats:sec>\n                  <jats:title>Impact</jats:title>\n                  <jats:p>Frontloaded HHPT can be an effective approach for reducing 30-day hospital readmissions among Medicare beneficiaries.</jats:p>\n               </jats:sec>

According to https://www.crossref.org/documentation/schema-library/markup-guide-metadata-segments/abstracts/, the abstracts may be JATS XML. But are other formats possible? Is there any way to tell except by inspection?
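Until that question is answered, inspection could at least be automated. A minimal sketch of a heuristic check, assuming the abstract has already been pulled out of the JSON as a decoded string; the function name `classify_abstract` and the three labels are hypothetical, not part of any existing service:

```python
import re

# Matches an opening or closing tag with the "jats:" prefix, e.g. <jats:p> or </jats:sec>.
_JATS_TAG = re.compile(r"</?jats:[A-Za-z][\w-]*[\s/>]")
# Matches any other tag-like token, e.g. <p> or <i>.
_ANY_TAG = re.compile(r"</?[A-Za-z][\w:-]*[\s/>]")


def classify_abstract(abstract: str) -> str:
    """Guess the format of an abstract string by inspection (heuristic only)."""
    if _JATS_TAG.search(abstract):
        return "jats"
    if _ANY_TAG.search(abstract):
        return "other-markup"
    return "plain"
```

This only looks for tag-shaped text, so a comparison such as `a < 2` is still treated as plain. It says nothing about whether the markup is well-formed.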

JATS info: https://jats.nlm.nih.gov/index.html

markpatton commented 4 days ago

JATS is quite a large specification. It is intended to handle a full journal article. See https://jats.nlm.nih.gov/publishing/tag-library/1.2/chapter/how-to-read.html.

Doing a complete transformation to Markdown would be a fair amount of work. If we assume only a subset is used for the abstract, this could be done reasonably.
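To give a sense of scale, here is a rough sketch of a subset conversion. It assumes only `title`, `sec`, `p`, `italic`, and `bold` need special handling, and anything else falls back to its text content. The function name, the choice to render titles as bold lines, and the namespace URI bound to the `jats:` prefix are all assumptions made for illustration; the prefix is undeclared in the fragment, so the sketch wraps it in a root element that declares it.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URI; any URI works here because we declare the prefix ourselves.
JATS_NS = "http://www.ncbi.nlm.nih.gov/JATS1"
# Inline JATS elements mapped to Markdown emphasis markers (illustrative subset).
INLINE = {"italic": "*", "bold": "**"}


def _local(tag: str) -> str:
    """Strip the '{namespace}' part from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def _inline_text(elem: ET.Element) -> str:
    """Flatten an element to one line of text, keeping italic/bold as Markdown."""
    parts = [elem.text or ""]
    for child in elem:
        mark = INLINE.get(_local(child.tag), "")
        parts.append(mark + _inline_text(child) + mark)
        parts.append(child.tail or "")
    return re.sub(r"\s+", " ", "".join(parts)).strip()


def jats_abstract_to_markdown(fragment: str) -> str:
    """Convert a JATS abstract fragment to Markdown paragraphs (subset only)."""
    root = ET.fromstring(f'<root xmlns:jats="{JATS_NS}">{fragment}</root>')
    blocks = []

    def walk(elem: ET.Element) -> None:
        # Text sitting directly in a container (e.g. an untagged abstract).
        if elem.text and elem.text.strip():
            blocks.append(re.sub(r"\s+", " ", elem.text).strip())
        for child in elem:
            name = _local(child.tag)
            if name == "sec":
                walk(child)  # sections only group titles and paragraphs
                continue
            text = _inline_text(child)
            if text:
                blocks.append(f"**{text}**" if name == "title" else text)

    walk(root)
    return "\n\n".join(blocks)
```

An abstract that is not well-formed XML raises `ET.ParseError`, so a caller would need a fallback. The sample above also contains double-escaped entities such as `&amp;lt;`, which come out of the parser as the literal text `&lt;`; fixing that would take an extra unescape step.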

markpatton commented 4 days ago

JATS to HTML converter:

markpatton commented 19 hours ago

Enabling Markdown display for DSpace abstracts also enables support for HTML tags in the abstract.
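If HTML is accepted, the conversion could be little more than renaming tags. A sketch under the same subset assumption; the tag mapping and function name are hypothetical, and which HTML tags DSpace actually renders would need to be checked:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; any URI works here because we declare the prefix ourselves.
JATS_NS = "http://www.ncbi.nlm.nih.gov/JATS1"
# Illustrative JATS-to-HTML mapping; unknown elements become <span>.
TAG_MAP = {
    "title": "h4",
    "sec": "div",
    "p": "p",
    "italic": "em",
    "bold": "strong",
    "sub": "sub",
    "sup": "sup",
}


def jats_abstract_to_html(fragment: str) -> str:
    """Rename JATS elements in an abstract fragment to HTML equivalents."""
    root = ET.fromstring(f'<div xmlns:jats="{JATS_NS}">{fragment}</div>')
    for elem in root.iter():
        if elem is root:
            continue  # keep the wrapping <div> as-is
        local = elem.tag.rsplit("}", 1)[-1]
        elem.tag = TAG_MAP.get(local, "span")
        elem.attrib.clear()  # drop JATS attributes rather than pass them through
    return ET.tostring(root, encoding="unicode")
```

Because text and tails are left untouched, inline markup such as subscripts survives without extra handling.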