EHRI / ehri-rest

Web service and business logic for managing EHRI collection metadata.
European Union Public License 1.2

Plain-text numbered 'headers' in EAD are converted to single-item numbered lists #18

Open bencomp opened 10 years ago

bencomp commented 10 years ago

Found in ITS's description of their fonds R 2.

The EAD has:

<p>1. First header</p>

Paragraph text and <lb/>, then

<p>2. Second 'header'</p>

more text, and headers.

On the acceptance server this comes out as a series of ordered lists of 1 item each, so it shows 1. title, text, 1. title, text, etc.

I'm not sure whether this is a front-end bug or backend bug.

mikesname commented 10 years ago

Looks like a front-end bug with the markdown renderer. Those titles are interpreted as the start of markdown ordered lists. If you put this into the pandoc demo it gives <ol> nodes with correct start attributes. However, the pegdown rendering I'm using on the front-end doesn't emit those start attributes, so every list restarts at 1.
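One possible workaround, sketched here in Python rather than in the front-end's own language, is to backslash-escape a leading "N." before the text reaches the markdown renderer, so the number is shown literally instead of opening a list. The function name and the regex are assumptions made for illustration and are not part of the EHRI code; the sketch also assumes the renderer honours backslash escapes for ".", as the original Markdown syntax does.

```python
import re

# Hypothetical helper, not part of ehri-rest or the front-end.
# Matches up to 3 leading spaces, a run of digits, and a "." that is
# followed by whitespace or the end of the line (i.e. an ordered-list marker).
_ORDERED_MARKER = re.compile(r"^( {0,3})(\d{1,9})\.(?=\s|$)", re.MULTILINE)


def escape_ordered_list_markers(text: str) -> str:
    """Escape the "." of any line-initial "N." so it is rendered as plain text."""
    return _ORDERED_MARKER.sub(r"\1\2\\.", text)


if __name__ == "__main__":
    print(escape_ordered_list_markers("1. First header"))  # prints: 1\. First header
```

Numbers in the middle of a line, or decimals such as "1.5", are left alone because the pattern only matches at the start of a line and requires whitespace after the period. The obvious cost is that genuine ordered lists in the source text would be escaped too.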

Suddenly the idea to use markdown on the text fields looks a bit iffy.

mikesname commented 10 years ago

Might be worth having a general discussion about how to usefully display all the non-standard guff people are going to put in their text sections. Perhaps another candidate for a general pre-processing pipeline task?

mikesname commented 10 years ago

Note: we could just slap their markup (anything within <scopeContent>) straight into the database. However, those sections very often contain titles that (redundantly) say things like <header>Scope and content</header>, and those would not display nicely.
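If the markup were stored as-is, the redundant titles could be dropped in a pre-processing step. A rough sketch of that idea, assuming the element names used in this comment (`scopeContent`, `header`) and a hand-maintained set of titles to drop; `REDUNDANT_TITLES` and `strip_redundant_headers` are hypothetical names, not existing code:

```python
import xml.etree.ElementTree as ET

# Assumption: titles considered redundant, compared case-insensitively.
REDUNDANT_TITLES = {"scope and content"}


def strip_redundant_headers(xml_fragment: str, tag: str = "header") -> str:
    """Remove <header> elements whose text only repeats the section name."""
    root = ET.fromstring(xml_fragment)
    # Snapshot the elements first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        prev = None  # last sibling that was kept
        for child in list(parent):
            title = (child.text or "").strip().lower()
            if child.tag == tag and title in REDUNDANT_TITLES:
                # Keep any text that followed the removed element.
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + tail
                else:
                    prev.tail = (prev.tail or "") + tail
                parent.remove(child)
            else:
                prev = child
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    fragment = (
        "<scopeContent><header>Scope and content</header>"
        "<p>1. First header</p></scopeContent>"
    )
    print(strip_redundant_headers(fragment))
```

Headers with any other title are kept, so this only addresses the redundant-title case and not the wider question of how to display arbitrary markup.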