Two-phase DITA transition

My colleague and I have been working on a parser for this for months.

I look forward to providing a faceted search capability in the future, where users filter down to the data of interest using drop-down lists. To do this, we need a detailed information model, and we need to understand all of the subtle differences in the data.

But on the other hand, I have authors who are waiting for their DITA editing capability.

So, I'm now considering a two-phase approach.

1. Low-level document parsing. Process a lot of content without knowing its detailed type/meaning. Publish the content as a WebHelp solution (with full-text search).
2. Parse the above body of DITA content, moving to a more detailed DITA specialisation.

So, I would be deferring the task of understanding the detailed content to the second phase. In the second phase we would move from a general Pub-5 schema to a more detailed one. We would be parsing DITA content, which should be easier to navigate than the HTML we're currently working with.

An example of this would be how we process this content:

Frequencies:  1, 3, 6-8, 12

In the phase-1 approach, we'll just store this text in the frequency element: 1, 3, 6-8, 12

In the phase-2 approach, we'll parse the text into an array of either single-value frequencies or frequency ranges, like this: 1, 3, 6-8, 12.
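
To make the phase-2 step concrete, here is a rough sketch of how that frequency text could be split into single values and ranges. It assumes the values are plain integers, separated by commas, with a hyphen marking a range. The names `SingleFrequency`, `FrequencyRange` and `parse_frequencies` are made up for illustration and aren't part of our parser or the Pub-5 schema.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SingleFrequency:
    """One standalone frequency value (hypothetical type)."""
    value: int


@dataclass(frozen=True)
class FrequencyRange:
    """An inclusive range of frequencies, e.g. 6-8 (hypothetical type)."""
    start: int
    end: int


Frequency = Union[SingleFrequency, FrequencyRange]


def parse_frequencies(text: str) -> List[Frequency]:
    """Split comma-separated text into single values and ranges.

    Assumes integer values and a hyphen as the range separator.
    """
    items: List[Frequency] = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray or trailing commas
        if "-" in token:
            start, end = token.split("-", 1)
            items.append(FrequencyRange(int(start), int(end)))
        else:
            items.append(SingleFrequency(int(token)))
    return items


frequencies = parse_frequencies("1, 3, 6-8, 12")
```

In phase 1 we'd skip all of this and keep the raw string. In phase 2, each item in the returned list could map to its own child element in the more detailed specialisation.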

Hmm, actually, it could be an option to take an even higher level approach to this, where we have a whole page element (set of paragraphs, or a specific table) in our DITA element.

DeepBlueCLtd / Fi3ldMan

Two-phase DITA transition #16