Improve modeling and full-text search of DocumentReference text content

Problem

We currently use a custom extension, documentreference-raw-text, to index the text content of a document, but we want to use the standard DocumentReference.content.attachment.data element instead, because relying on a custom extension hurts interoperability.

Description

As described in arkhn/Cohort360#97, we want to change the FHIR element in which the text content of a document is stored. With the custom extension, the text is currently stored in DocumentReference.extension[0].valueString, of type string, which allows it to be indexed in Elasticsearch (ES) through the myContentText field of the ResourceTable entity class (see parseContentTextIntoWords in hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/dao/BaseHapiFhirDao.java). We want to use DocumentReference.content.attachment.data instead, which is the appropriate element according to the FHIR model. However, it is of type base64Binary, which is not indexed in ES, and this prevents us from running full-text searches on that field.
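To illustrate why the attachment is not searchable as stored, here is a minimal, self-contained sketch using only the JDK (no HAPI FHIR classes). It assumes the attachment carries UTF-8 plain text: the base64 payload does not contain the document's words, so it has to be decoded before any words can be extracted for indexing. The class and method names (AttachmentTextExtractor, decodeAttachmentText, extractWords) are hypothetical, and the tokenizer is a simple stand-in, not HAPI FHIR's parseContentTextIntoWords.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Locale;

public class AttachmentTextExtractor {

    // Hypothetical helper: decodes a base64Binary payload (as found in
    // DocumentReference.content.attachment.data), ASSUMING the content is UTF-8 text.
    static String decodeAttachmentText(String base64Data) {
        byte[] bytes = Base64.getDecoder().decode(base64Data);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical tokenizer: splits on runs of non-letter/non-digit characters
    // and lower-cases each token. NOT HAPI FHIR's parseContentTextIntoWords.
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String plain = "Patient reports chest pain";
        String encoded = Base64.getEncoder()
                .encodeToString(plain.getBytes(StandardCharsets.UTF_8));

        // The raw base64 value does not contain the searchable word...
        System.out.println(encoded.contains("chest")); // prints "false"
        // ...but decoding first recovers the words that could be indexed.
        System.out.println(extractWords(decodeAttachmentText(encoded)));
        // prints "[patient, reports, chest, pain]"
    }
}
```

Whatever the final approach, some component would need to perform this decoding step before the text reaches the full-text index; where that step should live is exactly the open question of this issue.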

Alternatives

None identified yet. We hope to avoid forking, but we do not yet see how.

arkhn / jpaltime