Implement MedicalCodingPipeline and SummarizationPipeline
Related Issue
55
Changes Made
I come, once again, bearing breaking changes.
💥 Changes to `Document` container class: ordered by sub-containers `nlp`, `concepts`, `hl7`, `cds`, `models` for better organisation. Each attribute is responsible for a specific kind of data, usually exposed via getter and setter functions.
Changed `.add_huggingface_output()` etc. to `.add_output(integration_name, task, output)`, which is easier to access and manage.
Added a `models.get_generated_text()` method.
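A minimal sketch of what this container layout might look like (the field types, the `ModelOutputs` helper name, and the `get_generated_text()` signature below are illustrative assumptions, not the library's actual definitions):

```python
# Illustrative sketch only - not the actual HealthChain classes.
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModelOutputs:
    """Raw model outputs keyed by (integration_name, task)."""
    outputs: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_output(self, integration_name: str, task: str, output: Any) -> None:
        self.outputs.setdefault(integration_name, {})[task] = output

    def get_output(self, integration_name: str, task: str) -> Any:
        return self.outputs.get(integration_name, {}).get(task)

    def get_generated_text(self, integration_name: str, task: str) -> Optional[str]:
        # For LLM-style outputs, return just the generated text field.
        output = self.get_output(integration_name, task)
        if isinstance(output, dict):
            return output.get("generated_text")
        return output


@dataclass
class Document:
    text: str
    nlp: Any = None        # tokens, spaCy docs, embeddings, ...
    concepts: Any = None   # problems, medications, allergies
    hl7: Any = None        # CDA / FHIR data
    cds: Any = None        # CDS cards and actions
    models: ModelOutputs = field(default_factory=ModelOutputs)


doc = Document(text="Patient presents with hypertension.")
doc.models.add_output("huggingface", "summarization", {"generated_text": "HTN noted."})
print(doc.models.get_generated_text("huggingface", "summarization"))
```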
Changes to `CcdData`: now uses a `ConceptLists` dataclass to hold problems, medications, and allergies concepts, for a better interface with the `Document` class.
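Roughly, the grouping looks like the sketch below (field names and the `Concept` shape are assumptions for illustration, not the library's exact definitions):

```python
# Illustrative sketch of the ConceptLists grouping.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    code: Optional[str] = None
    code_system: Optional[str] = None   # e.g. "SNOMED CT"
    name: Optional[str] = None


@dataclass
class ConceptLists:
    problems: List[Concept] = field(default_factory=list)
    medications: List[Concept] = field(default_factory=list)
    allergies: List[Concept] = field(default_factory=list)


@dataclass
class CcdData:
    concepts: ConceptLists = field(default_factory=ConceptLists)
    note: Optional[str] = None


ccd = CcdData()
ccd.concepts.problems.append(
    Concept(code="38341003", code_system="SNOMED CT", name="Hypertension")
)
```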
Changes to the `.load()` method of `BasePipeline`: this method now configures the pipeline with additional logic that parses a model and model source (either a string - the name of or path to a model - or a callable - a LangChain chain object) into a `ModelConfig` object.
Added `ModelRouter`, a helper which returns the appropriate integration component for a given `ModelConfig`.
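The routing idea, sketched with stand-in classes (the `ModelConfig` fields, the `parse_model_source()` helper, and the stub components below are assumptions, not the real implementation):

```python
# Illustrative sketch of ModelConfig / ModelRouter - names and fields are assumptions.
from dataclasses import dataclass, field
from typing import Callable, Optional, Union


# Stub integration components standing in for the real SpacyNLP / HFTransformer /
# LangChainLLM classes, just so the routing below runs.
class SpacyNLP:
    def __init__(self, *args, **kwargs): self.args, self.kwargs = args, kwargs

class HFTransformer:
    def __init__(self, *args, **kwargs): self.args, self.kwargs = args, kwargs

class LangChainLLM:
    def __init__(self, *args, **kwargs): self.args, self.kwargs = args, kwargs


@dataclass
class ModelConfig:
    source: str                                  # "spacy", "huggingface", or "langchain"
    model: Optional[str] = None                  # model name or path
    pipeline_object: Optional[Callable] = None   # e.g. a LangChain chain
    task: Optional[str] = None
    kwargs: dict = field(default_factory=dict)


def parse_model_source(model: Union[str, Callable], source: str, task: str, **kwargs) -> ModelConfig:
    """What .load() conceptually does: turn the user's input into a ModelConfig."""
    if callable(model):
        return ModelConfig(source=source, pipeline_object=model, task=task, kwargs=kwargs)
    return ModelConfig(source=source, model=str(model), task=task, kwargs=kwargs)


class ModelRouter:
    """Returns the appropriate integration component for a ModelConfig."""

    def get_component(self, config: ModelConfig):
        if config.source == "spacy":
            return SpacyNLP(config.model, **config.kwargs)
        if config.source == "huggingface":
            return HFTransformer(task=config.task, model=config.model, **config.kwargs)
        if config.source == "langchain":
            return LangChainLLM(config.pipeline_object, **config.kwargs)
        raise ValueError(f"Unknown model source: {config.source}")


config = parse_model_source("facebook/bart-large-cnn", source="huggingface", task="summarization")
component = ModelRouter().get_component(config)
```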
Templates: Users can pass in a Jinja template for custom CDS cards (this will extend to CDAs too, but that's a matter for a different issue).
Added `CdsCardCreator`: this component either extracts generated text from model outputs in the pipeline or takes in specified static content, and parses this into a CDS `Card` object using Jinja templates (a default is used if not provided).
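A sketch of the template-to-card step (the default template string and the `Card` fields shown here are assumptions; the real component's defaults will differ):

```python
# Illustrative sketch of rendering generated text into a CDS Hooks-style card via Jinja2.
import json
from dataclasses import dataclass

from jinja2 import Template

# Assumed default template - the library's actual default will differ.
DEFAULT_TEMPLATE = Template(
    '{"summary": "{{ summary }}", "indicator": "{{ indicator }}", '
    '"source": {"label": "{{ source }}"}}'
)


@dataclass
class Card:
    summary: str
    indicator: str
    source: dict


def create_card(generated_text: str, template: Template = DEFAULT_TEMPLATE) -> Card:
    # Render the template, then load the resulting JSON into a Card object.
    rendered = template.render(
        summary=generated_text[:140], indicator="info", source="healthchain"
    )
    data = json.loads(rendered)
    return Card(**data)


card = create_card("Consider reviewing the antihypertensive regimen at the next visit.")
print(card.summary)
```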
Renamed integration components to be more descriptive: `SpacyComponent` -> `SpacyNLP`, `HuggingFaceComponent` -> `HFTransformer`, `LangchainComponent` -> `LangChainLLM`.
Also pass `kwargs` through to the integration components.
Added a `._add_concepts_to_hc_doc()` helper method to `SpacyNLP`, which takes the entities from the spaCy doc, parses them into `Concept` objects, and adds them to the `.concepts` attribute of `Document`. This is hard-coded to always add new concepts as SNOMED Problems for now, but will be made configurable in future.
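Conceptually it does something like the following (the `ProblemConcept` shape and function name are assumptions; a clinical model such as scispaCy's `en_core_sci_sm` would be needed to actually extract clinical entities, the general-purpose model below just keeps the sketch runnable):

```python
# Rough sketch of turning spaCy entities into Document concepts - illustrative only.
from dataclasses import dataclass, field
from typing import List

import spacy


@dataclass
class ProblemConcept:
    name: str
    code_system: str = "SNOMED CT"  # hard-coded as SNOMED Problems for now


@dataclass
class Concepts:
    problems: List[ProblemConcept] = field(default_factory=list)


def add_concepts_to_doc(spacy_doc, concepts: Concepts) -> None:
    # Every detected entity becomes a problem concept, per the current behaviour.
    for ent in spacy_doc.ents:
        concepts.problems.append(ProblemConcept(name=ent.text))


nlp = spacy.load("en_core_web_sm")  # assumes this model is installed
concepts = Concepts()
add_concepts_to_doc(nlp("Patient reports chest pain and nausea."), concepts)
print([p.name for p in concepts.problems])
```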
Removed the default spaCy tokenizer in `TextPreprocessor`: this is redundant as you can just use `SpacyNLP`. For better separation of concerns, this component now only does very simple text preprocessing - the default is `.split()`, but users can also pass in a tokenizer object (`Callable`) to use with the component.
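For example, a minimal version of that behaviour might look like this (the class interface is an assumption):

```python
# Minimal sketch of the simplified TextPreprocessor behaviour - interface is assumed.
import re
from typing import Callable, List, Optional


class TextPreprocessor:
    def __init__(self, tokenizer: Optional[Callable[[str], List[str]]] = None):
        # Fall back to simple whitespace splitting when no tokenizer is given.
        self.tokenizer = tokenizer or (lambda text: text.split())

    def __call__(self, text: str) -> List[str]:
        return self.tokenizer(text)


print(TextPreprocessor()("Patient denies fever, chills, or cough."))

# With a custom tokenizer callable:
print(TextPreprocessor(tokenizer=lambda t: re.findall(r"\w+", t))("Patient denies fever."))
```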
And finally, added the `MedicalCodingPipeline` and `SummarizationPipeline` implementations.
The pipeline does some internal coercion to make the task either `ner` or `summarization`, but there is no strict validation yet.
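Hypothetical usage, assuming `.load()` accepts a model name/path or a callable as described above (the import path, model identifiers, and exact call pattern are illustrative, not a guaranteed API):

```python
# Hypothetical usage sketch - the import path, model names, and exact .load()
# signature are assumptions based on the description above, not a guaranteed API.
from healthchain.pipeline import MedicalCodingPipeline, SummarizationPipeline

coder = MedicalCodingPipeline.load("en_core_sci_sm")                 # spaCy-style model name
summarizer = SummarizationPipeline.load("facebook/bart-large-cnn")   # Hugging Face model id
```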
Testing
Added tests for:
`CdsCardCreator`: `test_card_creator.py`
`ModelRouter`: `test_modelrouter.py`
`.load()` method: `test_pipeline_load.py`
`test_medicalcoding.py`, `test_summarization.py`
`test_integrations.py`
`TextPreprocessor` initializes tokenizer object - `test_preprocessor.py`
`Document` methods - `test_containers.py`
Documentation