Code-to-Text Integration

Implement a custom TextCodeEncoder that processes clinical codes together with their associated text descriptions.

We need an approach that handles the text descriptions of codes efficiently, keeps the encoded text aligned with each code's position in the event sequence, and avoids re-encoding codes that appear many times. Concretely, we need an _inputencoder that does the following:

Data Preparation (at class initialization):
- Load code descriptions from the metadata parquet file
- Load a text tokenizer via Hugging Face AutoTokenizer
- Create a lookup dictionary: code/vocab_index -> tokenized_text
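A minimal sketch of the lookup construction, assuming the metadata rows have already been read from the parquet file into dictionaries. The column names `code`, `description`, and `code/vocab_index`, the helper `build_code_text_lookup`, and the whitespace tokenizer are all illustrative assumptions; a Hugging Face tokenizer would be passed in place of `whitespace_tokenize` and would return token ids rather than strings. Falling back to the code string when a description is missing is also an assumption, not something this issue specifies.

```python
from typing import Callable, Dict, Iterable, List, Mapping


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption); a real tokenizer would return token ids."""
    return text.lower().split()


def build_code_text_lookup(
    rows: Iterable[Mapping[str, object]],
    tokenize: Callable[[str], List[str]],
) -> Dict[int, List[str]]:
    """Map each vocab index to the tokenized description of its code."""
    lookup: Dict[int, List[str]] = {}
    for row in rows:
        # Hypothetical fallback: use the code string when no description exists.
        text = row.get("description") or row["code"]
        lookup[int(row["code/vocab_index"])] = tokenize(str(text))
    return lookup


# Hypothetical metadata rows standing in for the parquet contents.
rows = [
    {"code/vocab_index": 1, "code": "LAB//HR", "description": "Heart rate"},
    {"code/vocab_index": 2, "code": "DX//I10", "description": "Essential hypertension"},
    {"code/vocab_index": 3, "code": "MISC//X", "description": None},
]
lookup = build_code_text_lookup(rows, whitespace_tokenize)
```

Because the table is built once at initialization, the per-batch work reduces to dictionary lookups by vocab index.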
Batch Processing:
- Extract the unique codes from the batch to avoid redundant processing
- Pass the tokenized text for those unique codes through the ClinicalBERT encoder
- Cache encodings for frequently used codes
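The deduplicate-then-cache step could look roughly like the sketch below. The class name `CachedCodeTextEncoder` and the stub `fake_text_encoder` are hypothetical; the stub stands in for a ClinicalBERT forward pass so that the example runs without model weights. The cache here is an unbounded dictionary, which is an assumption: a bounded policy such as LRU may fit "frequently used codes" better.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class CachedCodeTextEncoder:
    """Encodes each distinct code once and reuses the result across batches."""

    def __init__(self, encode_fn: Callable[[List[int]], List[Vector]]):
        self.encode_fn = encode_fn          # batched encoder: codes -> vectors
        self.cache: Dict[int, Vector] = {}  # vocab index -> encoded text

    def encode_batch(self, batch_codes: Sequence[Sequence[int]]) -> Dict[int, Vector]:
        # Collapse the batch to its distinct codes.
        unique_codes = sorted({code for seq in batch_codes for code in seq})
        # Only codes not seen before are sent to the (expensive) encoder.
        missing = [code for code in unique_codes if code not in self.cache]
        if missing:
            for code, vector in zip(missing, self.encode_fn(missing)):
                self.cache[code] = vector
        return {code: self.cache[code] for code in unique_codes}


encoder_calls: List[List[int]] = []


def fake_text_encoder(codes: List[int]) -> List[Vector]:
    """Stub for the text encoder (assumption); records what it was asked to encode."""
    encoder_calls.append(list(codes))
    return [[float(code), float(code) * 2.0] for code in codes]


encoder = CachedCodeTextEncoder(fake_text_encoder)
first = encoder.encode_batch([[1, 2, 2, 3], [2, 3, 3, 0]])
second = encoder.encode_batch([[2, 5], [5, 5]])
```

After these two calls, `encoder_calls` shows that the second batch only triggered encoding for the one code that was new. Note that cached encodings are only valid while the text encoder is frozen; if its weights are being fine-tuned, cached vectors become stale and carry no gradient.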
Temporal Integration:
- Map the encoded text representations back to the original code positions in the sequence
- Combine them with the existing triplet embeddings (code + value + time delta)
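The mapping back to sequence positions can be sketched with an inverse index, shown here in plain Python so it runs without dependencies. In PyTorch, `torch.unique(codes, return_inverse=True)` returns the same pair of unique values and inverse indices. The helper names are hypothetical, and combining by elementwise sum is an assumption: the issue does not say how the text and triplet embeddings should be merged, and it requires both to share the same dimension.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def unique_with_inverse(
    batch_codes: Sequence[Sequence[int]],
) -> Tuple[List[int], List[List[int]]]:
    """Return sorted unique codes plus, per position, the index into that list."""
    unique = sorted({code for seq in batch_codes for code in seq})
    position = {code: i for i, code in enumerate(unique)}
    inverse = [[position[code] for code in seq] for seq in batch_codes]
    return unique, inverse


def gather(unique_vectors: List[Vector], inverse: List[List[int]]) -> List[List[Vector]]:
    """Place each unique code's vector back at every position where the code occurs."""
    return [[unique_vectors[i] for i in row] for row in inverse]


def combine_sum(text: List[List[Vector]], triplet: List[List[Vector]]) -> List[List[Vector]]:
    """Elementwise sum of text and triplet embeddings (assumed combination rule)."""
    return [
        [[t + e for t, e in zip(t_vec, e_vec)] for t_vec, e_vec in zip(t_seq, e_seq)]
        for t_seq, e_seq in zip(text, triplet)
    ]


batch_codes = [[3, 1, 3], [1, 0, 0]]  # 0 used as a padding index in this toy example
unique, inverse = unique_with_inverse(batch_codes)

# Toy text encodings, one per unique code, in the same order as `unique`.
unique_vectors = [[0.0, 0.0], [1.0, 10.0], [3.0, 30.0]]
text_embeddings = gather(unique_vectors, inverse)

# Toy triplet embeddings with the same batch, sequence, and hidden dimensions.
triplet_embeddings = [[[0.5, 0.5] for _ in seq] for seq in batch_codes]
combined = combine_sum(text_embeddings, triplet_embeddings)
```

Since the gather preserves the original batch and sequence layout, the text embedding at each position lines up with the triplet embedding of the same event, which is what keeps the temporal alignment intact.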

