This repository is a sub-repository of PULSE.
This repository provides the official implementation of NormPULSE.
NormPULSE, our framework for clinical term normalization, is built on PULSE and comprises three steps:
Part of the clinical term normalization data comes from the following two open-source datasets:
The standard terminology databases are ICD-10医保2.0版 (ICD-10, Medical Insurance Edition 2.0) and ICD-9-CM3医保2.0版 (ICD-9-CM3, Medical Insurance Edition 2.0). We construct the two corresponding code trees by parsing the term codes; they are available at ICD-10_医保v2_tree.json and ICD-9-CM3_医保v2_tree.json.
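The JSON schema of the released code trees is not described here, so the following is only a minimal sketch of the idea under an assumed rule: a code's parent is the longest proper prefix of it that is also a code in the table (so A01.1 sits under A01). The helper names (find_parent, build_code_tree, ancestors) and the node layout {code, name, children} are hypothetical and may not match ICD-10_医保v2_tree.json.

```python
import json
from typing import Dict, List, Optional


def find_parent(code: str, known: Dict[str, str]) -> Optional[str]:
    """Return the longest proper prefix of `code` that is itself a known code, else None."""
    for end in range(len(code) - 1, 0, -1):
        if code[:end] in known:
            return code[:end]
    return None


def build_code_tree(codes: Dict[str, str]) -> dict:
    """Nest every code under its nearest prefix ancestor; codes without one hang off the root.

    `codes` maps a term code to its term name (assumed input format).
    """
    nodes = {c: {"code": c, "name": n, "children": []} for c, n in codes.items()}
    root: dict = {"code": None, "name": "ROOT", "children": []}
    for code in sorted(codes):  # sorted so siblings come out in code order
        parent = find_parent(code, codes)
        (nodes[parent] if parent else root)["children"].append(nodes[code])
    return root


def ancestors(code: str, codes: Dict[str, str]) -> List[str]:
    """Path from the top-level category down to `code`, inclusive."""
    path = [code]
    parent = find_parent(code, codes)
    while parent is not None:
        path.append(parent)
        parent = find_parent(parent, codes)
    return path[::-1]


if __name__ == "__main__":
    sample = {
        "A01": "Typhoid and paratyphoid fevers",
        "A01.0": "Typhoid fever",
        "A01.1": "Paratyphoid fever A",
        "A02": "Other salmonella infections",
    }
    print(json.dumps(build_code_tree(sample), ensure_ascii=False, indent=2))
    print(ancestors("A01.1", sample))  # ['A01', 'A01.1']
```

Prefix nesting is a simplification; medical-insurance extension codes may need explicit level boundaries rather than plain string prefixes.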
Examples of the training data are provided in the data directory.
Main Requirements
CUDA 12.x or earlier (11.4 preferred)
python=3.9.16
transformers>=4.29.2
faiss-gpu==1.7.2
torch==2.0.1
sentence-transformers==2.2.2
fastapi
uvicorn
NodeJS>=18.x
At least 16 GB of GPU memory
Make sure that the frontend port 3000 and the backend port 2233 are available, or change them in main.ts and run.py, respectively.
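The Python-side requirements and the two ports can be checked before launching the demo. The following is a minimal preflight sketch that is not part of the repository: the helper names are hypothetical, the version comparison is simplified (leading numeric parts only, instead of the full PEP 440 rules implemented by the packaging library), and CUDA, NodeJS, and GPU memory are not checked.

```python
import socket
import sys
from importlib import metadata

# Pins transcribed from the requirements list above (pip distribution names).
REQUIREMENTS = {
    "transformers": (">=", "4.29.2"),
    "faiss-gpu": ("==", "1.7.2"),
    "torch": ("==", "2.0.1"),
    "sentence-transformers": ("==", "2.2.2"),
}
PORTS = {"frontend (main.ts)": 3000, "backend (run.py)": 2233}


def parse_version(version: str) -> tuple:
    """Keep only the leading numeric parts: '2.0.1+cu117' -> (2, 0, 1)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op: str, required: str) -> bool:
    """Compare zero-padded numeric versions; supports '==' and '>=' only."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported operator: {op}")


def check_packages(requirements=REQUIREMENTS) -> list:
    """Return human-readable problems; an empty list means every pin is met."""
    problems = []
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need {op}{required})")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name}: found {installed}, need {op}{required}")
    return problems


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port (IPv4 only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    issues = []
    if sys.version_info[:2] != (3, 9):
        issues.append(f"python: found {sys.version.split()[0]}, expected 3.9.x")
    issues += check_packages()
    for label, port in PORTS.items():
        if not port_is_free(port):
            issues.append(f"{label}: port {port} is already in use")
    print("\n".join(issues) if issues else "environment looks OK")
```

Binding a throwaway socket is a cheap way to detect a port conflict; it only tests the IPv4 loopback address, so a server listening solely on IPv6 would not be detected.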
Installation
git clone https://github.com/JOHNNY-fans/NormPULSE.git
cd NormPULSE
conda create -n normllm python=3.9.16
conda activate normllm
pip install -r requirements.txt
Download Model
The NormPULSE weights are available in the following Hugging Face repository.
For the retrieval step, we use the open-source M3E model as the text embedding model.
Usage
A sample usage is provided in the Jupyter notebook usage_example.ipynb.
We also provide a simple demo, which consists of a frontend and a backend.
Run Frontend
cd demo-frontend
npm i
npm run dev
Run Backend
cd demo-backend
python run.py
The code of this project is licensed under Apache 2.0, and the model weights are licensed under GNU AGPL 3.0. If the models contained in this project, or any modified versions thereof, are used in a service that results in misleading or harmful statements causing adverse effects, the responsibility lies with the service provider and is not associated with or attributable to this project.