Open anzhao opened 1 month ago
The patient ID and other information appear on the slide label, which QuPath lists among the associated images under the Image tab. We can use an Optical Character Recognition (OCR) library such as Keras-OCR to extract the info we need.
The complete workflow to automatically extract the patient ID and other info from the label image (one of the associated images) in an .svs file:
1. Extract the Slide Label Image:
import openslide
# Open the .svs file
slide = openslide.OpenSlide('an.svs')
# Extract the slide label image
label_image = slide.associated_images['label']
# Save the label image so the OCR step can read it from disk
label_image.save('patient_id.png')
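Not every .svs file carries a label image (some scanners or de-identification tools strip it), in which case the lookup in step 1 raises a KeyError. A minimal sketch of a guarded lookup follows. It assumes only that `slide.associated_images` behaves like a mapping of names to images; the helper name `get_label_image` is hypothetical and not part of OpenSlide.

```python
def get_label_image(associated_images, key="label"):
    """Return the associated image stored under `key`, or None if it is absent.

    `associated_images` can be any mapping of names to images, such as
    `slide.associated_images`. Hypothetical helper, not an OpenSlide function.
    """
    if key not in associated_images:
        # Show which associated images do exist (e.g. 'macro', 'thumbnail')
        print(f"No '{key}' image found; available: {sorted(associated_images)}")
        return None
    return associated_images[key]
```

With the `slide` object from step 1, this would be called as `label_image = get_label_image(slide.associated_images)`, saving the image only when the result is not None.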
2. Perform OCR on the Extracted Image:
#!/usr/bin/env python3
import keras_ocr
# Create the pipeline
pipeline = keras_ocr.pipeline.Pipeline()
# Read the image
images = [keras_ocr.tools.read('patient_id.png')]
# Detect and recognize the text in the image
prediction_groups = pipeline.recognize(images)
# Print the recognized text
for predictions in prediction_groups:
    for text, box in predictions:
        print(text)
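The loop above prints every recognized word, but it does not pick out the patient ID, and the words are not guaranteed to come back in reading order. A rough sketch of that last step follows, assuming each prediction is a `(text, box)` pair whose box holds four (x, y) corner points. The helper names, the 15-pixel line tolerance, and especially the ID pattern (two letters followed by 4 to 8 digits) are assumptions for illustration; replace the pattern with whatever format your labels actually use.

```python
import re

# Hypothetical ID format: two letters followed by 4-8 digits, e.g. "ab123456".
# Matching ignores case, since OCR output may be lowercased.
PATIENT_ID_PATTERN = re.compile(r"\b[a-z]{2}\d{4,8}\b", re.IGNORECASE)


def words_in_reading_order(predictions, line_tolerance=15):
    """Return the recognized words sorted top-to-bottom, then left-to-right.

    A word joins the current line when its vertical center lies within
    `line_tolerance` pixels of the first word placed on that line.
    """
    def center(box):
        xs = [float(point[0]) for point in box]
        ys = [float(point[1]) for point in box]
        return sum(xs) / len(xs), sum(ys) / len(ys)

    # Sort by vertical center so lines are built from the top down
    items = sorted(
        ((center(box), text) for text, box in predictions),
        key=lambda item: item[0][1],
    )
    lines = []
    for (cx, cy), text in items:
        if lines and abs(cy - lines[-1]["y"]) <= line_tolerance:
            lines[-1]["words"].append((cx, text))
        else:
            lines.append({"y": cy, "words": [(cx, text)]})
    # Within each line, order words by horizontal center
    return [text for line in lines for _, text in sorted(line["words"])]


def find_patient_id(words, pattern=PATIENT_ID_PATTERN):
    """Return the first substring matching `pattern`, or None if nothing matches."""
    for word in words:
        match = pattern.search(word)
        if match:
            return match.group(0)
    return None
```

With the results from step 2, this could be used as `words = words_in_reading_order(prediction_groups[0])` followed by `patient_id = find_patient_id(words)`. OCR on printed or handwritten labels is error-prone (for example, `0` read as `o`), so check any extracted ID against a known list before relying on it.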