tompollard opened this issue 4 years ago
Hey @tompollard, I have begun to write out the annotation model here:

```python
class AnnotationLabel(models.Model):
    """
    A way to save and edit annotation labels for signals.
    """
    project = models.OneToOneField('project.PublishedProject', related_name='ann',
                                   on_delete=models.CASCADE)
    edited_by = models.ForeignKey('user.User', related_name='ann_editor',
                                  on_delete=models.CASCADE)
    creation_datetime = models.DateTimeField(auto_now_add=True)
    platform_name = models.CharField(max_length=150, null=True)
    record_name = models.CharField(max_length=150, null=True)
```
My thoughts are that the `AnnotationLabel` class model stores the label metadata (project, editor, platform, and record) so that we know where each creation request is coming from. I think that this may be the most generalized that we can get when it comes to sharing annotation models; for example, finding similarities between the actual annotation structure of signals and images may be difficult.
As for potential annotation structures, one that is particularly appealing for signals is the format used by Label Studio. This structure can be used for both region annotations [PR interval, QRS complex, etc.] (setting a start and stop time) and beat annotations [Normal, AFIB, etc.] (setting the stop time to null, or to the same time as the start time). Of course, we can edit and modify this however we like, but it may be a good start.
See an example of input annotations and output labeled annotation JSON here:
```json
[
    {
        "id": "gyV6XOeyCz",
        "from_name": "label",
        "to_name": "audio",
        "source": "$url",
        "type": "labels",
        "original_length": 3.774376392364502,
        "value": {
            "start": -0.004971698554622573,
            "end": 0.20349676497713773,
            "labels": [
                "Politics"
            ]
        }
    },
    {
        "id": "PJqb8mmmsC",
        "from_name": "label",
        "to_name": "audio",
        "source": "$url",
        "type": "labels",
        "original_length": 3.774376392364502,
        "value": {
            "start": 0.39002117971608113,
            "end": 0.6698078018244963,
            "labels": [
                "Business"
            ]
        }
    },
    {
        "id": "xcHF2NJUcs",
        "from_name": "label",
        "to_name": "audio",
        "source": "$url",
        "type": "labels",
        "original_length": 3.774376392364502,
        "value": {
            "start": 0.867304240959848,
            "end": 3.127541266619986,
            "labels": [
                "Education"
            ]
        }
    }
]
```
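To make the region-vs-beat convention concrete, here is a minimal sketch of how one of these result dicts could be classified; the helper name and the example dicts are my own for illustration, not part of Label Studio:

```python
def annotation_kind(result):
    """Classify one Label Studio result dict as a 'region' (start/stop span)
    or a 'beat' (single time point, i.e. end is null or equal to start)."""
    value = result["value"]
    start, end = value["start"], value.get("end")
    if end is None or end == start:
        return "beat"
    return "region"

# A region annotation, shaped like the JSON entries above
region = {"value": {"start": 0.39, "end": 0.67, "labels": ["Business"]}}
# A beat annotation: stop time set equal to the start time
beat = {"value": {"start": 1.25, "end": 1.25, "labels": ["Normal"]}}
```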
Currently the attributes of the WFDB `Annotation` class used for writing the WFDB-format annotation files are:

```
['ann_len', 'aux_note', 'chan', 'contained_labels', 'custom_labels', 'description',
 'extension', 'fs', 'label_store', 'num', 'record_name', 'sample', 'subtype', 'symbol']
```
```
ann_len : int
    The number of samples in the annotation.
aux_note : list, optional
    A list containing the auxiliary information string (or None for
    annotations without notes) for each annotation.
chan : ndarray, optional
    A numpy array containing the signal channel associated with each
    annotation.
contained_labels : pandas dataframe, optional
    The unique labels contained in this annotation. Same structure as
    `custom_labels`.
custom_labels : pandas dataframe, optional
    The custom annotation labels defined in the annotation file. Maps
    the relationship between the three label fields. The data type is a
    pandas DataFrame with three columns:
    ['label_store', 'symbol', 'description'].
description : list, optional
    A list containing the descriptive string of each annotation label.
extension : str
    The file extension of the file the annotation is stored in.
fs : int, float, optional
    The sampling frequency of the record.
label_store : ndarray, optional
    The integer value used to store/encode each annotation label.
num : ndarray, optional
    A numpy array containing the labelled annotation number for each
    annotation.
record_name : str
    The base file name (without extension) of the record that the
    annotation is associated with.
sample : ndarray
    A numpy array containing the annotation locations in samples relative to
    the beginning of the record.
subtype : ndarray, optional
    A numpy array containing the marked class/category of each annotation.
symbol : list, numpy array, optional
    The symbols used to display the annotation labels. List or numpy array.
    If this field is present, `label_store` must not be present.
```
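As a rough, hypothetical sketch of what a trimmed-down subset of these attributes might look like (the class name and field choices are my own, and this is not the wfdb API itself):

```python
from dataclasses import dataclass, field

@dataclass
class MinimalAnnotation:
    """Hypothetical minimal subset of the WFDB Annotation fields listed above."""
    record_name: str   # base file name of the associated record
    extension: str     # annotation file extension, e.g. 'atr'
    sample: list       # annotation locations, in samples from the record start
    symbol: list = field(default_factory=list)    # display symbol per annotation
    aux_note: list = field(default_factory=list)  # free-text note per annotation
    fs: float = None   # sampling frequency of the record

    def __post_init__(self):
        # sample locations must be non-decreasing, as in WFDB annotation files
        if any(b < a for a, b in zip(self.sample, self.sample[1:])):
            raise ValueError("sample must be non-decreasing")
```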
These are some of the things that we should consider when building this new annotation model, especially if we decide to incorporate some of the functionality of Label Studio. I think some of these could be cut, but should we keep them for compatibility in case we decide to write a conversion method in the future?
*Some background on the conversion issue: @tompollard suggested, and I agreed, that it would be best to store these labels in XML (possibly JSON) format since it's easier to access and is much more flexible. If someone wanted these annotations in WFDB format, then we could have a conversion method for that.*
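As a rough illustration of what such a conversion method might look like, here is a sketch that maps Label Studio results (times in seconds) onto the WFDB-style `sample`, `symbol`, and `aux_note` fields. The function name, sampling frequency, and label-to-symbol mapping are all assumptions, and this does not call the real wfdb library:

```python
def labelstudio_to_wfdb_fields(results, fs, symbol_map, default_symbol='"'):
    """Hypothetical converter: Label Studio results -> WFDB-style fields.

    Only the start time of each annotation is kept here; region end times
    would need a separate convention (e.g. a paired end annotation)."""
    entries = []
    for result in results:
        value = result["value"]
        label = value["labels"][0]
        entries.append((round(value["start"] * fs),   # seconds -> sample index
                        symbol_map.get(label, default_symbol),
                        label))
    # WFDB annotation files store samples in non-decreasing order
    entries.sort(key=lambda e: e[0])
    sample = [e[0] for e in entries]
    symbol = [e[1] for e in entries]
    aux_note = [e[2] for e in entries]
    return sample, symbol, aux_note
```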
Label Studio is releasing a time-series dedicated annotation platform which allows the user to make annotations for both time ranges and single time points. Here is what the demo looks like:
You'll note that the user can specify the event they wish to annotate and then perform the desired annotation: a double-click for a single time point annotation and a click-and-drag for a time range annotation. You can also see the previous completions, which we can use to track multiple users who wish to annotate a single project. Additionally, we have the ability to set a ground-truth set of annotations if we ever desire that functionality. Here is the resulting JSON (note that single-time annotations are saved with the same start and end time):
Result
```json
[
    {
        "id": "QKaimQjoTQ",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592250821941.2595,
            "end": 1592250831927.112,
            "instant": false,
            "timeserieslabels": [
                "Event 1"
            ]
        }
    },
    {
        "id": "RSj46Dzkhe",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592250921955.7407,
            "end": 1592250921955.7407,
            "instant": true,
            "timeserieslabels": [
                "Event 1"
            ]
        }
    },
    {
        "id": "RKODZiMgsp",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592251211907.621,
            "end": 1592251211907.621,
            "instant": true,
            "timeserieslabels": [
                "Event 1"
            ]
        }
    },
    {
        "id": "nkRg1P9L5L",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592251461993.5276,
            "end": 1592251711941.2742,
            "instant": false,
            "timeserieslabels": [
                "Event 2"
            ]
        }
    },
    {
        "id": "NE7unB1-J1",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592252101985.5444,
            "end": 1592252101985.5444,
            "instant": true,
            "timeserieslabels": [
                "Event 3"
            ]
        }
    },
    {
        "id": "oHQC4dE7-u",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592252011979.126,
            "end": 1592252441979.4265,
            "instant": false,
            "timeserieslabels": [
                "Event 1"
            ]
        }
    },
    {
        "id": "M-dMRAbRxu",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592251341969.1328,
            "end": 1592251341969.1328,
            "instant": true,
            "timeserieslabels": [
                "Event 1"
            ]
        }
    },
    {
        "id": "agpadQD5i_",
        "from_name": "label",
        "to_name": "ts",
        "source": "$csv",
        "type": "timeserieslabels",
        "parent_id": null,
        "value": {
            "start": 1592252721959.5007,
            "end": 1592252851914.7446,
            "instant": false,
            "timeserieslabels": [
                "Event 3"
            ]
        }
    }
]
```
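To make the `instant` flag and the millisecond epoch timestamps concrete, a small, purely illustrative helper could split these results and convert the times to sample indices; the helper name, record start time, and sampling frequency are assumed values:

```python
def split_and_locate(results, record_start_ms, fs):
    """Split Label Studio time-series results into instant and range
    annotations, converting epoch-millisecond times into sample indices
    relative to an assumed record start time."""
    def to_sample(ms):
        return round((ms - record_start_ms) / 1000 * fs)

    instants, ranges = [], []
    for result in results:
        value = result["value"]
        label = value["timeserieslabels"][0]
        if value["instant"]:  # single-time annotation: start == end
            instants.append((to_sample(value["start"]), label))
        else:                 # range annotation: start/stop span
            ranges.append((to_sample(value["start"]),
                           to_sample(value["end"]), label))
    return instants, ranges
```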
I think it's worth noting that WFDB has a function called `rr2ann` which converts a series of RR intervals to annotations. I have already developed the reverse, `ann2rr`, in the latest 3.1.0 release of WFDB-Python, and plan to add the `rr2ann` functionality in the next release. We can use the beat annotations generated with the Label Studio annotation platform to generate RR intervals and convert them to annotations in WFDB format using WFDB-Python.
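The arithmetic behind that round trip is simple; here is a plain-Python sketch of the idea (not the actual wfdb-python API, which operates on annotation files):

```python
def ann2rr(samples):
    """RR intervals (in samples) between consecutive beat annotation locations."""
    return [b - a for a, b in zip(samples, samples[1:])]

def rr2ann(rr, start=0):
    """Rebuild beat annotation locations from RR intervals and a starting sample."""
    samples = [start]
    for interval in rr:
        samples.append(samples[-1] + interval)
    return samples
```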
There are several existing platforms that could be used to gather useful annotations for PhysioNet datasets. This needs a lot more thought, but as a rough idea it would be good to develop a general API that:

**Metadata profile for an annotation**

The structure of the annotation will need to be developed. At minimum, the metadata should probably include:

**Metadata profile for an annotation task**

One of the major challenges is understanding how the API can be made generalizable across PhysioNet, ideally to support multiple data types and modalities (images, waveforms, notes, etc). It feels like the annotation task will require a formal definition that would state things like:

**Providing an interface for the annotation functionality**

Annotation tasks may be driven by the research question, and there may be multiple annotation tasks for a single dataset. We need to come up with a simple way of allowing PhysioNet users to propose and implement an annotation task. My suggestion is that we do this with the use of a new "annotation" project type (see https://github.com/MIT-LCP/physionet-build/issues/1032).

**Summary of tasks**

So in summary, some good first steps might be to: