I noticed that writing an annotation file was slow when the file contains many annotations.
Line-profiling the writing functions showed that field2bytes accounted for most of the execution time.
The bottleneck was this line:
typecode = ann_label_table.loc[ann_label_table["symbol"] == value[1], "label_store"].values[0]
This filters the entire ann_label_table DataFrame on every "samptype" call to field2bytes, which is slow. Instead, I added a dictionary that maps each symbol to its corresponding label_store value, so each lookup is a single dictionary access. In the profiler output below, the total time spent in field2bytes drops from 86.361 s to 2.40503 s, and the lookup line goes from 96.6% of the function's time to 5.1%.
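For illustration, here is a minimal sketch of the two lookup strategies. The names ann_label_table and typecodes come from this PR, but the three-row table and the helper functions lookup_slow and lookup_fast are hypothetical stand-ins, not the actual wfdb code.

```python
import pandas as pd

# Hypothetical three-row stand-in for wfdb's ann_label_table;
# the real table has many more rows and columns.
ann_label_table = pd.DataFrame(
    {
        "label_store": [1, 5, 28],
        "symbol": ["N", "V", "+"],
    }
)

# Built once, outside the per-annotation loop: symbol -> label_store.
typecodes = dict(zip(ann_label_table["symbol"], ann_label_table["label_store"]))


def lookup_slow(symbol):
    """Old approach: build a boolean mask over the whole table on every call."""
    mask = ann_label_table["symbol"] == symbol
    return ann_label_table.loc[mask, "label_store"].values[0]


def lookup_fast(symbol):
    """New approach: a single dictionary access."""
    return typecodes[symbol]


# Both approaches return the same type code for every symbol in the table.
for sym in ann_label_table["symbol"]:
    assert lookup_slow(sym) == lookup_fast(sym)
```

The dictionary has to be rebuilt if the table changes, so this sketch assumes the table is fixed for the duration of a write.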
Line profiler output
Current version
Total time: 86.361 s
File: wfdb/io/annotation.py
Function: field2bytes at line 1602
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1602 @profile
1603 def field2bytes(field, value):
1604 """
1605 Convert an annotation field into bytes to write.
1606
1607 Parameters
1608 ----------
1609 field : str
1610 The annotation field of the value to be converted to bytes.
1611 value : list
1612 The value to be converted to bytes.
1613
1614 Returns
1615 -------
1616 data_bytes : list, ndarray
1617 All of the bytes to be written to the annotation file.
1618
1619 """
1620 361156 273292.0 0.8 0.3 data_bytes = []
1621
1622 # samp and sym bytes come together
1623 361156 248245.0 0.7 0.3 if field == "samptype":
1624 # Numerical value encoding annotation symbol
1625 179612 83467815.0 464.7 96.6 typecode = ann_label_table.loc[ann_label_table["symbol"] == value[1], "label_store"].values[0]
1626 #typecode = typecodes[value[1]]
1627 # sample difference
1628 179612 236106.0 1.3 0.3 sd = value[0]
1629
1630 179612 131775.0 0.7 0.2 data_bytes = []
1631
1632 # Add SKIP element(s) if the sample difference is too large to
1633 # be stored in the annotation type word.
1634 #
1635 # Each SKIP element consists of three words (6 bytes):
1636 # - Bytes 0-1 contain the SKIP indicator (59 << 10)
1637 # - Bytes 2-3 contain the high 16 bits of the sample difference
1638 # - Bytes 4-5 contain the low 16 bits of the sample difference
1639 # If the total difference exceeds 2**31 - 1, multiple skips must
1640 # be used.
1641 181444 255089.0 1.4 0.3 while sd > 1023:
1642 1832 3423.0 1.9 0.0 n = min(sd, 0x7FFFFFFF)
1643 1832 915.0 0.5 0.0 data_bytes += [
1644 1832 931.0 0.5 0.0 0,
1645 1832 916.0 0.5 0.0 59 << 2,
1646 1832 2251.0 1.2 0.0 (n >> 16) & 255,
1647 1832 1563.0 0.9 0.0 (n >> 24) & 255,
1648 1832 1583.0 0.9 0.0 (n >> 0) & 255,
1649 1832 2294.0 1.3 0.0 (n >> 8) & 255,
1650 ]
1651 1832 1957.0 1.1 0.0 sd -= n
1652
1653 # Annotation type itself is stored as a single word:
1654 # - bits 0 to 9 store the sample difference (0 to 1023)
1655 # - bits 10 to 15 store the type code
1656 179612 442489.0 2.5 0.5 data_bytes += [sd & 255, ((sd & 768) >> 8) + 4 * typecode]
1657
1658 181544 100423.0 0.6 0.1 elif field == "num":
1659 # First byte stores num
1660 # second byte stores 60*4 indicator
1661 data_bytes = [value, 240]
1662 181544 95246.0 0.5 0.1 elif field == "subtype":
1663 # First byte stores subtype
1664 # second byte stores 61*4 indicator
1665 1932 1299.0 0.7 0.0 data_bytes = [value, 244]
1666 179612 95012.0 0.5 0.1 elif field == "chan":
1667 # First byte stores num
1668 # second byte stores 62*4 indicator
1669 data_bytes = [value, 248]
1670 179612 107277.0 0.6 0.1 elif field == "aux_note":
1671 # - First byte stores length of aux_note field
1672 # - Second byte stores 63*4 indicator
1673 # - Then store the aux_note string characters
1674 179612 531112.0 3.0 0.6 data_bytes = [len(value), 252] + [ord(i) for i in value]
1675 # Zero pad odd length aux_note strings
1676 179612 150545.0 0.8 0.2 if len(value) % 2:
1677 data_bytes.append(0)
1678
1679 361156 209407.0 0.6 0.2 return data_bytes
New version
Total time: 2.40503 s
File: /home/nicolasbg/miniconda3/envs/physionet/lib/python3.7/site-packages/wfdb/io/annotation.py
Function: field2bytes at line 1602
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1602 @profile
1603 def field2bytes(field, value):
1604 """
1605 Convert an annotation field into bytes to write.
1606
1607 Parameters
1608 ----------
1609 field : str
1610 The annotation field of the value to be converted to bytes.
1611 value : list
1612 The value to be converted to bytes.
1613
1614 Returns
1615 -------
1616 data_bytes : list, ndarray
1617 All of the bytes to be written to the annotation file.
1618
1619 """
1620 361156 199665.0 0.6 8.3 data_bytes = []
1621
1622 # samp and sym bytes come together
1623 361156 213260.0 0.6 8.9 if field == "samptype":
1624 # Numerical value encoding annotation symbol
1625 179612 121482.0 0.7 5.1 typecode = typecodes[value[1]]
1626 # sample difference
1627 179612 100643.0 0.6 4.2 sd = value[0]
1628
1629 179612 97847.0 0.5 4.1 data_bytes = []
1630
1631 # Add SKIP element(s) if the sample difference is too large to
1632 # be stored in the annotation type word.
1633 #
1634 # Each SKIP element consists of three words (6 bytes):
1635 # - Bytes 0-1 contain the SKIP indicator (59 << 10)
1636 # - Bytes 2-3 contain the high 16 bits of the sample difference
1637 # - Bytes 4-5 contain the low 16 bits of the sample difference
1638 # If the total difference exceeds 2**31 - 1, multiple skips must
1639 # be used.
1640 181444 147706.0 0.8 6.1 while sd > 1023:
1641 1832 2554.0 1.4 0.1 n = min(sd, 0x7FFFFFFF)
1642 1832 986.0 0.5 0.0 data_bytes += [
1643 1832 991.0 0.5 0.0 0,
1644 1832 968.0 0.5 0.0 59 << 2,
1645 1832 1856.0 1.0 0.1 (n >> 16) & 255,
1646 1832 1570.0 0.9 0.1 (n >> 24) & 255,
1647 1832 1548.0 0.8 0.1 (n >> 0) & 255,
1648 1832 2074.0 1.1 0.1 (n >> 8) & 255,
1649 ]
1650 1832 1556.0 0.8 0.1 sd -= n
1651
1652 # Annotation type itself is stored as a single word:
1653 # - bits 0 to 9 store the sample difference (0 to 1023)
1654 # - bits 10 to 15 store the type code
1655 179612 253318.0 1.4 10.5 data_bytes += [sd & 255, ((sd & 768) >> 8) + 4 * typecode]
1656
1657 181544 100786.0 0.6 4.2 elif field == "num":
1658 # First byte stores num
1659 # second byte stores 60*4 indicator
1660 data_bytes = [value, 240]
1661 181544 99500.0 0.5 4.1 elif field == "subtype":
1662 # First byte stores subtype
1663 # second byte stores 61*4 indicator
1664 1932 1163.0 0.6 0.0 data_bytes = [value, 244]
1665 179612 98431.0 0.5 4.1 elif field == "chan":
1666 # First byte stores num
1667 # second byte stores 62*4 indicator
1668 data_bytes = [value, 248]
1669 179612 102374.0 0.6 4.3 elif field == "aux_note":
1670 # - First byte stores length of aux_note field
1671 # - Second byte stores 63*4 indicator
1672 # - Then store the aux_note string characters
1673 179612 541168.0 3.0 22.5 data_bytes = [len(value), 252] + [ord(i) for i in value]
1674 # Zero pad odd length aux_note strings
1675 179612 120971.0 0.7 5.0 if len(value) % 2:
1676 data_bytes.append(0)
1677
1678 361156 192616.0 0.5 8.0 return data_bytes