audeering / audformat

Format to store media files and annotations
https://audeering.github.io/audformat/

Enhance output of audformat.Database.description #59

Open hagenw opened 3 years ago

hagenw commented 3 years ago

At the moment we get the following:

>>> import audb
>>> db = audb.load('emodb', version='1.1.0')
>>> db.description
'Berlin Database of Emotional Speech. A German database of emotional utterances spoken by actors recorded as a part of the DFG funded research project SE462/3-1 in 1997 and 1999. Recordings took place in the anechoic chamber of the Technical University Berlin, department of Technical Acoustics. It contains about 500 utterances from ten different actors expressing basic six emotions and neutral.'

which is not very nice to read.
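As a rough illustration of what more readable output could look like, the sketch below wraps a long single-paragraph description with the standard library's textwrap module. The `description` string is a shortened stand-in for what `db.description` returns, and the width of 72 columns is an arbitrary choice; nothing here is existing audformat behaviour.

```python
import textwrap

# Shortened stand-in for the string returned by db.description
description = (
    "Berlin Database of Emotional Speech. "
    "A German database of emotional utterances spoken by actors "
    "recorded as a part of the DFG funded research project SE462/3-1 "
    "in 1997 and 1999."
)

# Wrap at 72 columns without splitting words or identifiers like SE462/3-1
wrapped = textwrap.fill(
    description,
    width=72,
    break_long_words=False,
    break_on_hyphens=False,
)
print(wrapped)
```

This only helps for descriptions that are a single paragraph; `textwrap.fill` collapses existing whitespace, so it would destroy preformatted text such as a tree diagram.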

It gets even worse if you have some real formatting in the description string. For example, for audioset the description contains:

AudioSet ontology categories of the two top hierarchies:

Human sounds            Animal                   Music
|-Human voice           |-Domestic animals, pets |-Musical instrument
|-Whistling             |-Livestock, farm        |-Music genre
|-Respiratory sounds    | animals, working       |-Musical concepts
|-Human locomotion      | animals                |-Music role
|-Digestive             \-Wild animals           \-Music mood
|-Hands
|-Heart sounds,         Sounds of things         Natural sounds
| heartbeat             |-Vehicle                |-Wind
|-Otoacoustic emission  |-Engine                 |-Thunderstorm
\-Human group actions   |-Domestic sounds,       |-Water
                        | home sounds            \-Fire
Source-ambiguous sounds |-Bell
|-Generic impact sounds |-Alarm                  Channel, environment
|-Surface contact       |-Mechanisms             and background
|-Deformable shell      |-Tools                  |-Acoustic environment
|-Onomatopoeia          |-Explosion              |-Noise
|-Silence               |-Wood                   \-Sound reproduction
\-Other sourceless      |-Glass
                        |-Liquid
                        |-Miscellaneous sources
                        \-Specific impact sounds

It would be nice if we could preserve this layout when printing to screen.

hagenw commented 2 years ago

It's not only description:

>>> db = audb.load('iemocap', version='2.2.0', only_metadata=True)
>>> db.schemes["dialog.act"]
description: Dialogue act annotations.Released by https://github.com/sahatulika15/EMOTyDA.Please
  cite the respective paper whenever using it.
dtype: str
labels: {g: greeting, q: question, ans: answer, o: statement-opinion, s: statement-non-opinion,
  ap: apology, ag: agreement, dag: disagreement, a: acknowledgement, b: backchanneling,
  c: command, oth: other}
bibtex: "\n                @inproceedings{saha-etal-2020-towards,\n              \
  \      title = \"Towards Emotion-aided Multi-modal Dialogue Act Classification\"\
  ,\n                    author = \"Saha, Tulika  and\n                    Patra,\
  \ Aditya  and\n                    Saha, Sriparna  and\n                    Bhattacharyya,\
  \ Pushpak\",\n                    booktitle = \"Proceedings of the 58th Annual Meeting\
  \ of the Association for Computational Linguistics\",\n                    month\
  \ = jul,\n                    year = \"2020\",\n                    address = \"\
  Online\",\n                    publisher = \"Association for Computational Linguistics\"\
  ,\n                    url = \"https://www.aclweb.org/anthology/2020.acl-main.402\"\
  ,\n                    doi = \"10.18653/v1/2020.acl-main.402\",\n              \
  \      pages = \"4361--4372\",\n                }\n            "

frankenjoe commented 2 years ago

PyYAML offers the option to modify the way scalars are presented, e.g.:

import audformat

bibtex = '@inproceedings{saha-etal-2020-towards,\n\
    title = "Towards Emotion-aided Multi-modal Dialogue Act Classification"\n\
    author = "Saha, Tulika and\n\
        Patra, Aditya and\n\
        Saha, Sriparna and\n\
        Bhattacharyya, Pushpak"\n\
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",\n\
    month = jul,\n\
    year = "2020",\n\
    address = "Online",\n\
    publisher = "Association for Computational Linguistics"\n\
    url = "https://www.aclweb.org/anthology/2020.acl-main.402"\n\
    doi = "10.18653/v1/2020.acl-main.402",\n\
    pages = "4361--4372"'

db = audformat.testing.create_db(minimal=True)
db.schemes['scheme'] = audformat.Scheme(meta={'bibtex': bibtex})
print(db.schemes['scheme'])
{dtype: str, bibtex: "@inproceedings{saha-etal-2020-towards,\n    title = \"Towards\
    \ Emotion-aided Multi-modal Dialogue Act Classification\"\n    author = \"Saha,\
    \ Tulika and\n        Patra, Aditya and\n        Saha, Sriparna and\n        Bhattacharyya,\
    \ Pushpak\"\n    booktitle = \"Proceedings of the 58th Annual Meeting of the Association\
    \ for Computational Linguistics\",\n    month = jul,\n    year = \"2020\",\n \
    \   address = \"Online\",\n    publisher = \"Association for Computational Linguistics\"\
    \n    url = \"https://www.aclweb.org/anthology/2020.acl-main.402\"\n    doi =\
    \ \"10.18653/v1/2020.acl-main.402\",\n    pages = \"4361--4372\""}
dtype: str

can be prettified to:

import yaml


def str_presenter(dumper, data):
    # Use literal block style ("|") for multi-line strings only
    if len(data.splitlines()) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)


yaml.add_representer(str, str_presenter)
print(db.schemes['scheme'])
bibtex: |-
  @inproceedings{saha-etal-2020-towards,
      title = "Towards Emotion-aided Multi-modal Dialogue Act Classification"
      author = "Saha, Tulika and
          Patra, Aditya and
          Saha, Sriparna and
          Bhattacharyya, Pushpak"
      booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
      month = jul,
      year = "2020",
      address = "Online",
      publisher = "Association for Computational Linguistics"
      url = "https://www.aclweb.org/anthology/2020.acl-main.402"
      doi = "10.18653/v1/2020.acl-main.402",
      pages = "4361--4372"

For some reason it did not work with the iemocap example above, though. But the formatting of that string also looks a bit odd.
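One possible explanation, offered as a guess rather than a diagnosis: as far as I understand PyYAML's emitter, it only honours a requested block style when the scalar sits outside a flow collection and has no trailing whitespace, and it falls back to double quotes otherwise. The iemocap bibtex string ends with a line break followed by spaces. The sketch below uses a shortened stand-in string and a local `BlockDumper` class (both hypothetical, not part of audformat) to show the fallback and how dedenting and stripping the string avoids it.

```python
import textwrap

import yaml


class BlockDumper(yaml.SafeDumper):
    """Local dumper so PyYAML's global representers stay untouched."""


def str_presenter(dumper, data):
    # Request literal block style ("|") for multi-line strings only
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


BlockDumper.add_representer(str, str_presenter)

# Shortened stand-in for the iemocap bibtex entry: leading line break,
# common indentation, and trailing spaces after the last line break
raw = '\n        @inproceedings{key,\n            year = "2020",\n        }\n    '

# Remove the common indentation and the surrounding whitespace
clean = textwrap.dedent(raw).strip()

# default_flow_style=False keeps the mapping in block context
dumped_raw = yaml.dump({"bibtex": raw}, Dumper=BlockDumper, default_flow_style=False)
dumped_clean = yaml.dump({"bibtex": clean}, Dumper=BlockDumper, default_flow_style=False)

print(dumped_raw)    # still a double-quoted scalar with escaped line breaks
print(dumped_clean)  # literal block scalar
```

If this is indeed the cause, cleaning the strings when the database is built would be needed in addition to the custom representer. Whether audformat's own dumping code passes through a compatible dumper is not checked here.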

hagenw commented 1 year ago

This might indeed be a solution. For audioset it will not help as the description string does not contain \n at the moment, but we could update it.