clarin-eric / ParlaMint

ParlaMint: Comparable Parliamentary Corpora
https://clarin-eric.github.io/ParlaMint/

NL feedback #543

Closed matyaskopp closed 1 year ago

matyaskopp commented 1 year ago

application description

I think each application's description should make it obvious which type of annotation was done with that application. For example, was the lemmatization done by int-tagger or by udify? Both descriptions below mention lemmas.

https://github.com/RubenvanHeusden/ParlaMint/blob/615ef90d0aaf0cdb499737a6792d6923c0a238d5/Data/ParlaMint-NL/ParlaMint-NL.ana.xml#L1005-L1025

         <appInfo>
            <application ident="int-tagger" version="1.0">
               <label>INT Tagger, lemmatizer and Tokenizer</label>
               <desc xml:lang="en">INT Tagger, lemmatizer and Tokenizer for modern Dutch, based on old-school machine learning (SVM). It provides the legacy PoS tags (encoded in w/@ana) and the lemmata for Dutch. Not publicly available.</desc>
            </application>
            <application ident="udify" version="1.0">
               <label>Udify</label>
               <desc xml:lang="en">UDify is a single model that parses Universal Dependencies (UPOS, UFeats, Lemmas, Deps) jointly, accepting any of 75 supported languages as input. Available from <ref target="https://github.com/Hyperparticle/udify">https://github.com/Hyperparticle/udify</ref>
               </desc>
            </application>
            <application ident="flair-ner" version="1.0">
               <label>Flair NLP NER tagging for French and Dutch</label>
               <desc xml:lang="en">A powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), sense disambiguation and classification, with support for a rapidly growing number of languages. Available from <ref target="https://github.com/flairNLP/flair">https://github.com/flairNLP/flair</ref>
               </desc>
            </application>
            <application ident="trankit" version="1.0">
               <label>Trankit: A Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing</label>
               <desc xml:lang="en">Trankit is a light-weight Transformer-based Python Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 downloadable pretrained pipelines for 56 languages. Available from <ref target="https://github.com/nlp-uoregon/trankit">https://github.com/nlp-uoregon/trankit</ref>
               </desc>
            </application>
         </appInfo>
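
One way to spot this kind of overlap is to check which `<application>` descriptions mention a given annotation layer. The following is only a minimal sketch under stated assumptions, not part of the ParlaMint tooling: the function name `apps_mentioning` is hypothetical, the match is a naive case-insensitive substring search on `<desc>`, and `SAMPLE` is an abridged copy of the snippet above.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <appInfo> snippet above (descriptions shortened).
SAMPLE = """
<appInfo>
  <application ident="int-tagger" version="1.0">
    <desc xml:lang="en">It provides the legacy PoS tags (encoded in w/@ana) and the lemmata for Dutch.</desc>
  </application>
  <application ident="udify" version="1.0">
    <desc xml:lang="en">UDify is a single model that parses Universal Dependencies (UPOS, UFeats, Lemmas, Deps) jointly.</desc>
  </application>
  <application ident="flair-ner" version="1.0">
    <desc xml:lang="en">Named entity recognition (NER).</desc>
  </application>
</appInfo>
"""


def apps_mentioning(app_info_xml, keyword):
    """Return the @ident of every application whose <desc> mentions keyword.

    Hypothetical helper: a plain substring match, so it only hints at
    which tool may have produced a layer; it cannot confirm it.
    """
    root = ET.fromstring(app_info_xml.strip())
    needle = keyword.lower()
    idents = []
    # "{*}" matches the element in any namespace (or none), so this also
    # works when the snippet sits inside a namespaced TEI document.
    for app in root.findall(".//{*}application"):
        desc = app.find("{*}desc")
        if desc is None:
            continue
        # Join all text inside <desc>, including nested <ref> content.
        text = " ".join("".join(desc.itertext()).split()).lower()
        if needle in text:
            idents.append(app.get("ident"))
    return idents


if __name__ == "__main__":
    print(apps_mentioning(SAMPLE, "lemma"))
```

On the abridged sample, both int-tagger and udify match "lemma", which is exactly the ambiguity described above: the descriptions say what each tool can do, not which annotation layer it actually produced in this corpus.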