apmoore1 closed this 2 years ago.
Merging #32 (5feb6ef) into main (d381180) will increase coverage by 1.42%. The diff coverage is 99.00%.
Coverage Diff

| | main | #32 | +/- |
|---|---|---|---|
| Coverage | 97.62% | 99.05% | +1.42% |
| Files | 8 | 21 | +13 |
| Lines | 337 | 1057 | +720 |
| Branches | 66 | 214 | +148 |
| Hits | 329 | 1047 | +718 |
| Misses | 7 | 0 | -7 |
| Partials | 1 | 10 | +9 |
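The headline percentages can be reproduced from the Hits, Misses, and Partials rows. This sketch assumes Codecov computes coverage as hits / (hits + misses + partials), so a partially covered line counts against coverage, and that it rounds down to two decimal places. That assumption matches the figures shown: the unrounded change is about 1.428 points, reported as +1.42%. The helper names are illustrative only.

```python
import math


def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Percentage of tracked lines that are fully covered.

    Assumes Codecov's ratio: hits / (hits + misses + partials).
    """
    return 100 * hits / (hits + misses + partials)


def round_down(value: float, places: int = 2) -> float:
    """Round down (toward negative infinity) to the given number of decimal places."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


base = coverage_percent(hits=329, misses=7, partials=1)    # main
head = coverage_percent(hits=1047, misses=0, partials=10)  # #32

print(f"{round_down(base):.2f}%")          # 97.62%
print(f"{round_down(head):.2f}%")          # 99.05%
print(f"{round_down(head - base):+.2f}%")  # +1.42%
```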
| Impacted Files | Coverage Δ | |
|---|---|---|
| `pymusas/lexicon_collection.py` | 98.02% <97.49%> (-1.98%) | ⬇️ |
| `pymusas/taggers/rules/mwe.py` | 98.73% <98.73%> (ø) | |
| `pymusas/rankers/lexicon_entry.py` | 99.18% <99.18%> (ø) | |
| `pymusas/__init__.py` | 100.00% <100.00%> (ø) | |
| `pymusas/base.py` | 100.00% <100.00%> (ø) | |
| `pymusas/pos_mapper.py` | 100.00% <100.00%> (ø) | |
| `pymusas/rankers/lexical_match.py` | 100.00% <100.00%> (ø) | |
| `pymusas/rankers/ranking_meta_data.py` | 100.00% <100.00%> (ø) | |
| `pymusas/spacy_api/lexicon_collection.py` | 100.00% <100.00%> (+100.00%) | ⬆️ |
| `pymusas/spacy_api/pos_mapper.py` | 100.00% <100.00%> (ø) | |
| ... and 10 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dcd90e0...5feb6ef.
Added
- `Notes -> Multi Word Expression Syntax` in the `Usage` section of the documentation. This is the first task of issue #24.
- A `py.typed` file, marking the package as typed (PEP 561).
- `pymusas.lexicon_collection.LexiconCollection.to_bytes` function uses `srsly` to serialise the `LexiconCollection` to `bytes`.
- `pymusas.base.Serialise`, an abstract class that requires sub-classes to implement two methods, `to_bytes` and `from_bytes`, so that the class can be serialised.
- `pymusas.lexicon_collection.LexiconCollection` has three new methods: `to_bytes`, `from_bytes`, and `__eq__`. These allow the collection to be serialised and compared to other collections.
- `pymusas.lexicon_collection.MWELexiconCollection`, which allows a user to easily create an MWE lexicon and/or load one from a TSV file, such as the MWE lexicons from the Multilingual USAS repository. It also contains the functionality to match an MWE template against the templates stored in the `MWELexiconCollection` class, following the MWE special syntax rules; this is all done through the `mwe_match` method. It also supports Part Of Speech (POS) mapping, so that you can map from the lexicon's POS tagset to the tagset of your choice, using either a one-to-one or a one-to-many mapping. Like `pymusas.lexicon_collection.LexiconCollection`, it contains `to_bytes`, `from_bytes`, and `__eq__` methods for serialisation and comparison.
- The rule-based taggers are now built from a `List` of `Rule`s and a `Ranker`, whereby each `Rule` defines how one or more tokens in a text can be matched to a semantic category. Given the matches from the `Rule`s for each token (a token can have zero or more matches), the `Ranker` ranks each match and finds the global best match for each token in the text. The taggers now support direct match and wildcard Multi Word Expressions. Due to this:
  - `pymusas.taggers.rule_based.USASRuleBasedTagger` has been changed and renamed to `pymusas.taggers.rule_based.RuleBasedTagger`, and now only has a `__call__` method.
  - `pymusas.spacy_api.taggers.rule_based.USASRuleBasedTagger` has been changed and renamed to `pymusas.spacy_api.taggers.rule_based.RuleBasedTagger`.
- `pymusas.taggers.rules`:
  - `pymusas.taggers.rules.rule.Rule`, an abstract class that describes how other sub-classes define the `__call__` method and its signature. This abstract class is sub-classed from `pymusas.base.Serialise`.
  - `pymusas.taggers.rules.single_word.SingleWordRule`, a concrete sub-class of `Rule` for finding single word lexicon entry matches.
  - `pymusas.taggers.rules.mwe.MWERule`, a concrete sub-class of `Rule` for finding Multi Word Expression entry matches.
- `pymusas.rankers`:
  - `pymusas.rankers.ranking_meta_data.RankingMetaData` describes a lexicon entry match; such matches are typically generated by calling `pymusas.taggers.rules.rule.Rule` classes. A match indicates that some part of a text, one or more tokens, matches a lexicon entry, whether a Multi Word Expression or a single word lexicon entry.
  - `pymusas.rankers.lexicon_entry.LexiconEntryRanker`, an abstract class that describes how other sub-classes should rank each token in the text, and the expected output, through the class's `__call__` method. This abstract class is sub-classed from `pymusas.base.Serialise`.
  - `pymusas.rankers.lexicon_entry.ContextualRuleBasedRanker`, a concrete sub-class of `LexiconEntryRanker` based on the ranking rules from Piao et al. 2003.
  - `pymusas.rankers.lexical_match.LexicalMatch` describes the lexical match within a `pymusas.rankers.ranking_meta_data.RankingMetaData` object.
- `pymusas.utils.unique_pos_tags_in_lexicon_entry`, a function that, given a lexicon entry (either Multi Word Expression or single word), returns a `Set[str]` of the unique POS tags in the lexicon entry.
- `pymusas.utils.token_pos_tags_in_lexicon_entry`, a function that, given a lexicon entry (either Multi Word Expression or single word), yields a `Tuple[str, str]` of word and POS tag for each token in the lexicon entry.
- `pymusas.lexicon_collection.LexiconMetaData`, an object that contains all of the metadata about a single word or Multi Word Expression lexicon entry.
- `pymusas.lexicon_collection.LexiconType`, which describes the different types of single word and Multi Word Expression (MWE) lexicon entries and templates that PyMUSAS uses, or will use in the case of curly braces.
- spaCy API functions for loading a `LexiconCollection` or `MWELexiconCollection` from a TSV. These can be found in `pymusas.spacy_api.lexicon_collection`.
- spaCy API functions for creating `SingleWordRule` and `MWERule`. These can be found in `pymusas.spacy_api.taggers.rules`.
- A spaCy API function for creating a `ContextualRuleBasedRanker`. This can be found in `pymusas.spacy_api.rankers`.
- A spaCy API function for creating a `List` of `Rule`s; this can be found here: `pymusas.spacy_api.taggers.rules.rule_list`.
- `LexiconCollection` and `MWELexiconCollection` open the TSV file downloaded through the `from_tsv` method using `utf-8` encoding by default.
- `pymusas_rule_based_tagger` is now a spaCy registered factory, via an entry point.
- `MWELexiconCollection` warns users that it does not support curly brace MWE template expressions.
- `pymusas.spacy_api.pos_mapper` module.
- Updated the `Introduction` and `How-to Tag Text` usage documentation with the new features that PyMUSAS now supports, e.g. MWEs. The `How-to Tag Text` page has also been updated so that it uses the pre-configured spaCy components that have been created for each language; these spaCy components can be found and downloaded from the pymusas-models repository.

Removed
- `pymusas.taggers.rule_based.USASRuleBasedTagger`, which is now replaced by `pymusas.taggers.rule_based.RuleBasedTagger`.
- `pymusas.spacy_api.taggers.rule_based.USASRuleBasedTagger`, which is now replaced by `pymusas.spacy_api.taggers.rule_based.RuleBasedTagger`.
- The `Using PyMUSAS` usage documentation page, as it requires updating.
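The `pymusas.base.Serialise` and `LexiconCollection` entries under Added describe a contract: sub-classes provide `to_bytes` and `from_bytes`, and collections also get `__eq__` so that a round trip can be checked. The sketch below shows one way such a contract can look. It is not the PyMUSAS implementation: `ToyLexicon` and its `"lemma|POS"` keys are hypothetical, and `json` stands in for `srsly` so that the example only needs the standard library.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List


class Serialise(ABC):
    """Sketch of a base class whose sub-classes must implement byte serialisation."""

    @abstractmethod
    def to_bytes(self) -> bytes:
        """Serialise this object to bytes."""

    @staticmethod
    @abstractmethod
    def from_bytes(bytes_data: bytes) -> "Serialise":
        """Re-create an object from the output of to_bytes."""


class ToyLexicon(Serialise):
    """Hypothetical stand-in for LexiconCollection: 'lemma|POS' -> semantic tags."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self.data = data

    def to_bytes(self) -> bytes:
        # json replaces srsly here so the sketch only needs the standard library.
        return json.dumps(self.data, sort_keys=True).encode("utf-8")

    @staticmethod
    def from_bytes(bytes_data: bytes) -> "ToyLexicon":
        return ToyLexicon(json.loads(bytes_data.decode("utf-8")))

    def __eq__(self, other: object) -> bool:
        # Two collections are equal when they hold the same entries.
        if not isinstance(other, ToyLexicon):
            return NotImplemented
        return self.data == other.data


lexicon = ToyLexicon({"London|PROPN": ["Z2"]})
restored = ToyLexicon.from_bytes(lexicon.to_bytes())
print(restored == lexicon)  # True
```

Because `Serialise` is abstract, calling `Serialise()` directly raises `TypeError`. Only sub-classes that implement both methods can be instantiated.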
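The `MWELexiconCollection` entry describes matching an MWE template against stored templates through `mwe_match`, and a warning for unsupported curly brace templates. Below is a minimal sketch of wildcard matching in that spirit. It assumes templates are space-separated `word_POS` tokens and that `*` in a stored template matches zero or more non-whitespace characters. The real MWE special syntax rules are in the Multi Word Expression Syntax notes and may differ. `ToyMWELexicon` and `wildcard_to_regex` are hypothetical names, and the semantic tags are illustrative.

```python
import re
import warnings
from typing import Dict, List


def wildcard_to_regex(template: str) -> "re.Pattern[str]":
    """Compile a stored MWE template, treating '*' as zero or more non-space characters."""
    # Escape every regex metacharacter, then turn each escaped '*' into '\S*'
    # so a wildcard never spans a token boundary (an assumption of this sketch).
    return re.compile(re.escape(template).replace(r"\*", r"\S*"))


class ToyMWELexicon:
    """Hypothetical, simplified stand-in for MWELexiconCollection."""

    def __init__(self, templates: Dict[str, List[str]]) -> None:
        self.templates: Dict[str, List[str]] = {}
        for template, semantic_tags in templates.items():
            if "{" in template or "}" in template:
                # Mirror the changelog: curly brace templates are not supported.
                warnings.warn(f"Curly brace MWE templates are not supported: {template!r}")
                continue
            self.templates[template] = semantic_tags

    def mwe_match(self, mwe_template: str) -> List[str]:
        """Return every stored template that matches mwe_template, in insertion order."""
        return [
            stored for stored in self.templates
            if wildcard_to_regex(stored).fullmatch(mwe_template)
        ]


lexicon = ToyMWELexicon({
    "ski_noun boot_noun": ["B5"],
    "*_noun boot*_noun": ["B5"],
})
print(lexicon.mwe_match("ski_noun boot_noun"))    # ['ski_noun boot_noun', '*_noun boot*_noun']
print(lexicon.mwe_match("snow_noun boots_noun"))  # ['*_noun boot*_noun']
```

Each stored template is compiled into a regex so that exact and wildcard templates share one code path. A real implementation would likely index templates rather than scan them all on every call.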
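The same entry mentions one-to-one and one-to-many POS mapping from the lexicon's tagset to another tagset. One way to picture a one-to-many mapping is to expand each template into every combination of mapped tags. The sketch below does exactly that. `map_template_pos` is a hypothetical helper, the `noun` → `NOUN`/`PROPN` mapping is illustrative, and PyMUSAS may apply the mapping differently internally.

```python
from itertools import product
from typing import Dict, List


def map_template_pos(template: str, pos_mapper: Dict[str, List[str]]) -> List[str]:
    """Expand an MWE template into every template allowed by pos_mapper (hypothetical helper)."""
    options_per_token: List[List[str]] = []
    for token_pos in template.split():
        word, pos = token_pos.rsplit("_", 1)
        # Tags missing from the mapper are kept unchanged (an assumption of this sketch).
        mapped_tags = pos_mapper.get(pos, [pos])
        options_per_token.append([f"{word}_{tag}" for tag in mapped_tags])
    # One output template per combination of mapped tags across the tokens.
    return [" ".join(combination) for combination in product(*options_per_token)]


one_to_one = {"noun": ["NOUN"]}
one_to_many = {"noun": ["NOUN", "PROPN"]}
print(map_template_pos("Bank_noun holiday_noun", one_to_one))
# ['Bank_NOUN holiday_NOUN']
print(map_template_pos("Bank_noun holiday_noun", one_to_many))
# ['Bank_NOUN holiday_NOUN', 'Bank_NOUN holiday_PROPN', 'Bank_PROPN holiday_NOUN', 'Bank_PROPN holiday_PROPN']
```

The number of expanded templates grows multiplicatively with template length. That is one reason a library might prefer matching against the mapped tag sets directly instead of materialising every combination.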
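The tagger entry describes a design in which several `Rule`s each produce zero or more matches per token, and a `Ranker` then picks the global best match for every token. The sketch below mirrors that shape only. `Match` is a simplified, hypothetical stand-in for `RankingMetaData`. The ranker is a plain longest-match-first greedy ranker, not the Piao et al. 2003 rules used by `ContextualRuleBasedRanker`. The lexicons and semantic tags are illustrative, and `Z99` is the USAS tag for unmatched tokens.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Match:
    """Simplified, hypothetical stand-in for RankingMetaData."""
    start: int                      # index of the first token covered
    end: int                        # index one past the last token covered
    semantic_tags: Tuple[str, ...]


# A rule maps the tokens of a text to zero or more matches per token.
Rule = Callable[[Sequence[str]], List[List[Match]]]


def single_word_rule(lexicon: Dict[str, List[str]]) -> Rule:
    def rule(tokens: Sequence[str]) -> List[List[Match]]:
        matches: List[List[Match]] = [[] for _ in tokens]
        for i, token in enumerate(tokens):
            if token in lexicon:
                matches[i].append(Match(i, i + 1, tuple(lexicon[token])))
        return matches
    return rule


def mwe_rule(lexicon: Dict[Tuple[str, ...], List[str]]) -> Rule:
    def rule(tokens: Sequence[str]) -> List[List[Match]]:
        matches: List[List[Match]] = [[] for _ in tokens]
        for mwe, tags in lexicon.items():
            for start in range(len(tokens) - len(mwe) + 1):
                if tuple(tokens[start:start + len(mwe)]) == mwe:
                    match = Match(start, start + len(mwe), tuple(tags))
                    # Every token inside the MWE receives the same match.
                    for i in range(start, start + len(mwe)):
                        matches[i].append(match)
        return matches
    return rule


def longest_match_ranker(matches_per_token: List[List[Match]]) -> List[Optional[Match]]:
    """Greedy ranker: longer spans first, then earlier spans; spans never overlap."""
    candidates = {m for token_matches in matches_per_token for m in token_matches}
    ranked = sorted(candidates, key=lambda m: (m.start - m.end, m.start, m.semantic_tags))
    best: List[Optional[Match]] = [None] * len(matches_per_token)
    for match in ranked:
        span = range(match.start, match.end)
        if all(best[i] is None for i in span):
            for i in span:
                best[i] = match
    return best


def tag(tokens: Sequence[str], rules: List[Rule],
        ranker: Callable[[List[List[Match]]], List[Optional[Match]]]) -> List[List[str]]:
    # Collect every rule's matches per token, then let the ranker choose.
    combined: List[List[Match]] = [[] for _ in tokens]
    for rule in rules:
        for i, token_matches in enumerate(rule(tokens)):
            combined[i].extend(token_matches)
    # Z99 is the USAS tag for unmatched tokens.
    return [list(m.semantic_tags) if m is not None else ["Z99"] for m in ranker(combined)]


rules = [
    single_word_rule({"love": ["E2+"], "New": ["T3-"], "York": ["Z2"]}),
    mwe_rule({("New", "York"): ["Z2"]}),
]
print(tag(["I", "love", "New", "York"], rules, longest_match_ranker))
# [['Z99'], ['E2+'], ['Z2'], ['Z2']]
```

Keeping rules and ranking separate means a new kind of match, or a different ranking strategy, can be swapped in without touching the other half.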
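The two `pymusas.utils` entries describe splitting a lexicon entry into (word, POS) pairs and collecting its unique POS tags. Below is a minimal re-implementation sketch. It assumes entries are whitespace-separated `word_POS` tokens with the POS tag after the last underscore. `token_pos_tags` and `unique_pos_tags` are hypothetical names, not the library functions.

```python
from typing import Iterator, Set, Tuple


def token_pos_tags(lexicon_entry: str) -> Iterator[Tuple[str, str]]:
    """Yield (word, POS) for each whitespace-separated 'word_POS' token (hypothetical helper)."""
    for token_pos in lexicon_entry.split():
        # Split on the last underscore, so a word containing '_' keeps it.
        word, separator, pos = token_pos.rpartition("_")
        if not separator:
            raise ValueError(f"Expected 'word_POS', got {token_pos!r}")
        yield word, pos


def unique_pos_tags(lexicon_entry: str) -> Set[str]:
    """Return the set of distinct POS tags in a lexicon entry (hypothetical helper)."""
    return {pos for _, pos in token_pos_tags(lexicon_entry)}


print(list(token_pos_tags("Bank_noun holiday_noun")))  # [('Bank', 'noun'), ('holiday', 'noun')]
print(unique_pos_tags("Bank_noun holiday_noun"))       # {'noun'}
```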
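The change to open lexicon TSV files with `utf-8` by default matters because `open()` without an `encoding` argument uses the platform's locale encoding, which is not UTF-8 on every system. Lexicons contain non-ASCII words, so that default can break loading. The sketch below shows the idea with the standard `csv` module. `read_lexicon_tsv` is hypothetical, and the `lemma`, `pos`, and `semantic_tags` column names (with space-separated tags) are assumptions about the file layout.

```python
import csv
import os
import tempfile
from typing import Dict, List


def read_lexicon_tsv(path: str) -> Dict[str, List[str]]:
    """Hypothetical reader: map 'lemma|pos' to a list of semantic tags."""
    lexicon: Dict[str, List[str]] = {}
    # An explicit encoding avoids depending on the platform's default
    # (locale) encoding, which is not UTF-8 on every system.
    with open(path, "r", encoding="utf-8", newline="") as tsv_file:
        reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            lexicon[f"{row['lemma']}|{row['pos']}"] = row["semantic_tags"].split()
    return lexicon


with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = os.path.join(tmp_dir, "lexicon.tsv")
    with open(tsv_path, "w", encoding="utf-8", newline="") as tsv_out:
        tsv_out.write("lemma\tpos\tsemantic_tags\ncafé\tnoun\tF1 H1\n")
    result = read_lexicon_tsv(tsv_path)

print(result)  # {'café|noun': ['F1', 'H1']}
```

`csv.QUOTE_NONE` is used so that a stray `"` inside a lexicon entry is read literally instead of being treated as a quoted field.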