Describe the bug
When using Lima with the deeplima backend, the PoS tagging is correct when a single file is analyzed. When several files are analyzed in the same run, however, the tags computed for the first file are reused for the files that follow.
To Reproduce
Steps to reproduce the behavior:
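1. Analyze a text file with the deeplima pipeline (-p deeplima).
2. Check that the PoS tags are correct overall.
3. Run the analysis again, this time with two files, giving the file from step 1 as the second one.
4. See that the tags of the second file are wrong: they are the tags of the first file's tokens, assigned in order to the second file's tokens.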
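Step 4 can be checked mechanically by comparing the UPOS column (the fourth CoNLL-U field) of the same file between the single-file run and the two-file run. The following is a minimal Python sketch. It assumes the output of each run has been saved as text, and the function names are hypothetical. Only lines whose first field is an integer token ID are kept, so comments, progress messages and the ": LP::Dumper : ... ERROR" log lines are skipped. Fields are split on any whitespace because the pasted output uses spaces rather than tabs.

```python
def upos_tags(conllu: str) -> list[tuple[str, str]]:
    """Return (FORM, UPOS) pairs for every basic token line of a CoNLL-U dump."""
    pairs = []
    for line in conllu.splitlines():
        fields = line.split()
        # Token lines start with an integer ID; comments ("# ..."), log lines
        # (": LP::Dumper : ...") and progress messages ("Analyzing ...") are skipped.
        if len(fields) >= 4 and fields[0].isdigit():
            pairs.append((fields[1], fields[3]))
    return pairs


def tag_differences(reference: str, candidate: str) -> list[tuple[str, str, str]]:
    """List (FORM, reference UPOS, candidate UPOS) for every token whose tag changed."""
    ref, cand = upos_tags(reference), upos_tags(candidate)
    if [form for form, _ in ref] != [form for form, _ in cand]:
        raise ValueError("the two runs did not produce the same tokens")
    return [(form, r, c) for (form, r), (_, c) in zip(ref, cand) if r != c]


# Excerpts of test-eng12.txt from the two runs shown in this report.
single_run = """\
1 The _ DET _ Definite=Def|PronType=Art 4 det _ Pos=1|Len=3|SpaceAfter=No
2 Airbus _ PROPN _ Number=Sing 4 compound _ Pos=4|Len=6
"""
batch_run = """\
Analyzing 2/2 (100.00%) 'test-eng12.txt'# sent_id = 1
: LP::Dumper : 2024-05-15T16:10:41.509 ERROR 0x60e595ca8080 ConllDumper::process target 15 not found in segmentation mapping
1 The _ PUNCT _ _ 0 punct _ Pos=1|Len=3|SpaceAfter=No
2 Airbus _ PROPN _ Number=Sing 0 nsubj _ Pos=4|Len=6
"""
print(tag_differences(single_run, batch_run))  # [('The', 'DET', 'PUNCT')]
```

Applied to the full outputs of both runs, any non-empty result indicates that the tags depend on which files were analyzed before.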
Expected behavior
The PoS tags of every file should be correct, regardless of how many files are given to analyzeText in the same run.
Screenshots
❯ analyzeText -l ud --meta udlang:eng-UD_English-EWT -p deeplima test-eng12.txt
2024-05-15 16:05:24.148156: I /build/tensorflow-for-lima-JkNXYb/tensorflow-for-lima-1.9/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
Analyzing 1/1 (100.00%) 'test-eng12.txt'# sent_id = 1
# text = The Airbus A380 is the largest airplane in the world
1 The _ DET _ Definite=Def|PronType=Art 4 det _ Pos=1|Len=3|SpaceAfter=No
2 Airbus _ PROPN _ Number=Sing 4 compound _ Pos=4|Len=6
3 A _ NOUN _ Number=Sing 4 compound _ Pos=11|Len=1
4 380 _ PROPN _ Number=Sing 8 nsubj _ Pos=13|Len=3|SpaceAfter=No
5 is _ AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 8 cop _ Pos=16|Len=2
6 the _ DET _ Definite=Def|PronType=Art 8 det _ Pos=19|Len=3
7 largest _ ADJ _ Degree=Sup 8 amod _ Pos=23|Len=7
8 airplane _ NOUN _ Number=Sing 0 root _ Pos=31|Len=8
9 in _ ADP _ _ 11 case _ Pos=40|Len=2
10 the _ DET _ Definite=Def|PronType=Art 11 det _ Pos=43|Len=3
11 world _ NOUN _ Number=Sing 8 nmod _ Pos=47|Len=5
12 . _ PUNCT _ _ 8 punct _ Pos=53|Len=1|SpaceAfter=No
# sent_id = 2
# text = It is used by Air France and Japan Airlines
1 It _ PRON _ Case=Nom|Gender=Neut|Number=Sing|Person=3|PronType=Prs 3 nsubj _ Pos=54|Len=2
2 is _ AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 3 aux _ Pos=57|Len=2
3 used _ VERB _ Tense=Past|VerbForm=Part|Voice=Pass 0 root _ Pos=60|Len=4
4 by _ ADP _ _ 6 case _ Pos=65|Len=2
5 Air _ PROPN _ Number=Sing 6 compound _ Pos=68|Len=3
6 France _ PROPN _ Number=Sing 3 obl _ Pos=72|Len=6
7 and _ CCONJ _ _ 9 cc _ Pos=79|Len=3
8 Japan _ PROPN _ Number=Sing 9 compound _ Pos=83|Len=5
9 Airlines _ PROPN _ Number=Plur 6 conj _ Pos=89|Len=8
10 . _ PUNCT _ _ 3 punct _ Pos=98|Len=1
❯ analyzeText -l ud --meta udlang:eng-UD_English-EWT -p deeplima test-eng11.txt test-eng12.txt
2024-05-15 16:09:00.904687: I /build/tensorflow-for-lima-JkNXYb/tensorflow-for-lima-1.9/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
Analyzing 1/2 (50.00%) 'test-eng11.txt'# sent_id = 1
# text = * Sylva and Shining Cliff Woods are at Inverleith House, Edinburgh, open from tomorrow daily 11am-3.30pm until January 29
1 * _ PUNCT _ _ 14 punct _ Pos=1|Len=1|SpaceAfter=No
2 Sylva _ PROPN _ Number=Sing 14 nsubj _ Pos=2|Len=5
3 and _ CCONJ _ _ 6 cc _ Pos=8|Len=3
4 Shining _ PROPN _ VerbForm=Ger 5 amod _ Pos=12|Len=7
5 Cliff _ PROPN _ Number=Sing 6 compound _ Pos=20|Len=5
6 Woods _ PROPN _ Number=Plur 10 nsubj _ Pos=26|Len=5
7 are _ AUX _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 10 cop _ Pos=32|Len=3
8 at _ ADP _ _ 10 case _ Pos=36|Len=2
9 Inverleith _ PROPN _ Number=Sing 10 compound _ Pos=39|Len=10
10 House _ PROPN _ Number=Sing 0 root _ Pos=50|Len=5
11 , _ PUNCT _ _ 10 punct _ Pos=56|Len=1|SpaceAfter=No
12 Edinburgh _ PROPN _ Number=Sing 10 appos _ Pos=57|Len=9
13 , _ PUNCT _ _ 10 punct _ Pos=67|Len=1|SpaceAfter=No
14 open _ ADJ _ Degree=Pos 10 amod _ Pos=68|Len=4
15 from _ ADP _ _ 16 case _ Pos=73|Len=4
16 tomorrow _ NOUN _ Number=Sing 14 obl _ Pos=78|Len=8
17 daily _ ADJ _ Degree=Pos 16 amod _ Pos=87|Len=5
18 11 _ NUM _ NumType=Card 19 nummod _ Pos=93|Len=2
19 am _ NOUN _ Number=Sing 14 obl _ Pos=96|Len=2|SpaceAfter=No
20 - _ SYM _ _ 22 case _ Pos=98|Len=1|SpaceAfter=No
21 3.30 _ NUM _ NumType=Card 22 nummod _ Pos=99|Len=4|SpaceAfter=No
22 pm _ NOUN _ Number=Sing 17 nmod _ Pos=103|Len=2|SpaceAfter=No
23 until _ ADP _ _ 24 case _ Pos=105|Len=5
24 January _ PROPN _ Number=Sing 14 obl _ Pos=111|Len=7
25 29 _ NUM _ NumType=Card 24 nummod _ Pos=119|Len=2
26 . _ PUNCT _ _ 10 punct _ Pos=122|Len=1
Analyzing 2/2 (100.00%) 'test-eng12.txt'# sent_id = 1
# text = The Airbus A380 is the largest airplane in the world
: LP::Dumper : 2024-05-15T16:10:41.509 ERROR 0x60e595ca8080 ConllDumper::process target 15 not found in segmentation mapping
1 The _ PUNCT _ _ 0 punct _ Pos=1|Len=3|SpaceAfter=No
: LP::Dumper : 2024-05-15T16:10:41.509 ERROR 0x60e595ca8080 ConllDumper::process target 15 not found in segmentation mapping
2 Airbus _ PROPN _ Number=Sing 0 nsubj _ Pos=4|Len=6
3 A _ CCONJ _ _ 6 cc _ Pos=11|Len=1
4 380 _ PROPN _ VerbForm=Ger 5 amod _ Pos=13|Len=3|SpaceAfter=No
5 is _ PROPN _ Number=Sing 6 compound _ Pos=16|Len=2
6 the _ PROPN _ Number=Plur 10 nsubj _ Pos=19|Len=3
7 largest _ AUX _ Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin 10 cop _ Pos=23|Len=7
8 airplane _ ADP _ _ 10 case _ Pos=31|Len=8
9 in _ PROPN _ Number=Sing 10 compound _ Pos=40|Len=2
10 the _ PROPN _ Number=Sing 0 root _ Pos=43|Len=3
11 world _ PUNCT _ _ 10 punct _ Pos=47|Len=5
12 . _ PROPN _ Number=Sing 10 appos _ Pos=53|Len=1|SpaceAfter=No
# sent_id = 2
# text = It is used by Air France and Japan Airlines
: LP::Dumper : 2024-05-15T16:10:41.511 ERROR 0x60e595ca8080 ConllDumper::process target 11 not found in segmentation mapping
1 It _ PUNCT _ _ 0 punct _ Pos=54|Len=2
: LP::Dumper : 2024-05-15T16:10:41.511 ERROR 0x60e595ca8080 ConllDumper::process target 11 not found in segmentation mapping
2 is _ ADJ _ Degree=Pos 0 amod _ Pos=57|Len=2
3 used _ ADP _ _ 4 case _ Pos=60|Len=4
4 by _ NOUN _ Number=Sing 2 obl _ Pos=65|Len=2
5 Air _ ADJ _ Degree=Pos 4 amod _ Pos=68|Len=3
6 France _ NUM _ NumType=Card 7 nummod _ Pos=72|Len=6
7 and _ NOUN _ Number=Sing 2 obl _ Pos=79|Len=3
8 Japan _ SYM _ _ 10 case _ Pos=83|Len=5
9 Airlines _ NUM _ NumType=Card 10 nummod _ Pos=89|Len=8
10 . _ NOUN _ Number=Sing 5 nmod _ Pos=98|Len=1
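The two-file output supports the claim that tags are reused. The 22 UPOS tags printed for test-eng12.txt are exactly the first 22 UPOS tags of test-eng11.txt, in order. The sketch below encodes that check with tags copied from the output above. The variable and function names are hypothetical, and the check only confirms the observed symptom. It says nothing about where deeplima keeps the stale tags.

```python
# UPOS tags copied from the output above (hypothetical variable names).
# test-eng11.txt, analyzed first in the two-file run: 26 tokens.
FIRST_FILE_UPOS = [
    "PUNCT", "PROPN", "CCONJ", "PROPN", "PROPN", "PROPN", "AUX", "ADP",
    "PROPN", "PROPN", "PUNCT", "PROPN", "PUNCT", "ADJ", "ADP", "NOUN",
    "ADJ", "NUM", "NOUN", "SYM", "NUM", "NOUN", "ADP", "PROPN", "NUM", "PUNCT",
]
# test-eng12.txt, analyzed second in the two-file run: 22 tokens.
SECOND_FILE_UPOS_BATCH = [
    "PUNCT", "PROPN", "CCONJ", "PROPN", "PROPN", "PROPN", "AUX", "ADP",
    "PROPN", "PROPN", "PUNCT", "PROPN",                      # sent_id = 1
    "PUNCT", "ADJ", "ADP", "NOUN", "ADJ", "NUM", "NOUN", "SYM",
    "NUM", "NOUN",                                            # sent_id = 2
]
# test-eng12.txt analyzed alone (the correct tags).
SECOND_FILE_UPOS_SINGLE = [
    "DET", "PROPN", "NOUN", "PROPN", "AUX", "DET", "ADJ", "NOUN",
    "ADP", "DET", "NOUN", "PUNCT",                            # sent_id = 1
    "PRON", "AUX", "VERB", "ADP", "PROPN", "PROPN", "CCONJ", "PROPN",
    "PROPN", "PUNCT",                                         # sent_id = 2
]


def looks_like_reused_tags(previous: list[str], current: list[str]) -> bool:
    """True if `current` is exactly the first len(current) tags of `previous`."""
    return len(current) <= len(previous) and current == previous[: len(current)]


print(looks_like_reused_tags(FIRST_FILE_UPOS, SECOND_FILE_UPOS_BATCH))   # True
print(looks_like_reused_tags(FIRST_FILE_UPOS, SECOND_FILE_UPOS_SINGLE))  # False
```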