dhowe / ramble

Ramble v2.0

Incorrect parts-of-speech for words in text #80

Closed: dhowe closed this issue 2 years ago

dhowe commented 2 years ago

@shadoof I've noticed some rather strange similars in the cache and, on inspection, found that the POS for the original word was wrong (which wreaks much havoc). Can you recheck the parts-of-speech for the replaceable words? I've created a table in the 2nd comment below. You can see the edits I've made in the code here.

An example:

| Index | Rural | Urban | Current POS | Correct POS |
|---|---|---|---|---|
| 22 | spreads | spreads | nns | VBZ |
| 66 | circadian | violent | nn | JJ |
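Here is why a wrong tag "wreaks much havoc". If similars are looked up by part of speech, the tag decides which pool of candidates a word is swapped with. The sketch below assumes exactly that: a word's similars are other words that can carry its tag. The lexicon and `similarsByPos` are hypothetical and are not Ramble's or RiTa's code.

```javascript
// Hypothetical mini-lexicon: word -> possible tags, most common first.
// Illustrative only; this is not Ramble's cache format or RiTa's dictionary.
const lexicon = {
  circadian: ['jj'],
  nocturnal: ['jj'],
  violent: ['jj'],
  rhythm: ['nn'],
  cycle: ['nn', 'vb'],
};

// Assumed lookup: similars are the other words that can take the requested tag.
function similarsByPos(word, pos) {
  return Object.keys(lexicon).filter(
    (w) => w !== word && lexicon[w].includes(pos)
  );
}

console.log(similarsByPos('circadian', 'nn')); // → ['rhythm', 'cycle'] (wrong tag: nouns)
console.log(similarsByPos('circadian', 'jj')); // → ['nocturnal', 'violent'] (corrected tag)
```

With the current `nn` tag, the adjective "circadian" can only be swapped for nouns, which is the kind of strange similar described above.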

rural

by the time the light has faded, as the last of the reddish gold illumination comes to rest, then imperceptibly spreads out over the moss and floor of the woods on the westerly facing lakeside slopes, you or I will have set out on several of yet more circuits at every time and in all directions, before or after this or that circadian, usually diurnal, event on mildly rambling familiar walks, as if these exertions might be journeys of adventure whereas always our gestures, guided by paths, are also more like traces of universal daily ritual: just before or with the dawn, after a morning dip, in anticipation of breakfast, whenever the fish are still biting, as and when the industrious creatures are building their nests and shelters, after our own trials of work, while the birds still sing, in quiet moments after lunch, most particularly after dinner, at sunset, to escape, to avoid being found, to seem to be lost right here in this place where you or I have always wanted to be and where we might sometimes now or then have discovered some singular hidden beauty, or one another, or stumbled and injured ourselves beyond the hearing and call of other voices, or met with other danger, animal or inhuman, the one tearing and rending and opening up the darkness within us to bleed, yet we suppress any sound that might have expressed the terror and passion and horror and pain so that I or you may continue on this ramble, this before or after walk, and still return; or the other, the quiet evacuation of the light, the way, as we have kept on walking, it falls on us and removes us from existence since in any case we are all but never there, always merely passing through and by and over the moss, under the limbs of the evergreens, beside the lake, within the sound of its lapping waves, annihilated, gone, quite gone, now simply gone and, in being or walking in these ways, giving up all living light for settled, hearth held fire in its place, returned

urban

by the time the light has faded, as the last of the reddish gold illumination comes to rest, then imperceptibly spreads out over the dust and rubble of the craters on the easterly facing bankside heights, you or I will have rushed out on several of yet more circuits at every time and in all directions, before or after this or that violent, usually nocturnal, event on desperately hurried unfamiliar flights, as if these panics might be movements of desire whereas always our gestures, constrained by obstacles, are also more like scars of universal daily terror: just before or with the dawn, after a morning prayer, in anticipation of hunger, while the neighbors are still breathing, as and when the diligent authorities are marshaling their cronies and thugs, after our own trials of loss, while the mortars still fall, in quiet moments after shock, most particularly after curfew, at sunset, to escape, to avoid being found, to seem to be lost right here in this place where you or I have always wanted to be and where we might sometimes now or then have discovered some singular hidden beauty, or one another, or stumbled and injured ourselves beyond the hearing and call of other voices, or met with other danger, venal or military, the one tearing and rending and opening up the darkness within us to bleed, yet we suppress any sound that might have expressed the terror and longing and horror and pain so that I or you may continue on this expedition, this before or after assault, and still return; or the other, the quiet evacuation of the light, the way, as we have kept on struggling, it falls on us and removes us from existence since in any case we are all but never there, always merely passing through and by and over the dust, within the shadows of our ruins, beneath the wall, within the razor of its coiled wire, annihilated, gone, quite gone, now simply gone and, in being or advancing in these ways, giving up all living light for unsettled, heart felt fire in our veins, exiled

dhowe commented 2 years ago
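| Index | Rural | Urban | Current POS | Correct POS |
|---|---|---|---|---|
| 2 | time | time | nn |  |
| 4 | light | light | jj | NN |
| 6 | faded | faded | vbn |  |
| 10 | last | last | jj | NN |
| 13 | reddish | reddish | jj |  |
| 14 | gold | gold | jj |  |
| 15 | illumination | illumination | nn |  |
| 16 | comes | comes | vbz |  |
| 18 | rest | rest | nn |  |
| 21 | imperceptibly | imperceptibly | rb |  |
| 22 | spreads | spreads | nns | VBZ |
| 26 | moss | dust | nn |  |
| 28 | floor | rubble | nn |  |
| 31 | woods | craters | nns |  |
| 34 | westerly | easterly | rb |  |
| 35 | facing | facing | vbg |  |
| 36 | lakeside | bankside | nn | JJ |
| 37 | slopes | heights | vbz | NNS |
| 42 | will | will | md |  |
| 44 | set | rushed | vbn |  |
| 47 | several | several | jj |  |
| 50 | more | more | jjr |  |
| 51 | circuits | circuits | nns |  |
| 54 | time | time | nn |  |
| 58 | directions | directions | nns |  |
| 60 | before | before | in |  |
| 66 | circadian | violent | nn | JJ |
| 69 | diurnal | nocturnal | jj |  |
| 71 | event | event | nn |  |
| 73 | mildly | desperately | rb |  |
| 74 | rambling | hurried | jj |  |
| 75 | familiar | unfamiliar | jj |  |
| 76 | walks | flights | nns |  |
| 81 | exertions | panics | nns |  |
| 82 | might | might | md |  |
| 84 | journeys | movements | nns |  |
| 86 | adventure | desire | nn |  |
| 88 | always | always | rb |  |
| 90 | gestures | gestures | nns |  |
| 92 | guided | constrained | vbn |  |
| 94 | paths | obstacles | nns |  |
| 98 | more | more | jjr |  |
| 99 | like | like | vb |  |
| 100 | traces | scars | nns |  |
| 102 | universal | universal | jj |  |
| 103 | daily | daily | rb | JJ |
| 104 | ritual | terror | jj | NN |
| 107 | before | before | in |  |
| 111 | dawn | dawn | nn |  |
| 115 | morning | morning | nn | JJ |
| 116 | dip | prayer | nn |  |
| 119 | anticipation | anticipation | nn |  |
| 121 | breakfast | hunger | nn |  |
| 125 | fish | neighbors | nns |  |
| 127 | still | still | rb |  |
| 128 | biting | breathing | vbg |  |
| 134 | industrious | diligent | jj |  |
| 135 | creatures | authorities | nns |  |
| 137 | building | marshaling | vbg |  |
| 139 | nests | cronies | nns |  |
| 141 | shelters | thugs | vbz | NNS |
| 146 | trials | trials | nns |  |
| 148 | work | loss | nn |  |
| 152 | birds | mortars | nns |  |
| 153 | still | still | rb |  |
| 154 | sing | fall | vb |  |
| 157 | quiet | quiet | jj |  |
| 158 | moments | moments | nns |  |
| 160 | lunch | shock | nn |  |
| 162 | most | most | rbs |  |
| 163 | particularly | particularly | rb |  |
| 165 | dinner | curfew | nn |  |
| 168 | sunset | sunset | nn |  |
| 171 | escape | escape | vb |  |
| 174 | avoid | avoid | vb |  |
| 175 | being | being | vbg |  |
| 176 | found | found | vbd | VBN |
| 179 | seem | seem | vb |  |
| 182 | lost | lost | vbd | JJ |
| 183 | right | right | jj | RBR |
| 184 | here | here | rb |  |
| 187 | place | place | nn |  |
| 193 | always | always | rb |  |
| 194 | wanted | wanted | vbd |  |
| 200 | might | might | md |  |
| 201 | sometimes | sometimes | rb |  |
| 206 | discovered | discovered | vbn |  |
| 208 | singular | singular | jj |  |
| 209 | hidden | hidden | vbn | JJ |
| 210 | beauty | beauty | nn |  |
| 214 | another | another | dt |  |
| 217 | stumbled | stumbled | vbd | VBN |
| 219 | injured | injured | vbn | VBN (check - I think was already vbn) |
| 220 | ourselves | ourselves | prp |  |
| 221 | beyond | beyond | in |  |
| 223 | hearing | hearing | vbg | (used as noun; but vbg should cover this here?) |
| 225 | call | call | vb | NN |
| 228 | voices | voices | nns |  |
| 234 | danger | danger | nn |  |
| 236 | animal | venal | jj |  |
| 238 | inhuman | military | jj |  |
| 242 | tearing | tearing | vbg |  |
| 244 | rending | rending | nn | VBG |
| 246 | opening | opening | vbg |  |
| 249 | darkness | darkness | nn |  |
| 253 | bleed | bleed | vb |  |
| 257 | suppress | suppress | vbp | VB (not singular, so use VB?) |
| 259 | sound | sound | jj | NN |
| 261 | might | might | md |  |
| 263 | expressed | expressed | vbn |  |
| 265 | terror | terror | nn |  |
| 267 | passion | longing | nn |  |
| 269 | horror | horror | nn |  |
| 271 | pain | pain | nn |  |
| 278 | continue | continue | vb |  |
| 281 | ramble | expedition | nn |  |
| 284 | before | before | in |  |
| 287 | walk | assault | vb | NN |
| 290 | still | still | rb |  |
| 291 | return | return | jj | VB (not singular, so use VB?) |
| 298 | quiet | quiet | jj |  |
| 299 | evacuation | evacuation | nn |  |
| 302 | light | light | jj | NN |
| 310 | kept | kept | vbd | VBN |
| 312 | walking | struggling | vbg |  |
| 315 | falls | falls | vbz |  |
| 319 | removes | removes | vbz |  |
| 322 | existence | existence | nn |  |
| 323 | since | since | in |  |
| 326 | case | case | nn |  |
| 331 | never | never | rb |  |
| 334 | always | always | rb |  |
| 335 | merely | merely | rb |  |
| 336 | passing | passing | vbg |  |
| 337 | through | through | in |  |
| 343 | moss | dust | nn |  |
| 347 | limbs | shadows | nns |  |
| 350 | evergreens | ruins | nns |  |
| 352 | beside | beneath | in |  |
| 354 | lake | wall | nn |  |
| 358 | sound | razor | jj | NN |
| 361 | lapping | coiled | nn | JJ |
| 362 | waves | wire | vbz | NNS (sic, let's think of this wire as plural) |
| 364 | annihilated | annihilated | vbd |  |
| 366 | gone | gone | vbn |  |
| 368 | quite | quite | rb |  |
| 369 | gone | gone | vbn |  |
| 372 | simply | simply | rb |  |
| 373 | gone | gone | vbn |  |
| 377 | being | being | vbg |  |
| 379 | walking | advancing | vbg |  |
| 382 | ways | ways | nns |  |
| 384 | giving | giving | vbg |  |
| 387 | living | living | vbg |  |
| 388 | light | light | jj | NN |
| 390 | settled | unsettled | vbd | JJ or VBN (use JJ when it's a vbn used in this way?) |
| 392 | hearth | heart | nn |  |
| 393 | held | felt | vbn |  |
| 394 | fire | fire | nn |  |
| 397 | place | veins | nn |  |
| 399 | returned | exiled | vbd |  |
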
shadoof commented 2 years ago

This was such an important catch! I can't believe we didn't spot this as likely to happen and then also as happening. (When the pos tagging was 'live' I didn't really consider checking it and, of course, it also brings up the question of how 'contextual' the RiTa tagger is.)

I hope editing your table was the right thing to do at this stage.

shadoof commented 2 years ago

And I realize that this may not be something for this project: but does RiTa.pos() or something related allow you to return a list of all possible POS tags for a single word? E.g. "fight", which can mos def be either NN or VB (and just as frequently).

dhowe commented 2 years ago

> I hope editing your table was the right thing to do at this stage.

That's what it was there for...

> And I realize that this may not be something for this project: but does RiTa.pos() or something related allow you to return a list of all possible POS tags for a single word? E.g. "fight", which can mos def be either NN or VB (and just as frequently).

RiTa.tagger.allTags()
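RiTa.tagger.allTags() is the call named here. This thread doesn't show its exact signature or return value, so the sketch below doesn't reproduce it. Instead it illustrates the idea with a hypothetical dictionary that stores every tag a word can take, most common first, using lowercase Penn tags as in the table above. The tag lists and the `allTags`/`firstTag` helpers are made up for illustration.

```javascript
// Hypothetical lexicon: every tag a word can take, most common first.
// The tags here are invented for illustration, not taken from RiTa's data.
const lexicon = {
  fight: ['nn', 'vb', 'vbp'],
  return: ['vb', 'nn', 'vbp'],
};

// All possible tags for a single word ([] if the word is unknown).
function allTags(word) {
  return lexicon[word.toLowerCase()] ?? [];
}

// Roughly what a context-free lookup would pick: the first (most common) tag.
function firstTag(word) {
  return allTags(word)[0];
}

console.log(allTags('fight'));   // → ['nn', 'vb', 'vbp']
console.log(firstTag('Return')); // → 'vb'
console.log(allTags('ramble'));  // → [] (not in this toy lexicon)
```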

shadoof commented 2 years ago

> That's what it was there for...

I assume you have a neat little script to get the corrected tags back into the code? And I'm curious to know what you used to generate the markdown table from the data?
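The thread doesn't show how the corrections got back into the code. Here is one possible approach, sketched under two assumptions: the edited table keeps the five-cell rows produced by the generator snippet in this thread, and the tags live in an index-keyed structure like `sources.pos`. It parses each data row, takes the leading tag from the last cell, and writes it back. `parseCorrections` and `applyCorrections` are hypothetical helpers, not the project's code.

```javascript
// Turn an edited table back into { index: tag } corrections.
// Assumes data rows like "| 22 | spreads | spreads | nns | VBZ |" (five cells).
function parseCorrections(markdown) {
  const corrections = {};
  for (const line of markdown.split('\n')) {
    const cells = line.split('|').slice(1, -1).map((c) => c.trim());
    // Skip the header, the |---| separator, and anything that isn't a data row.
    if (cells.length !== 5 || !/^\d+$/.test(cells[0])) continue;
    // Keep only a leading tag: "JJ or VBN" -> "JJ"; a note-only cell is skipped.
    const match = cells[4].match(/^[A-Za-z]+\$?/);
    if (match) corrections[cells[0]] = match[0].toLowerCase(); // lowercase, like the current tags
  }
  return corrections;
}

// Write corrections into a (hypothetical) index-keyed `pos` array or object.
function applyCorrections(pos, corrections) {
  for (const [index, tag] of Object.entries(corrections)) pos[index] = tag;
  return pos;
}

const table = [
  '| Index | Rural | Urban | Current POS | Correct POS |',
  '|---|---|---|---|---|',
  '| 2 | time | time | nn |  |',
  '| 22 | spreads | spreads | nns | VBZ |',
  '| 390 | settled | unsettled | vbd | JJ or VBN |',
].join('\n');

console.log(parseCorrections(table)); // corrections for rows 22 ('vbz') and 390 ('jj')
```

A cell that holds only a note, such as the one for index 223 (hearing), is skipped. "JJ or VBN" keeps only "JJ". Rows like these still need a human decision.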

shadoof commented 2 years ago

> RiTa.tagger.allTags()

So, when you pass a POS to something like RiTa.rhymes(), does it use allTags of the candidates?

dhowe commented 2 years ago

> And I'm curious to know what you used to generate the markdown table from the data?

I used a funny language called JavaScript and logged it to the browser console:

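let s = ''; // GitHub markdown table rows: index, rural word, urban word, current POS, empty "correct POS" cell
repids.forEach(r => s += `| ${r} | ${sources.rural[r]} | ${sources.urban[r]} | ${sources.pos[r]} |  |\n`);
console.log(s);

Or using reduce():

// '' is reduce()'s initial accumulator, so the first row is formatted like the rest
console.log(repids.reduce((str, r) => str +
  `| ${r} | ${sources.rural[r]} | ${sources.urban[r]} | ${sources.pos[r]} |  |\n`, ''));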
dhowe commented 2 years ago

> So, when you pass a POS to something like RiTa.rhymes(), does it use allTags of the candidates?

Yes, it includes all possible tags for each candidate, though you can add the 'strictPos: true' option to only consider the first (most common) tag.
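This is a minimal sketch of the filtering described above. By default, a candidate passes if the requested tag is any of its possible tags. With a strictPos-style flag, only its first (most common) tag counts. Only the option name `strictPos` comes from this thread. The candidate list and `filterByPos` are hypothetical, and this is not RiTa's implementation.

```javascript
// Hypothetical candidates (e.g. rhymes), each with all tags, most common first.
const candidates = {
  night: ['nn'],
  fight: ['nn', 'vb', 'vbp'],
  bright: ['jj'],
  light: ['nn', 'jj', 'vb'],
};

// Keep candidates whose tags include `pos`; with strictPos, only the first tag counts.
function filterByPos(words, pos, { strictPos = false } = {}) {
  return Object.keys(words).filter((w) =>
    strictPos ? words[w][0] === pos : words[w].includes(pos)
  );
}

console.log(filterByPos(candidates, 'vb'));                      // → ['fight', 'light']
console.log(filterByPos(candidates, 'vb', { strictPos: true })); // → []
console.log(filterByPos(candidates, 'nn', { strictPos: true })); // → ['night', 'fight', 'light']
```

With the loose default, "fight" qualifies as a verb candidate. With strictPos it does not, because its first tag is a noun tag.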