divvun / libdivvun

lib for running gramcheck and other pipelines + cli; modules for CG→spelling, CG→feedback, tagging blanks
https://giellalt.github.io/proof/gramcheck/GrammarCheckerDocumentation.html
GNU General Public License v3.0

recasing suggestions #42

Closed by flammie 3 years ago

flammie commented 3 years ago

cf. http://giellatekno.uit.no/bugzilla/show_bug.cgi?id=2712

unhammer commented 3 years ago

There's an inline function withCasing you can use that does this, which will also respect the <fixedcase> CG tag, e.g. withCasing(reading.fixedcase, inputCasing, formv)

But this change only affects the CG output format. Does the original error also happen with JSON output? If so, the fix needs to happen earlier in the processing.
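To illustrate what a recasing helper of this kind does, here is a minimal, hypothetical C++ sketch. It is not libdivvun's implementation: the `Casing` enum, `getCasing`, and the exact behaviour are assumptions, and only the argument order (fixedcase flag, input casing, suggested form) mirrors the `withCasing(reading.fixedcase, inputCasing, formv)` call above. Case mapping is ASCII-only for brevity; real code would need Unicode-aware casing for words like Álgoálbmot.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical casing categories; not libdivvun's actual type.
enum class Casing { lower, Title, UPPER, mixed };

// Classify the casing of an input word (ASCII letters only, for brevity).
Casing getCasing(const std::string& s) {
    std::size_t upper = 0, letters = 0;
    bool firstUpper = false;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isalpha(c)) continue;
        if (letters == 0) firstUpper = std::isupper(c) != 0;
        ++letters;
        if (std::isupper(c)) ++upper;
    }
    if (upper == 0) return Casing::lower;
    if (upper == letters && letters > 1) return Casing::UPPER;
    if (upper == 1 && firstUpper) return Casing::Title;
    return Casing::mixed;
}

// Hypothetical sketch, not libdivvun's withCasing: copy the input casing
// onto a suggestion unless the reading carries <fixedcase>.
std::string withCasing(bool fixedcase, Casing inputCasing, const std::string& form) {
    if (fixedcase) return form;  // <fixedcase>: keep the suggestion exactly as given
    std::string out = form;
    switch (inputCasing) {
        case Casing::UPPER:
            for (char& ch : out)
                ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
            break;
        case Casing::Title:
            if (!out.empty())
                out[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[0])));
            break;
        case Casing::lower:
        case Casing::mixed:
            break;  // assumption: keep the suggestion's own casing
    }
    return out;
}
```

In this sketch, lowercase and mixed-case input leave the suggestion untouched, on the assumption that the suggestion's own casing (for example a proper noun) is safer than guessing.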

flammie commented 3 years ago

> There's an inline function withCasing you can use that does this, which will also respect the <fixedcase> CG tag, e.g. withCasing(reading.fixedcase, inputCasing, formv)

Ah, OK, that seems like a good idea.

> But this change only affects the CG output format. Does the original error also happen with JSON output? If so, the fix needs to happen earlier in the processing.

You are probably right. I was not able to trace back through the code to find a common spot for this recasing; maybe you have an idea of where it could go?

unhammer commented 3 years ago

For JSON output and library usage, casing happens in mk_errs(sentence), after a whole sentence has been processed (because casing can be changed by relations to other cohorts). So this already worked in JSON mode:

$ echo 'Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.' | bash smegramj.mode | jq .
{
  "errs": [
    [
      "Álgoálbmot nissonat",
      0,
      19,
      "msyn-compound",
      "\"Álgoálbmot nissonat\" orru leamen goallossátni",
      [
        "Álgoálbmotnissonat"
      ],
      "Goallosteapmi"
    ]
  ],
  "text": "Álgoálbmot nissonat vásihit dávjá máŋgga dimenšuvnnat vealaheami, sihke sohkabeali ektui ja čearddalašvuođa ektui.\n"
}
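A minimal sketch of why recasing waits for the whole sentence: an error span such as "Álgoálbmot nissonat" covers more than one cohort, so its casing is only settled once all of them are in. The names `Err`, `recaseLike` and `recaseErrs` are hypothetical and do not describe mk_errs internals; offsets here are byte offsets, and the casing rule is an ASCII-only title/lower distinction for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified error record: the byte range of the flagged
// input span plus the suggestions produced by the error rules.
struct Err {
    std::size_t beg, end;           // byte offsets into the sentence text
    std::vector<std::string> reps;  // suggested replacements
};

// Copy the casing of the span's first character onto the suggestion
// (ASCII-only, title vs. lower only).
std::string recaseLike(const std::string& span, const std::string& rep) {
    std::string out = rep;
    if (!span.empty() && !out.empty() &&
        std::isupper(static_cast<unsigned char>(span[0]))) {
        out[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[0])));
    }
    return out;
}

// Run only after the full sentence is known: a span may cover several
// cohorts, so its casing cannot be decided cohort by cohort.
void recaseErrs(const std::string& text, std::vector<Err>& errs) {
    for (Err& e : errs) {
        const std::string span = text.substr(e.beg, e.end - e.beg);
        for (std::string& r : e.reps) r = recaseLike(span, r);
    }
}
```

For a sentence "Alg nissonat go." with an error over bytes 0 to 12 and the suggestion "algnissonat", `recaseErrs` yields "Algnissonat", analogous to the "Álgoálbmotnissonat" suggestion in the JSON output.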

But CG mode just prints out readings as they are processed, so its output is fairly low-level. It might make sense to also print the mk_errs(sentence) output after each full sentence in CG mode, so people don't have to run two different modes to see the actual suggestions.
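The proposal could look roughly like this: keep streaming the low-level CG lines unchanged, but also buffer each sentence and print a summary right after it. Everything here is hypothetical: `Cohort`, `runCgMode`, the sentence-boundary predicate, and the `summarise` callback (a stand-in for mk_errs(sentence)). How the summary should be marked up inside the CG stream is left open.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical cohort: the wordform plus its already-formatted CG lines.
struct Cohort {
    std::string form;
    std::string cgLines;
};

// Pass every cohort's CG lines through as before, but also buffer them;
// at each sentence boundary, print summarise(sentence) after the sentence.
void runCgMode(const std::vector<Cohort>& cohorts,
               const std::function<bool(const Cohort&)>& isSentenceEnd,
               const std::function<std::string(const std::vector<Cohort>&)>& summarise,
               std::ostream& out) {
    std::vector<Cohort> sentence;
    auto flush = [&]() {
        if (!sentence.empty()) {
            out << summarise(sentence);
            sentence.clear();
        }
    };
    for (const Cohort& c : cohorts) {
        out << c.cgLines;  // unchanged low-level output
        sentence.push_back(c);
        if (isSentenceEnd(c)) flush();
    }
    flush();  // trailing sentence without a final delimiter
}
```

Because the callback is only invoked once a sentence is complete, any recasing it does can see the whole sentence, as in JSON mode.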