aolney / mofacts-automated-authoring

Automated content creation for MoFaCTS
Apache License 2.0

Item Tagging #18

Closed aolney closed 4 years ago

aolney commented 4 years ago

The primary purpose is to subset items for concentrated practice. Applications under discussion:

By section: requires marking sections at text extraction time, generating cloze items, and then merging the cloze items with the section data.

By importance: requires subsetting at cloze generation time.

Because applications under discussion have distinct I/O requirements, the most general way to handle both appears to be:

MoFaCTS will then use tags to mark items by cluster.

aolney commented 4 years ago

@andrewtackett @wscarter Please see the proposed API changes in this issue.

aolney commented 4 years ago

Closed with https://github.com/aolney/mofacts-automated-authoring/commit/364e2b9f0290e5bb2b003075a3bd50d0cdd2c015

Note that the API has been modified: each cloze item now has a new field, tags, which is a list of strings.

andrewtackett commented 4 years ago

@aolney As we discussed on Wednesday, could you update the API to output the following format:

{
    "cloze": "The osmoreceptor - ADH mechanism can reduce a normal urine production of 1,500 __________ per day to about 500 milliliters per day when the body is dehydrated .",
    "itemId": -826828571,
    "clozeId": 1785220680,
    "correctResponse": "milliliters~dep~pobj",
    "tags": {
        "weight": 7,
        "chunk": 3,
        "grouping": "default",
        "syntacticRole": "pobj",
        "rootDistance": 3,
        "startDistance": 13
    }
}

instead of:

{
    "cloze": "The osmoreceptor - ADH mechanism can reduce a normal urine production of 1,500 __________ per day to about 500 milliliters per day when the body is dehydrated .",
    "itemId": -826828571,
    "clozeId": 1785220680,
    "correctResponse": "milliliters~dep~pobj~@syntacticRole:pobj~@rootDistance:3~@startDistance:13",
    "tags": [
        "weight:7",
        "chunk:3",
        "default:default"
    ]
}

Note that tags is an object, not an array; each item is a key/value pair; and integer values are passed as integers, not strings. Also, I'm not 100% sure that "grouping" is the right name for the third tag, but something more descriptive than "default" is necessary.
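
For illustration only, here is a minimal Python sketch of how a consumer might map the second format onto the first. It is not the code used in this repository: the function names `parse_value` and `convert_item` are hypothetical, and renaming `default` to `grouping` is an assumption based on the tentative name suggested above.

```python
import json


def parse_value(raw):
    """Return an int when raw is an integer literal, otherwise the original string."""
    try:
        return int(raw)
    except ValueError:
        return raw


def convert_item(old):
    """Convert one cloze item from the list-of-strings tag layout to the object layout."""
    # Separate the plain response parts from the '@key:value' annotations.
    parts = old["correctResponse"].split("~")
    plain = [p for p in parts if not p.startswith("@")]
    annotations = [p[1:] for p in parts if p.startswith("@")]

    tags = {}
    for entry in old["tags"] + annotations:
        key, _, value = entry.partition(":")
        if key == "default":
            key = "grouping"  # assumed name; the thread has not settled on it
        tags[key] = parse_value(value)

    new = dict(old)  # shallow copy so the input item is left untouched
    new["correctResponse"] = "~".join(plain)
    new["tags"] = tags
    return new


old_item = {
    "itemId": -826828571,
    "clozeId": 1785220680,
    "correctResponse": "milliliters~dep~pobj~@syntacticRole:pobj~@rootDistance:3~@startDistance:13",
    "tags": ["weight:7", "chunk:3", "default:default"],
}
new_item = convert_item(old_item)
print(json.dumps(new_item, indent=4))
```

The cloze text is omitted from `old_item` for brevity; any other fields on the item are copied through unchanged.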

aolney commented 4 years ago

This looks good to me, but I have a follow-up suggestion on default.

Short version: I think we should just remove it.

Long version: when the tags field had the semantics that each element defined a grouping, it made sense to have a grouping of all items as default. That way, you could have one function that derived the possible grouping types from the prefix and the subgroup identities from the suffix. In the case of default:default, there would be a group type that contained all items. I recognized that this was an inefficient use of storage, but it made the semantics of a single group explicit in the data rather than an implicit special case covered by hidden code. However, if we are changing the semantics of tags to include additional, non-grouping information (which is fine with me), then the semantics of default are no longer clear, and I'm not sure there is much of an advantage to having it. If you'd like to keep it, then I would propose a renaming:

and we can make defaultGroup have value 1 for all items since the others are numeric.

And really, we could do this renaming whether you want to keep default or not.
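
To make the original prefix/suffix semantics concrete, here is a rough Python sketch of the kind of single grouping function described in the long version. The name `group_items` and the sample items are hypothetical, and the real implementation may differ.

```python
from collections import defaultdict


def group_items(items):
    """Map each grouping type (tag prefix) to its subgroups (tag suffix) and their cloze ids."""
    groups = defaultdict(lambda: defaultdict(list))
    for item in items:
        for tag in item["tags"]:
            prefix, _, suffix = tag.partition(":")
            groups[prefix][suffix].append(item["clozeId"])
    # Convert to plain dicts so the result is easy to compare and print.
    return {prefix: dict(subgroups) for prefix, subgroups in groups.items()}


# Hypothetical items using the list-of-strings tag format.
sample_items = [
    {"clozeId": 1, "tags": ["weight:7", "chunk:3", "default:default"]},
    {"clozeId": 2, "tags": ["weight:2", "chunk:3", "default:default"]},
    {"clozeId": 3, "tags": ["weight:7", "chunk:4", "default:default"]},
]
groups = group_items(sample_items)
print(groups["default"])
print(groups["chunk"])
```

With every item tagged `default:default`, the `default` grouping type has one subgroup containing all items, so no special case is needed in the code.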

aolney commented 4 years ago

Went ahead with the renaming I proposed and got rid of default.

Closed with https://github.com/aolney/mofacts-automated-authoring/commit/a084c972d1cdef050055b11efba715c148d06513