edgeryders / discourse-annotator

A text annotation and analysis application for Discourse. Made with Annotator.js and Ruby on Rails.
https://edgeryders.eu/t/6811

Allow multiple parents for one child #162

Closed miahass closed 4 years ago

miahass commented 4 years ago

If not possible (I remember Hugi mentioning something a while back about this not being doable in Discourse), just let me know and we will figure out a workaround!

miahass commented 4 years ago

Short-term fix (this only documents the interim process for ethnographers; it is not meant to be implemented formally by anyone working on the development): before each instance of the SSNA is generated, duplicate the code and nest the copy under a different parent. Merge the two after the SSNA is created, then copy again for the next SSNA generation. This is relevant especially if we need to assign all annotations of the child to the parent.

tanius commented 4 years ago

It's technically possible because our data structures are independent of the Discourse core, but: (1) it is a lot of effort and beyond the current budget, because it would mean replacing the current core data structure for codes and because there seems to be no library support in Ruby gems for the new data structure required; (2) it will necessarily make the user interface complex and confusing; (3) it will almost certainly hinder conceptual modelling rather than help it.

About the third point: true hierarchies (= one parent per node) are great because they let the human mind manage complexity by going up one abstraction level. With multiple parents per code, there is no single "up" direction anymore and users lose the spatial sense of "where" a code is, making it harder to navigate the parent-child graph to find something.

What you can do already is to add multiple parents to annotations, by annotating the same text with multiple codes. Parent codes, on the other hand, were only meant to help with concept granularity and code re-use (by re-structuring the code hierarchy later), not with other aspects of conceptual modelling.

To discuss this further: Could you give me an example where multiple parents per code are necessary and the problem cannot be solved by having multiple parents per annotation (that is, multiple codes assigned to the same text)? We can then think of suitable ways to support this use case (e.g. declaring codes synonymous with others while keeping it a hierarchy). The use case should argue in terms of the analysis that has to be possible with the coded content, not in terms of more freedom of conceptual modelling for ethnographers. For the latter, less freedom means more order and forces ethnographers to think concepts through better in order to fit everything into a good hierarchy; that is a good thing, because hierarchies are simpler to use in the user interface than more generic parent-child graphs.

miahass commented 4 years ago

I personally agree with you about the messiness of having multiple parents for one child in terms of organisation. But let me explain what Jan wants to be able to do, and maybe we can figure out a way to help him without doing the multiple parents thing. He wants a code like "the Pope" to both be under the parent "Catholic Church" and under "Influential Figureheads" or something like that -- so that all annotations with "the Pope" assigned would also be auto-assigned to "Catholic Church" and to "Influential Figureheads."

miahass commented 4 years ago

Would it be possible to have a multi-level hierarchy? This is not an answer to Jan's issue in all cases, but might be in some.

tanius commented 4 years ago

Would it be possible to have a multi-level hierarchy?

It's already possible. For example, code "cooperative living" has parent path "alternative approaches → creative living arrangements", which means it is on the third level of the hierarchy.
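As a rough illustration of what "one parent per node" and a "parent path" mean, here is a minimal Python sketch. It is not the application's actual data model: the `PARENT` mapping and the `parent_path` and `level` helpers are hypothetical names invented for this example, and only the three code names come from the example above.

```python
# Hypothetical single-parent hierarchy: each code maps to its one parent,
# or to None if it is a top-level code.
PARENT = {
    "alternative approaches": None,
    "creative living arrangements": "alternative approaches",
    "cooperative living": "creative living arrangements",
}


def parent_path(code, parent=PARENT):
    """Return the ancestors of `code`, top-level ancestor first."""
    path = []
    current = parent[code]
    while current is not None:
        path.append(current)
        current = parent[current]
    return list(reversed(path))


def level(code, parent=PARENT):
    """Return the hierarchy level of `code` (top-level codes are level 1)."""
    return len(parent_path(code, parent)) + 1


print(" → ".join(parent_path("cooperative living")))
# alternative approaches → creative living arrangements
print(level("cooperative living"))
# 3
```

Because every code has at most one parent, walking "up" always yields exactly one path, which is the property the single-parent design relies on.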

He wants a code like "the Pope" to both be under the parent "Catholic Church" and under "Influential Figureheads" or something like that

That would make "the Pope" effectively a compound code, resulting in a synthetic co-occurrence of "Catholic Church" and "Influential Figureheads". That is discouraged in the Ethnographic Coding Wiki in section "Avoid compound codes" because it prevents co-occurrences from being an emergent property detected during the analysis.

So within the current framework, the ethnographer should code the same text mentioning "the Pope" with both "Catholic Church" and "Influential Figureheads", without creating a tag "the Pope". It seems that the desire to automate this away arises because assigning multiple codes manually is quite a lot of work right now. That, however, will be solved in #60 (assigning multiple codes at once).

Does that remove the need for multi-parent concept hierarchies then?

If there is still a need to synthetically create co-occurrences of codes based on existing annotations, then I'd propose the following workflow using the existing functionality (told with the example from above):

  1. After coding is finished, copy code "Pope" to "Pope (COPY)".

  2. Merge code "Pope (COPY)" into "Influential Figureheads". After this, everything originally coded "Pope" will now also be coded "Influential Figureheads".
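The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Annotation` class and the `copy_code` and `merge_code` functions are hypothetical and do not mirror the application's real copy and merge features, the quotes are made up, and dropping exact duplicates during a merge is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation: a quoted text plus the code assigned to it."""
    quote: str
    code: str


def copy_code(annotations, source, copy_name):
    """Step 1: duplicate every annotation of `source` under the code `copy_name`."""
    copies = [Annotation(a.quote, copy_name) for a in annotations if a.code == source]
    return annotations + copies


def merge_code(annotations, source, target):
    """Step 2: re-assign annotations of `source` to `target`.

    Exact duplicates (same quote, same code) are dropped; this is an
    assumption of the sketch, not documented behaviour.
    """
    merged = []
    for a in annotations:
        moved = Annotation(a.quote, target) if a.code == source else a
        if moved not in merged:
            merged.append(moved)
    return merged


annotations = [
    Annotation("The Pope visited our town.", "Pope"),
    Annotation("Our mayor sets the tone here.", "Influential Figureheads"),
]
step1 = copy_code(annotations, "Pope", "Pope (COPY)")
step2 = merge_code(step1, "Pope (COPY)", "Influential Figureheads")

for a in step2:
    print(f"{a.code}: {a.quote}")
```

After both steps the quote about the Pope carries the original "Pope" code and additionally "Influential Figureheads", while the temporary "Pope (COPY)" code no longer exists.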

We could introduce a similar mechanism that works on annotations rather than codes, in order to quickly select some existing annotations from a filtered list and create new ones with the same quote text and a new code. Compared to the process above, that would allow more fine-grained control over what to code with additional codes. For example, not every single experience with "the pope" is necessarily an experience with an influential figurehead.
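A minimal Python sketch of that annotation-level idea, purely for discussion: the dictionary layout, the `add_code_where` function and the example filter are all hypothetical, and the quotes are invented. Nothing here describes an existing feature.

```python
def add_code_where(annotations, predicate, new_code):
    """Create new annotations with the same quote and `new_code`
    for every existing annotation selected by `predicate`."""
    new = [
        {"quote": a["quote"], "code": new_code}
        for a in annotations
        if predicate(a)
    ]
    return annotations + new


annotations = [
    {"quote": "Everyone here follows what the Pope says.", "code": "the Pope"},
    {"quote": "I saw the Pope on a postcard once.", "code": "the Pope"},
]

# Hypothetical filter: in practice the ethnographer would pick annotations
# by hand from a filtered list; here we stand in for that with a keyword test.
selected = lambda a: a["code"] == "the Pope" and "follows" in a["quote"]

result = add_code_where(annotations, selected, "Influential Figureheads")
for a in result:
    print(f'{a["code"]}: {a["quote"]}')
```

Only the first quote receives the additional code, which is the fine-grained control that the whole-code copy and merge workflow cannot give.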

albertocottica commented 4 years ago

If I may, @miahass, I suggest hierarchies are used sparingly. Reason: we want to make annotations and codes re-usable, and hierarchies tend to reflect the mental models of the individual researcher or group thereof. The exceptions are the near-unassailable hierarchies, such as "Spain" being a child of "Europe", "cat" of "mammal", "TCP/IP" of "Internet protocols". Multiple tagging is far better.

In Jan's example, you would probably start from coding something "The Pope", something else with "The President" and something else again with "The Clergy". If "The Pope" co-occurs a lot with "The President", then you could start to make inference about the role of authority figures without having a specialized code: it just emerges from the linking. Likewise, if "The Pope" co-occurs a lot with "the clergy", you know you are looking at something related to the church.
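To make the idea of co-occurrence "emerging from the linking" concrete, here is a small Python sketch. It assumes that two codes co-occur when they are applied to the same post; the `cooccurrences` function, the post ids and the data are hypothetical, and the actual SSNA computation may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrences(annotations):
    """Count how often each pair of codes appears on the same post.

    `annotations` is a list of (post_id, code) tuples.
    """
    codes_by_post = defaultdict(set)
    for post_id, code in annotations:
        codes_by_post[post_id].add(code)

    counts = Counter()
    for codes in codes_by_post.values():
        # Sort so each unordered pair is always counted under the same key.
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return counts


annotations = [
    (1, "The Pope"), (1, "The President"),
    (2, "The Pope"), (2, "The President"),
    (3, "The Pope"), (3, "The Clergy"),
]
counts = cooccurrences(annotations)
for pair, n in counts.most_common():
    print(pair, n)
```

In this made-up data, "The Pope" co-occurs with "The President" twice and with "The Clergy" once, without any specialised parent code having been created.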

But of course I see what you mean. Maybe we can invoke some math to do this aggregation via topology rather than semantics... this is a great theme for Masters of Networks, by the way.

miahass commented 4 years ago

Agree. I’ve been working with them on the concept of hierarchies in general and have been trying to stress the difference between hierarchies (as the strict entity that @tanius describes well above) and categories (which are looser organising principles that an ethnographer can use freely to think through code relationships on their own). Upon reflection, I’m inclined to agree with both of you and take this as a further example of the need to distinguish between the two, especially for the purposes of more rigorous SSNA thinking. Let’s stick with one parent per child and yes, Matt, being able to assign multiple codes in one go will make your suggested solution much, much easier and remove a lot of the desire to have this work done via the hierarchy. Because, in fact, the pope really should only be co-coded with influential figurehead if, in that particular contribution, he is being referred to as one. The association should not be automatic or hierarchical.

albertocottica commented 4 years ago

Let’s stick with one parent per child

You mean: one parent, multiple children.

miahass commented 4 years ago

Yes. What I meant was: each child can only have one parent. A parent can of course have many children.

miahass commented 4 years ago

We could introduce a similar mechanism that works on annotations rather than codes, in order to quickly select some existing annotations from a filtered list and create new ones with the same quote text and a new code. Compared to the process above, that would allow more fine-grained control over what to code with additional codes. For example, not every single experience with "the pope" is necessarily an experience with an influential figurehead.

Yes, this would be incredibly useful.

tanius commented 4 years ago

Ok great, so the "multiple parents for one child" idea is not to be implemented. Phew … because that is one of these things that would have been really tough to impossible to fit in …

I suggest hierarchies are used sparingly. Reason: we want to make annotations and codes re-usable, and hierarchies tend to reflect the mental models of the individual researcher or group thereof.

The code hierarchies as currently implemented can (and are intended to) facilitate code re-use. For that, ethnographers would only use the leaf-level codes in their coding practice. Only these would be created during manual coding and considered to reflect "reality on the ground". The hierarchies are then freely re-arranged in a new project, indeed reflecting the mental models of researchers. But that modeling is ok because it can be modified at any time with ease, without having to wade through all of the posts again.

This process is not perfect though, because by re-arranging the hierarchy one researcher will build up their mental model while destroying that of a researcher from a previous project that has been concluded. But for simplicity, that's what we have right now.

Also note that hierarchies can be used solely to find codes faster in the Open Ethnographer auto-completion, in which case the hierarchy relationships would have no semantic meaning of their own and would not be evaluated when calculating co-occurrences. Changing the hierarchy later will not change the co-occurrences or other results of the analysis.

This is different from using them for a concept hierarchy, as @miahass wants to do (see the discussion in #167). In that case, re-arranging the code hierarchy changes the calculated co-occurrences.
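The difference between the two uses can be sketched in Python. This is a hypothetical model only: it assumes that, under concept-hierarchy semantics, an annotation also counts for every ancestor of its code, which may not be how the analysis is actually computed. The `ancestors` and `cooccurring_pairs` functions and the "pilgrimage" code are invented for the example, and the sketch does not filter out pairs made of a code and its own ancestor.

```python
from itertools import combinations


def ancestors(code, parent):
    """Return all ancestors of `code` in a single-parent hierarchy."""
    out = []
    while parent.get(code) is not None:
        code = parent[code]
        out.append(code)
    return out


def cooccurring_pairs(codes_on_post, parent=None):
    """Return the code pairs co-occurring on one post.

    With `parent=None` the hierarchy is ignored (navigation-only use).
    With a `parent` mapping, each code also counts for its ancestors
    (assumed concept-hierarchy semantics).
    """
    codes = set(codes_on_post)
    if parent is not None:
        for code in codes_on_post:
            codes.update(ancestors(code, parent))
    return set(combinations(sorted(codes), 2))


post = ["the Pope", "pilgrimage"]
before = {"the Pope": "Catholic Church", "pilgrimage": None}
after = {"the Pope": "Influential Figureheads", "pilgrimage": None}

flat = cooccurring_pairs(post)
with_before = cooccurring_pairs(post, before)
with_after = cooccurring_pairs(post, after)

print(flat)
print(("Catholic Church", "pilgrimage") in with_before)
print(("Catholic Church", "pilgrimage") in with_after)
```

When the hierarchy is ignored, moving "the Pope" to another parent leaves the pairs untouched; under the assumed concept-hierarchy semantics, the same move replaces the "Catholic Church" pairs with "Influential Figureheads" pairs.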

Whatever semantics you use for the code hierarchy relations in your project, it's only important to document that and be consistent about it within one ethnographic project. (Or rather, in the whole Open Ethnographer installation, in case some topics appear in multiple ethnographic projects).

Closing this issue because the original goal is no longer desired. Feel free to continue the discussion about using hierarchies on our Discourse platform.