edgeryders / discourse-annotator

A text annotation and analysis application for Discourse. Made with Annotator.js and Ruby on Rails.
https://edgeryders.eu/t/6811

Provide a "Projects" database entity for whole SSNA studies #98

Closed albertocottica closed 2 years ago

albertocottica commented 5 years ago

Based on this thread.

The idea is this. For reasons of consistency (during a research project) and accountability (afterwards), we have decided to produce a codebook as an open and collaborative document. The codebook should be generated automatically (see #119) and consist of information on the codes used in a specific study.

Normally, we identify an SSNA project via a Discourse tag with the format ethno-PROJECTNAME. The workflow is this:

  1. Topics on the platform are assigned the tag ethno-PROJECTNAME.
  2. Ethnographers go to work on those topics and annotate them.
  3. Subsequent analysis (codebook refinement, SSNA proper) is run on tagged topics and the annotations thereon.

The problem with this is that the same topic can be part of more than one study. In that case you get inconsistencies, because annotations from different groups, made with different goals, are pulled into the same bag.
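
To illustrate the problem, here is a minimal sketch in plain Ruby. It is not code from discourse-annotator: the `Topic` and `TopicAnnotation` structs, the `annotations_by_tag` helper and the tag names are all hypothetical, and only the tag-based selection described above is modelled.

```ruby
# Hypothetical sketch of tag-based selection; none of these names come from
# the discourse-annotator code base.
Topic           = Struct.new(:id, :tags)
TopicAnnotation = Struct.new(:id, :topic_id, :author_group)

# Select every annotation made on a topic that carries the given tag.
# The annotation itself does not record which study it was made for.
def annotations_by_tag(tag, topics, annotations)
  topic_ids = topics.select { |t| t.tags.include?(tag) }.map(&:id)
  annotations.select { |a| topic_ids.include?(a.topic_id) }
end

# Topic 10 belongs to two studies at once.
sample_topics = [
  Topic.new(10, ["ethno-alpha", "ethno-beta"]),
  Topic.new(11, ["ethno-alpha"])
]
sample_annotations = [
  TopicAnnotation.new(1, 10, "alpha team"),
  TopicAnnotation.new(2, 10, "beta team")
]

# The beta study also receives the alpha team's annotation on topic 10.
pooled = annotations_by_tag("ethno-beta", sample_topics, sample_annotations)
puts pooled.map(&:author_group).inspect
```

Because the only link between an annotation and a study is the tag on its topic, nothing in this model can keep the two teams' annotations apart.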

The solution is to create a "project" entity. It could consist of:

Annotations would then include a field referring back to the project ID. This would disambiguate between annotations made for different studies on the same content.

Graphryder instances would no longer accept Discourse tags as arguments, but rather project IDs. From those, the associated Discourse tags would pull in participants and topics. Annotations would be pulled in on the basis of bearing the project's ID.

We should probably pre-fill the project field in the Annotation with the last ID used, since ethnographers tend to work on the same project for a long time.
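
The proposal can be sketched roughly as follows, in plain Ruby rather than as an actual Rails model. This is an assumption-laden illustration, not the application's schema: `Project`, `Annotation`, `annotations_for` and `default_project_id` are hypothetical names, and the "last ID used" is assumed to be the project of the ethnographer's most recent annotation.

```ruby
# Hypothetical, framework-free sketch; none of these names come from the
# discourse-annotator code base.
Project    = Struct.new(:id, :name)
Annotation = Struct.new(:id, :topic_id, :project_id, :code)

# Annotations are selected by the project ID they carry, not by topic tags.
def annotations_for(project, annotations)
  annotations.select { |a| a.project_id == project.id }
end

# Pre-fill the project of a new annotation with the one used last.
# `own_annotations` is assumed to be in creation order; returns nil when
# the ethnographer has not annotated anything yet.
def default_project_id(own_annotations)
  own_annotations.last&.project_id
end

study_a = Project.new(1, "study-a")
study_b = Project.new(2, "study-b")

# Topic 10 is part of both studies, yet the annotations stay apart.
sample = [
  Annotation.new(1, 10, study_a.id, "trust"),
  Annotation.new(2, 10, study_b.id, "care")
]

puts annotations_for(study_a, sample).map(&:code).inspect
puts default_project_id(sample).inspect
```

In a Rails application this would presumably become a `projects` table plus a foreign key on annotations, but that mapping is a guess and not something this issue specifies.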

tanius commented 5 years ago

> The solution is to create a "project" entity. It could consist of: […] a Discourse tag of the form ethno-PROJECTNAME to identify the relevant content

Since you propose to replace the ethno-PROJECTNAME tags with a field in annotations, the tag would not be a part of the project definition. Otherwise we'd have redundant and possibly inconsistent data: a topic could not have the ethno-PROJECTNAME tag but annotations in it could claim to belong to that project.

> Annotations would then include a field referring back to the project ID. This would disambiguate between annotations made for different studies on the same content.

But then what if you want an annotation to belong to multiple projects? That would require a multi-value field. And with that it starts to look over-engineered to me …
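
For reference, the multi-value variant could look roughly like this. It is a hypothetical sketch in plain Ruby, with `MultiAnnotation`, `annotations_in` and `reuse_in` invented for illustration; it only shows what the extra field would imply and does not argue for or against it.

```ruby
require "set"

# Hypothetical sketch: an annotation carries a set of project IDs instead of
# a single one. None of these names come from the discourse-annotator code.
MultiAnnotation = Struct.new(:id, :topic_id, :project_ids, :code)

# An annotation belongs to every project whose ID is in its set.
def annotations_in(project_id, annotations)
  annotations.select { |a| a.project_ids.include?(project_id) }
end

# Re-using an annotation in another study means adding that study's ID.
def reuse_in(annotation, project_id)
  annotation.project_ids.add(project_id)
  annotation
end

shared = MultiAnnotation.new(1, 10, Set[1], "trust")
reuse_in(shared, 2)
puts annotations_in(2, [shared]).map(&:code).inspect
```

Every re-use then becomes an explicit step per annotation, which is the extra machinery the comment above is wary of.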

I think before deciding something here we need to step back, remember that Open Ethnographer was meant to make ethnographer work re-usable, and find a good way to make annotations re-usable. With projects as proposed so far, re-use would not happen much in practice.

albertocottica commented 5 years ago

> Since you propose to replace the ethno-PROJECTNAME tags with a field in annotations, the tag would not be a part of the project definition. Otherwise we'd have redundant and possibly inconsistent data: a topic could not have the ethno-PROJECTNAME tag but annotations in it could claim to belong to that project.

Correct.

> I think before deciding something here we need to step back, remember that Open Ethnographer was meant to make ethnographer work re-usable, and find a good way to make annotations re-usable. With projects as proposed so far, re-use would not happen much in practice.

Also correct. Let's discuss this with the ethnographers.

tanius commented 4 years ago

I just thought that the following could be a useful implementation for this:

We'd keep the definitions of coding projects like they are now (topics tagged with a certain Discourse tag). All annotations found in these topics are considered "in principle valuable", in line with the spirit of re-using ethnographic work.

However, that does not mean that they are valuable for all analyses and data presentations. So we would also have:

tanius commented 4 years ago

I have now split off everything relating to the automatic creation of codebooks into its own issue (#119).

tanius commented 2 years ago

Closing as a duplicate of #222, which is a newer attempt at solving the same underlying data architecture issue.