edgeryders / discourse-annotator

A text annotation and analysis application for Discourse. Made with Annotator.js and Ruby on Rails.
https://edgeryders.eu/t/6811

One-off: copy some posts and all their annotations into new posts #207

Closed albertocottica closed 3 years ago

albertocottica commented 3 years ago

This is a question mostly for @damingo.

In POPREBEL, some interviews have been uploaded to Discourse as very long posts and then coded. This causes a problem when inducing the code co-occurrence networks: a single interview transcription might have 200 codes, which results in dense, insufficiently granular networks. We have decided to split each of these long posts into several shorter ones.

We could do this by hitting the "reply" button below the original post, then cutting and pasting part of the original post into the reply. But this would destroy the annotations. So, my question is this: given a post ID, could you duplicate that post as a reply in the same topic, including all its annotations? This would generate a topic consisting of n identical posts, all with identical annotations. The ethnographer would then manually delete the unwanted parts from each post, along with the corresponding annotations.

This is a one-off, not a feature request.

While I am at it, a related question: if I create an annotation on a certain snippet of text in a post, and later delete that text, do I also delete the annotation?

tanius commented 3 years ago

> given a post ID, could you duplicate that post as a reply in the same topic, including all its annotations? This would generate a topic consisting of n identical posts, all with identical annotations. The ethnographer would then manually delete the unwanted parts from each post, along with the corresponding annotations.

This method would not yet achieve what you want, because annotation positions are not yet adapted automatically when somebody edits the text after coding. Annotations have two parallel anchoring mechanisms: a character position and a quote. Correcting the character position from the quote is possible, and automatic in most cases; this is how we did it when importing the annotations from Drupal to Discourse. But it does not (yet) happen automatically when ethnographers delete unwanted parts.
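To make the idea of correcting the character position from the quote concrete, here is a minimal Python sketch. It is not the code used for the Drupal import; the function name `reanchor` and the tie-breaking rule (pick the occurrence closest to the old position) are assumptions made for illustration only.

```python
def reanchor(text, quote, old_start):
    """Return the new (start, end) span of `quote` in `text`, or None.

    Assumption for this sketch: if the quote occurs several times,
    the occurrence closest to the old character position is chosen.
    """
    positions = []
    i = text.find(quote)
    while i != -1:
        positions.append(i)
        i = text.find(quote, i + 1)
    if not positions:
        # The quoted text no longer exists in the edited post.
        return None
    start = min(positions, key=lambda p: abs(p - old_start))
    return start, start + len(quote)


# Hypothetical example: the intro paragraph was deleted after coding.
original = "Intro paragraph. We trust our neighbours. Closing remarks."
edited = "We trust our neighbours. Closing remarks."
quote = "trust our neighbours"

old_start = original.find(quote)
new_span = reanchor(edited, quote, old_start)
```

When the quoted text itself has been deleted, the function returns `None`, so such annotations would have to be handled separately.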

I'm trying to think of a simple solution for this use case …

tanius commented 3 years ago

After some thinking, the following is the best process I could come up with:

  1. Ethnographers split the interview topic into multiple posts as outlined by Alberto, but without deleting anything from the original first post that so far contains the whole interview. Most of the interview text will then appear twice in the topic.

  2. @damingo develops a script that will re-anchor annotations appearing in the first post to text in later posts where possible. That could be done by matching the last part(s) of the XPath anchor, as typically whole paragraphs are moved to new posts. Where this fails, the "quote" anchor would be used, with the additional knowledge that the right quote is the first match immediately after the previously processed annotation.

  3. We run that script on these topics, then delete the duplicate part from the first posts of these topics.
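The "quote" fallback from step 2 could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions, not the actual script: the XPath matching is left out, the data layout (a list of post texts, annotations as dicts with `id` and `quote`) is hypothetical, and the search resumes from the start of the previous match so that overlapping annotations are still found.

```python
def reanchor_in_order(annotations, later_posts):
    """Map each annotation to (post_index, start, end) in `later_posts`.

    `annotations` must be sorted by their position in the original post.
    Each quote is searched from where the previous annotation was found,
    so the first match after it wins. Unmatched annotations map to None.
    """
    results = []
    post_idx, offset = 0, 0  # cursor: where the previous match started
    for ann in annotations:
        found = None
        for p in range(post_idx, len(later_posts)):
            begin = offset if p == post_idx else 0
            pos = later_posts[p].find(ann["quote"], begin)
            if pos != -1:
                found = (p, pos, pos + len(ann["quote"]))
                break
        results.append((ann["id"], found))
        if found is not None:
            post_idx, offset = found[0], found[1]
    return results


# Hypothetical data: one interview split into two later posts.
later_posts = [
    "Q: Do you trust people? A: I trust my family.",
    "Q: And the state? A: I trust nobody there.",
]
annotations = [
    {"id": 1, "quote": "trust"},
    {"id": 2, "quote": "my family"},
    {"id": 3, "quote": "trust"},
]
results = reanchor_in_order(annotations, later_posts)
```

In this example the third annotation has the same quote as the first, but because the search continues from the second annotation, it lands in the second post rather than on an earlier occurrence.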

And because that script is not the simplest thing to develop … how many interviews are we talking about? If it's 4-5, then re-coding manually is certainly faster.

albertocottica commented 3 years ago

Thanks, noted! @miahass will get back to you.

tanius commented 3 years ago

The ongoing discussion revealed that it's faster to re-code posts than to develop this script.

But we'll keep it as a feature proposal for the future in #209.