cwrc / CWRC-WriterBase

The base class from which to create a CWRC-Writer XML editor.
GNU General Public License v2.0

Annotations' attributes get overwritten by CWRC-Writer #282

Open lucaju opened 4 years ago

lucaju commented 4 years ago

RDF annotation attributes get overwritten by CWRC-Writer every time it rebuilds the XML. More specifically, the annotation ID (which includes the URL) and the dates when the annotation was created and issued are updated to the current timestamp, and the creator's information is replaced with the current user's data. If the file is saved, the previous information is lost.

For instance:

Also, some of the metadata stored within the annotation is not up-to-date or incorrect. For instance,

Expected Behaviour

Unless the user deliberately changes an annotation, CWRC-Writer should preserve the annotation's attributes, including the ID, the dates, and the creator.

Current Behaviour

Previous information gets replaced when the file is saved.

Possible Solution

Rework the process of generating annotations. Currently, it seems that some attributes (date, creator, etc.) are not stored in objects when the file is loaded. Storing these attributes and using them to check whether the annotation changed before regenerating the XML might solve the problem.

Since this is an important and sensitive feature, these changes should be discussed and agreed upon before being implemented.
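
As input for that discussion, here is a minimal sketch of the change check suggested above. Everything in it is hypothetical: `snapshot`, `hasChanged`, and the `EDITABLE_FIELDS` list are illustrative names, not CWRC-Writer APIs, and which fields count as a deliberate edit is exactly what still needs to be agreed upon.

```javascript
// Hypothetical sketch; none of these names exist in CWRC-Writer.

// Assumption: only these fields represent a deliberate user edit.
const EDITABLE_FIELDS = ['motivation', 'body', 'selector'];

// Keep a deep copy of the annotation as it was when the file was loaded.
function snapshot(annotation) {
  return JSON.parse(JSON.stringify(annotation));
}

// True if any editable field differs from the loaded snapshot.
// Note: comparing serialized values is sensitive to key order.
function hasChanged(original, current) {
  return EDITABLE_FIELDS.some(
    (field) => JSON.stringify(original[field]) !== JSON.stringify(current[field])
  );
}

const loaded = {
  motivation: 'oa:editing',
  body: { type: 'fabio:Correction', value: 'when' },
  selector: 'TEI/text/body/div/p[2]/choice',
  creator: 'https://github.com/ilovan',
};
const original = snapshot(loaded);

// Saving without touching the annotation: nothing to regenerate.
console.log(hasChanged(original, loaded)); // false

// A deliberate edit to the body is detected.
loaded.body.value = 'then';
console.log(hasChanged(original, loaded)); // true
```

When `hasChanged` returns false, the stored ID, dates, and creator could be written back untouched.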

Steps to Reproduce (for bugs)

  1. Load Sample Letter from the template.
  2. Save the document on your own repository.
  3. Check how the annotation was saved and compare it with the original file.

lucaju commented 4 years ago

The attribute appVersion will be set to read the current version from package.json. The attribute as:generator.id will be set to read from window.location.origin, which is the host (and protocol) from which CWRC-Writer is being used, e.g., https://cwrc-writer.cwrc.ca.
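
A rough sketch of what that could look like. `buildGenerator` is a hypothetical helper, not existing code. In the browser, the two arguments would come from package.json and window.location.origin; they are passed in here so the snippet also runs outside a browser. Using the origin for both `@id` and `schema:url` is an assumption.

```javascript
// Hypothetical helper; not part of CWRC-Writer.
// `version` stands in for the version field of package.json,
// `origin` for window.location.origin.
function buildGenerator(version, origin) {
  return {
    '@id': origin,
    '@type': 'as:Application',
    'rdfs:label': 'CWRC Writer',
    'schema:url': origin,
    'schema:softwareVersion': version,
  };
}

const generator = buildGenerator('1.0', 'https://cwrc-writer.cwrc.ca');
console.log(JSON.stringify(generator, null, 2));
```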

lucaju commented 4 years ago

A few thoughts about the date attributes. We might want to include dcterms:modified to store when the annotation was last updated.

We might want to rethink the use of dcterms:issued. It seems that it should describe when a document was formally issued. The Dublin Core documentation gives the following example: "A government file, officially released in 1997, consisting of photographs taken in 1985 of hundreds of meteorite fragments collected in 1952 could be described with the following metadata: DC.Date->Issued: 1997 DC.Date->Created: 1985 DC.Date->DataGathered: 1952" (https://www.dublincore.org/specifications/dublin-core/date-element/)

Since we are dealing with annotation metadata (as opposed to the content of the annotation), the issue date is the same as the created date, which can be either the moment when the user creates the annotation or when the file is saved (committed). Alternatively, we can use this pair of attributes (created & issued) to describe the above process (creation and save). Not sure if it would make any difference, though.

lucaju commented 4 years ago

Currently, when loading a file, CWRC-Writer parses (and removes) the RDF annotations from the XML and stores them in JS objects (Entities). These objects are used to manage, edit, and display entities. Later on, when the user decides to save (or to check the XML), these objects are used to regenerate the JSON-LD and put it back in the XML (inside XENODATA).

The problem is that when initially parsing the RDF, CWRC-Writer stores only some of the attributes, which do not include the creator, for instance. Then, when regenerating the annotation, the creator becomes the current user. The date of creation is also a problem, since the regeneration process updates the annotation's creation timestamp. In the end, the annotation gets overwritten with the information of the latest user who saved the file, and we lose the original date and creator attributes.
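
To make the failure mode concrete, here is a minimal sketch of a regeneration step that reuses stored provenance instead of overwriting it. The `regenerate` function and the entity shape are hypothetical and much simpler than the real Entities; the point is only the fallback logic.

```javascript
// Hypothetical sketch; `regenerate` and the entity shape are illustrative only.
function regenerate(entity, currentUser, now) {
  return {
    '@id': entity.id,
    '@type': 'oa:Annotation',
    // Reuse what was parsed from the file; fall back to the current
    // session only for annotations created in this session.
    'dcterms:created': entity.created ?? now.toISOString(),
    'dcterms:creator': entity.creator ?? currentUser,
  };
}

const currentUser = { '@id': 'https://github.com/lucaju' };
const now = new Date('2020-06-01T12:00:00.000Z');

// Entity parsed from an existing file: provenance was stored at load time.
const loadedEntity = {
  id: 'https://example.org/doc?annotation_1',
  created: '2019-08-14T20:41:01.124Z',
  creator: { '@id': 'https://github.com/ilovan' },
};
// Entity created in this session: no stored provenance yet.
const newEntity = { id: 'https://example.org/doc?annotation_2' };

console.log(regenerate(loadedEntity, currentUser, now)['dcterms:creator']['@id']);
// https://github.com/ilovan
console.log(regenerate(newEntity, currentUser, now)['dcterms:created']);
// 2020-06-01T12:00:00.000Z
```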

The solution to this issue might be as trivial as storing these crucial attributes and using them to regenerate the annotation later. While this works fine when the annotation doesn't change during the session, how will we handle updates? Should we update the creation timestamp? Or perhaps add a new attribute to store the latest update (dcterms:modified)? What if the annotation is changed multiple times? And what should we do with the creator? Keep the original creator? Replace it? Add an array of contributors?

Perhaps @SusanBrown can help with these questions.

lucaju commented 4 years ago

I'm not well versed in JSON-LD or in how people use it, but perhaps there is more to this. Perhaps a bigger question (at least for me at this point) is how CWRC-Writer supports JSON-LD and how it handles it. Which attributes (and from which namespaces) are we using?

Anatomy of a CWRC-Writer JSON-LD. Here is an example of one annotation saved by CWRC-Writer:

{
    "@context": {
        "as": "http://www.w3.org/ns/activitystreams#",
        "cwrc": "http://sparql.cwrc.ca/ontologies/cwrc#",
        "dc": "http://purl.org/dc/elements/1.1/",
        "dcterms": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "geo": "http://www.geonames.org/ontology#",
        "oa": "http://www.w3.org/ns/oa#",
        "schema": "http://schema.org/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "dcterms:created": {
            "@type": "xsd:dateTime",
            "@id": "dcterms:created"
        },
        "dcterms:issued": {
            "@type": "xsd:dateTime",
            "@id": "dcterms:issued"
        },
        "oa:motivatedBy": {
            "@type": "oa:Motivation"
        },
        "@language": "en"
    },
    "@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter?correction_annotation_20190814144101",
    "@type": "oa:Annotation",
    "dcterms:created": "2019-08-14T20:41:01.124Z",
    "dcterms:issued": "2019-08-14T20:44:01.985Z",
    "dcterms:creator": {
        "@id": "https://github.com/ilovan",
        "@type": [
            "cwrc:NaturalPerson",
            "schema:Person"
        ],
        "cwrc:hasName": "Mihaela Ilovan",
        "foaf:nick": "ilovan"
    },
    "oa:motivatedBy": "oa:editing",
    "oa:hasTarget": {
        "@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter?correction_annotation_20190814144101#Target",
        "@type": "oa:SpecificResource",
        "oa:hasSource": {
            "@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter",
            "@type": "dctypes:Text",
            "dc:format": "text/xml"
        },
        "oa:renderedVia": {
            "@id": "https://cwrc-writer.cwrc.ca/",
            "@type": "as:Application",
            "rdfs:label": "CWRC Writer",
            "schema:softwareVersion": "1.0"
        },
        "oa:hasSelector": {
            "@id": "https://raw.githubusercontent.com/ilovan/Git-Writer-tests/master/templates/sample_letter?correction_annotation_20190814144101#Selector",
            "@type": "oa:XPathSelector",
            "rdf:value": "TEI/text/body/div/p[2]/choice"
        }
    },
    "oa:hasBody": {
        "@type": "fabio:Correction",
        "dc:format": "text/xml",
        "rdf:value": "when"
    },
    "as:generator": {
        "@id": "https://cwrc-writer.cwrc.ca/",
        "@type": "as:Application",
        "rdfs:label": "CWRC Writer",
        "schema:url": "https://cwrc-writer.cwrc.ca",
        "schema:softwareVersion": "1.0"
    }
}

lucaju commented 4 years ago

Made some more adjustments and updates (9e3c0cfd0b5c949aaf09726223309e28ea17572e):

lucaju commented 4 years ago

A few questions for consideration (@SusanBrown @ilovan )

1. What constitutes, or what should trigger an update to an annotation?

Perhaps a better question is: what counts as modifying an annotation (as opposed to contextual updates)?

2. Is it ok to have an array of contributors?

DCMI defines the term contributor as "an entity responsible for making contributions to the resource. The guidelines for using names of persons or organizations as creators apply to contributors."

I decided to use it to record users who modify the annotation. Every time a user deliberately modifies an annotation, the modified date gets updated, and the user is added to the list of contributors (unless they are already listed or are the creator).

But it is not clear from the documentation whether there can be more than one contributor. It does say, though, that contributor (term) is a subproperty of contributor (element), both in the singular.

Example:

"dcterms:contributor": [
    {
        "dcterms:contributor": {
            "@id": "https://github.com/lucaju",
            "@type": [
                "cwrc:NaturalPerson",
                "schema:Person"
            ],
            "cwrc:hasName": "Luciano Frizzera",
            "foaf:nick": "lucaju"
        }
    },
    {
        "dcterms:contributor": {
            "@id": "https://github.com/sbrown",
            "@type": [
                "cwrc:NaturalPerson",
                "schema:Person"
            ],
            "cwrc:hasName": "Susan Brown",
            "foaf:nick": "sbrown"
        }
    }
]
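
The rule described above (update the modified date, add the user unless they are the creator or already listed) could be sketched roughly as follows. `recordModification` is a hypothetical name, not existing code. Note that, unlike the example above, this sketch assumes a flat array of person objects under `dcterms:contributor`; whether to nest or flatten is part of the open question.

```javascript
// Hypothetical sketch; `recordModification` is not a CWRC-Writer function.
// Assumption: contributors are a flat array of person objects.
function recordModification(annotation, user, now) {
  const updated = { ...annotation, 'dcterms:modified': now.toISOString() };
  const creatorId = annotation['dcterms:creator']['@id'];
  const contributors = annotation['dcterms:contributor'] ?? [];
  const alreadyListed = contributors.some((c) => c['@id'] === user['@id']);
  // Only add users who are neither the creator nor already listed.
  if (user['@id'] !== creatorId && !alreadyListed) {
    updated['dcterms:contributor'] = [...contributors, user];
  }
  return updated;
}

const annotation = {
  'dcterms:creator': { '@id': 'https://github.com/ilovan' },
};
const lucaju = { '@id': 'https://github.com/lucaju' };
const when = new Date('2020-06-01T12:00:00.000Z');

// The same user editing twice is listed only once.
const once = recordModification(annotation, lucaju, when);
const twice = recordModification(once, lucaju, when);
console.log(twice['dcterms:contributor'].length); // 1

// The creator editing their own annotation is not added as a contributor.
const byCreator = recordModification(annotation, annotation['dcterms:creator'], when);
console.log('dcterms:contributor' in byCreator); // false
```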

3. On the as:generator attribute, what is the difference between @id and schema:url?

Both point to the same place: the URL from which CWRC-Writer is being used (e.g., cwrc-writer.cwrc.ca).

"as:generator": {
    "@id": "https://cwrc-writer.cwrc.ca/",
    "@type": "as:Application",
    "rdfs:label": "CWRC Writer",
    "schema:url": "https://cwrc-writer.cwrc.ca",
    "schema:softwareVersion": "1.0"
}

4. Is there any particular order in which JSON-LD should be built?

I understand that @context should be put at the top of a JSON-LD document. What about the other attributes? Is there a standard for that, or should we come up with our own order? Alphabetical? Logical (e.g., created and modified next to each other)?
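
As far as I understand, JSON objects are unordered, so any order we pick would be a readability convention of ours rather than something processors should rely on. If we do want a stable order, a minimal sketch might look like this. `orderKeys` and the `PREFERRED` list are hypothetical; the list simply combines the "logical" and "alphabetical" options above.

```javascript
// Hypothetical sketch; the preferred order is an assumption, not a standard.
const PREFERRED = [
  '@context',
  '@id',
  '@type',
  'dcterms:created',
  'dcterms:modified',
  'dcterms:issued',
];

// Return a copy of `node` with preferred keys first (in list order)
// and all remaining keys sorted alphabetically after them.
function orderKeys(node) {
  const rank = (key) => {
    const i = PREFERRED.indexOf(key);
    return i === -1 ? PREFERRED.length : i;
  };
  const keys = Object.keys(node).sort(
    (a, b) => rank(a) - rank(b) || (a < b ? -1 : a > b ? 1 : 0)
  );
  return Object.fromEntries(keys.map((key) => [key, node[key]]));
}

const ordered = orderKeys({
  'oa:hasBody': {},
  'dcterms:modified': '2020-06-01T12:00:00.000Z',
  '@type': 'oa:Annotation',
  'as:generator': {},
  '@id': 'https://example.org/doc?annotation_1',
  'dcterms:created': '2019-08-14T20:41:01.124Z',
});
console.log(Object.keys(ordered).join('\n'));
```

This only reorders the top level; nested objects such as oa:hasTarget would need the same treatment if we care about their order too.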