OSCOSS / fiduswriter

This repository only contains the issue list relevant for the work the OSCOSS project is conducting on Fidus Writer. The Fidus Writer sources can be found at https://github.com/fiduswriter/fiduswriter .

Export FW to dokieli format #91

Closed afshinsadeghi closed 6 years ago

afshinsadeghi commented 7 years ago

https://github.com/linkeddata/dokieli

afshinsadeghi commented 7 years ago

Process:

  1. Export from FW
  2. Show in dokieli

Learn:

  • Structure of FW
  • Structure of dokieli

Comments in dokieli have two parts:

  1. The marked section of text.

Example:

demonstrating advanced document authoring and interaction without a single point of control💬

Structure: span containing mark and sup (span mark sup /span)

mark: id = r-NUMBER, property = "schema:description", contents = the text body
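A minimal sketch of how an exporter could emit this first part, assuming the sup is a sibling that follows the mark inside the span (the note does not close mark explicitly). The function names are hypothetical, not part of FW or dokieli:

```javascript
// Escape characters that would otherwise be parsed as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap a commented passage as <span><mark ...>text</mark><sup>💬</sup></span>.
// Assumption: sup follows mark as a sibling inside the span.
function markCommentedText(text, number) {
  return (
    "<span>" +
    `<mark id="r-${number}" property="schema:description">${escapeHtml(text)}</mark>` +
    "<sup>💬</sup>" +
    "</span>"
  );
}

console.log(markCommentedText("without a single point of control", 1));
```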

  2. The comment belonging to this part of the text.

Example:

Structure (nesting shown by indentation):

  aside (class: "note do")
    blockquote (cite: URI)
      article (id: NUMBER, typeof="oa:Annotation", prefix: a few URIs, including the one in cite)
        h3 (class: schema-name)
        dl (class: author-name)
        dl (class: published)
          dd
            a
              time
        dl (class: rights)
        dl (class: canonical)
        dl (class: target)
        dl (class: renderedvia)
        section (id)
          h2
          div
          dl
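The annotation outline above can be sketched as a string builder. This is a hypothetical helper, not dokieli code: it emits only part of the outline (the rights, canonical, target and renderedvia lists are omitted), the `dt` labels and the `-body` section id are invented, and the h3 uses `property="schema:name"` as in the section template later in this thread. Following csarven's note that dokieli skips nodes with class "do" when writing, the sketch leaves "do" off:

```javascript
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a dokieli-style comment: aside > blockquote > article.
// The fields of `note` are illustrative, not a dokieli API.
function buildCommentAside(note) {
  const id = escapeHtml(note.id);
  const published = escapeHtml(note.published);
  return [
    // "do" is omitted: it marks dynamically inserted nodes.
    '<aside class="note">',
    `  <blockquote cite="${escapeHtml(note.target)}">`,
    `    <article id="${id}" typeof="oa:Annotation" prefix="oa: http://www.w3.org/ns/oa# schema: http://schema.org/">`,
    `      <h3 property="schema:name">${escapeHtml(note.title)}</h3>`,
    `      <dl class="author-name"><dt>Author</dt><dd>${escapeHtml(note.author)}</dd></dl>`,
    `      <dl class="published"><dt>Published</dt><dd><a href="#${id}"><time datetime="${published}">${published}</time></a></dd></dl>`,
    `      <section id="${id}-body"><div>${escapeHtml(note.body)}</div></section>`,
    "    </article>",
    "  </blockquote>",
    "</aside>",
  ].join("\n");
}

console.log(buildCommentAside({
  id: "a1",
  target: "https://example.org/article#r-1",
  title: "Comment",
  author: "A. Author",
  published: "2017-06-30T11:12:41Z",
  body: "Nice point.",
}));
```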

afshinsadeghi commented 7 years ago

To add the new code, I created a fork of FW here: https://github.com/sadeghiafshin/fiduswriter . @csarven @johanneswilm First, I want to extend the FW "export to HTML" function. Does that sound logical?

csarven commented 7 years ago

Note that nodes with the class "do" are there because they are dynamically inserted into the DOM. When dokieli does a write operation, it skips over nodes with those classes. There are more normalisation steps, but it may not be necessary to bother with them for the time being. For starters, only look at the source HTML.

afshinsadeghi commented 7 years ago

@csarven

  1. As far as I understand, dokieli provides a way to write and publish a web page entirely with JavaScript, which is why it is not centralized. Is that correct? (This is for my own learning.)

  2. I did not find an import function for a whole page in dokieli, but there is an "import" for JSON-LD and triples. What does that do?

csarven commented 7 years ago

  1. Without overloading the terms, yes, to some extent that's true.
  2. At the moment, there is "Open", if that's what you are trying to achieve. I think we can work with this (and improve where necessary). Otherwise, there is no need to "import", as one would just navigate to the URL. Did I understand you correctly? The JSON-LD, Turtle, and Nanopublications options you are referring to are for embedding data, i.e., entering a block of data that is injected into the DOM; when the document is saved/exported, that data ends up in the HTML (in the head).
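A rough illustration of "data ends up in the head": dokieli's exact markup for embedded data is not described in this thread, so this sketch assumes a `<script>` data block carrying the data's media type, which the HTML standard permits for non-JavaScript content. The `embedDataBlock` helper is hypothetical and works on a plain HTML string:

```javascript
// Insert a data block (e.g. JSON-LD or Turtle) just before </head>.
function embedDataBlock(html, mediaType, data) {
  // A literal "</script" inside the data would end the block early.
  if (/<\/script/i.test(data)) {
    throw new Error("data block must not contain '</script'");
  }
  const headEnd = html.search(/<\/head>/i);
  if (headEnd === -1) {
    throw new Error("document has no </head>");
  }
  const block = `<script type="${mediaType}">\n${data}\n</script>\n`;
  return html.slice(0, headEnd) + block + html.slice(headEnd);
}

const page = "<html><head><title>Doc</title></head><body></body></html>";
const jsonld = JSON.stringify({
  "@context": "http://schema.org/",
  "@type": "ScholarlyArticle",
  name: "Doc",
});
console.log(embedDataBlock(page, "application/ld+json", jsonld));
```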

afshinsadeghi commented 7 years ago

Thanks, then I tested "Open". I tried to open both a URL from the Internet (https://fiduswriter.gesis.org/document/536/) and an exported dokieli web page on my computer (file:///Users/afshin/Downloads/dokieli.20170630T111241857Z.html), but neither worked. Is this a fully functional button? I could see that the iri value on line 3241 of do.js contains the URL of the web page and that the getResource function on line 898 is called, but on line 907 http.open('GET', url); does not seem to work, and this.status on line 916 is empty.
What should the "Open" button produce in the end?

csarven commented 7 years ago

Possibly because fiduswriter.gesis.org is not CORS-enabled, so the XHR status appears to be 0. Let me see if I can update Open a bit to route through a proxy.

Note that https://fiduswriter.gesis.org/document/536/ sends header Location: /account/login/?next=/document/536/, requiring authentication. Perhaps a token in the URL can be used to fetch (for read only)?

IIRC, file: is not yet supported, only http.

csarven commented 7 years ago

It'd be great if the domain were CORS-enabled, at least along the lines of this Apache configuration:

SetEnvIfNoCase ORIGIN (.*) ORIGIN=$1
Header set Access-Control-Allow-Origin "%{ORIGIN}e" env=ORIGIN
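For readers less familiar with Apache, the two directives amount to "echo the request's Origin back in Access-Control-Allow-Origin". A minimal sketch of the same rule as a Node function (hypothetical, not part of FW); a Node `http` server could pass its result to `res.writeHead(200, corsHeaders(req.headers))`. Echoing any Origin is as permissive as allowing every site; restricting to a list of known origins would be stricter:

```javascript
// Mirrors the Apache rule: if the request carries an Origin header, echo it
// in Access-Control-Allow-Origin; otherwise add no CORS header at all.
// Node lowercases incoming header names, hence the "origin" key.
function corsHeaders(requestHeaders) {
  const origin = requestHeaders["origin"];
  if (!origin) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    // Not in the Apache snippet: tells caches the response varies by Origin.
    Vary: "Origin",
  };
}

console.log(corsHeaders({ origin: "https://dokie.li" }));
console.log(corsHeaders({}));
```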

afshinsadeghi commented 7 years ago

I will update the Apache server...

I could not! I don't have sudo rights there. Maybe @johanneswilm can help.

Could it be because of the "http" check? The FW URL uses "https".

csarven commented 7 years ago

I don't think so. It appears to redirect from http to https, and then another redirect for the authentication page.

csarven commented 7 years ago

I meant that http(s) scheme is supported.

If there is a read-only publicly accessible URL for the article, we should be able to open that in dokieli. I've added proxy use into dokieli in any case, but the public read URL of the article is still needed. Right now I'm looking into https://github.com/linkeddata/dokieli/issues/198

afshinsadeghi commented 7 years ago

In summary, I am breaking this task into two steps: export HTML from FW, and import it into dokieli.

Since dokieli can import HTML but we have a CORS problem when importing, I am installing the latest version of dokieli on the dev server (https://fiduswriter-devel.gesis.org). On the FW side, I have to extend the export so that documents include comments, titles, etc. Right now I am going through https://github.com/OSCOSS/fiduswriter/issues/92 .

For now, I assume that importing an HTML document into dokieli works via "Open", and I will continue with extending FW's HTML export.

@csarven How does dokieli support proxy use? Should I set up the proxy functionality on my test server?

johanneswilm commented 7 years ago

CORS is often a source of problems, and AFAIK it's generally recommended not to mess with it. That's why Fidus Writer uses proxy views to download things from other web sites.

afshinsadeghi commented 7 years ago

So I will only consider the case where FW and dokieli are on the same server.

johanneswilm commented 7 years ago

CORS is enforced by the user agent (the browser), so to get around it, one can have the server do whatever needs to be done on the web instead, through what we call a "proxy view".

As far as I understand, dokieli is simply a bunch of files that can be put together in a zip file, right? Do we need to interact with a dokieli server at all? If not, there shouldn't be any CORS issue.

johanneswilm commented 7 years ago

Why on the same server? That shouldn't matter, as long as the request to a different server (wherever the dokieli server is running) is made by the proxy call of the Fidus Writer server and not by the browser.

afshinsadeghi commented 7 years ago

IMHO, although it runs in the browser, the browser will check the URLs of both and will block the request because of CORS if they are on different servers.

csarven commented 7 years ago

I think both exporting FW article to HTML, and importing in dokieli can exist on their own.

For FW, if that particular export function is aligned with dokieli's HTML, the resulting HTML can be used independently of dokieli. Simply publish it as is, because that article is "dokieli-ised" anyway.

FW can also export HTML that doesn't include dokieli's CSS and JS (the minimal set that's in the head of dokieli articles). In this case, the article can be imported ("Open") from dokieli, and as part of that process dokieli will inject the CSS and JS. We can look into the details of the minimal HTML template; generally it is along the lines below.

<section id="foo" rel="schema:hasPart" resource="#foo">
  <h2 property="schema:name">Foo</h2>
  <div datatype="rdf:HTML" property="schema:description">
    <!-- any HTML -->

    <!-- sub-sections -->
    <section id="bar" rel="schema:hasPart" resource="#bar">
      <h3 property="schema:name">Bar</h3>
      <div datatype="rdf:HTML" property="schema:description">
        <!-- any HTML -->

        <!-- sub-sub-sections -->
        <section id="baz" rel="schema:hasPart" resource="#baz">
          <h4 property="schema:name">Baz</h4>
          <div datatype="rdf:HTML" property="schema:description">
            <!-- any HTML -->

          </div>
        </section>

        <!-- any HTML -->
        <!-- "aside" is a good candidate here as the last node in this section -->
      </div>
      <!-- "aside" is a good candidate here as the last node in this section -->
    </section>

    <!-- any HTML -->

    <section id="qux" rel="schema:hasPart" resource="#qux">
      <h2 property="schema:name">Qux</h2>
      <div datatype="rdf:HTML" property="schema:description">
        <!-- any HTML -->
      </div>
    </section>

    <!-- any HTML -->
  </div>
  <!-- any HTML -->
</section>

The above HTML is not a strict rule. Normally any HTML is okay. The example above only brings some structure and semantics to sections and asides. aside is a good candidate as the last node of a node (section, div), typically used to place footnotes or annotations within.
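The section pattern above is regular enough to generate recursively. A minimal sketch, assuming a hypothetical input shape `{ id, title, html, children }` (FW's real document model differs): top-level sections start at h2 as in the template, deeper levels follow nesting depth (capped at h6), `title` is plain text, and `html` is trusted, already-serialized body HTML:

```javascript
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one section and, recursively, its sub-sections inside its div.
function renderSection(section, level = 2) {
  const id = escapeHtml(section.id);
  const h = `h${Math.min(level, 6)}`;
  const children = (section.children || [])
    .map((child) => renderSection(child, level + 1))
    .join("");
  return (
    `<section id="${id}" rel="schema:hasPart" resource="#${id}">` +
    `<${h} property="schema:name">${escapeHtml(section.title)}</${h}>` +
    `<div datatype="rdf:HTML" property="schema:description">` +
    (section.html || "") +
    children +
    `</div></section>`
  );
}

console.log(renderSection({
  id: "foo",
  title: "Foo",
  html: "<p>Intro</p>",
  children: [{ id: "bar", title: "Bar", children: [{ id: "baz", title: "Baz" }] }],
}));
```

An `aside` for footnotes or annotations could be appended after `children` in the same way, as the last node of the div.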

It'd be nice to have CORS enabled on the FW server, but it is not required. dokieli will try to fetch the input HTTP URL; if CORS is enabled, it'll proceed, otherwise it will fetch it again through its own proxy URL.
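The fallback described here can be sketched as follows. This is not dokieli's code: the proxy endpoint and its `uri` query parameter are hypothetical (dokieli's actual proxy URL format is not given in this thread), and `fetch` is injectable so the logic can be exercised without a network:

```javascript
// Try the URL directly; if the request fails at the network level (which is
// how browsers surface a CORS rejection), retry through a proxy.
// `proxyBase` is a hypothetical endpoint taking ?uri=<encoded target URL>.
async function fetchWithProxyFallback(url, proxyBase, fetchImpl = fetch) {
  try {
    return await fetchImpl(url);
  } catch (err) {
    const proxied = `${proxyBase}?uri=${encodeURIComponent(url)}`;
    return fetchImpl(proxied);
  }
}
```

Note that `fetch` rejects only on network-level failures, so an HTTP 404 or 500 from the origin server is returned as-is rather than retried through the proxy.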

I've just created https://github.com/linkeddata/dokieli/issues/200 to address the case where an HTML doesn't include dokieli's CSS and JS.

johanneswilm commented 7 years ago

Hey, I think you are right that the filter can be reused, and once we do Scholarly HTML and RASH export filters, we can probably start out by copying the dokieli export filter. However, doing this in two steps seems to only create problems with CORS, and makes it more difficult for the person running it.

Adding extra files, JS/CSS, etc. isn't really a problem for our exporter system. For the DOCX and ODT export filters, the exporter downloads a preexisting zip file from our server that contains all kinds of data that our exporter doesn't need to understand. Our exporter then injects the XML containing the contents of our article and offers it as a download to the user as an ODT/DOCX file. I think the same should be possible here: on the server we store a zip file containing all the standard resources of the dokieli system (JS/CSS), and possibly the outer parts of the HTML file (incl. links to the CSS/JS files). Our exporter then only needs to walk through the document, create HTML output that fits the dokieli format, and inject it in the right place in the output file.

What do you think, @csarven? Would it be possible to put everything needed for a dokieli instance in a zip file?

johanneswilm commented 7 years ago

Alternatively, is there some other HTML standard we could export to that dokieli could then import from? It seems like if we have an export filter, it should target a standard of some kind: either dokieli itself or the standard of the dokieli document, if that is a thing.

csarven commented 7 years ago

dokieli is intended to be flexible so that it is not locked into a single HTML template (contrary to other approaches out there). The example HTML template I gave above was only for the purpose of using some of dokieli's features, e.g., building the ToC, having identifiers/semantics for each section/aside, etc. We're trying to capture different HTML+RDFa patterns so that it can be more reusable in those scenarios.

Something like https://dokie.li/new as a shell is completely fine. Or aim your export towards something like https://dokie.li/acm-sigproc-sp , https://dokie.li/lncs-splnproc .

I think the options that jump at me are the following (with increasing amount of work):

If there is a gap in FW/dokieli implementation somewhere, we can try to close that.

johanneswilm commented 7 years ago

> dokieli is intended to be flexible so that it is not locked into a single HTML template (contrary to other approaches out there).

Ok, that makes sense. But I guess neither exporting to RASH nor Scholarly HTML would work for import into Dokieli, or would one of these (or a third standard) be fully "readable" by dokieli?

> I think the options that jump at me are the following (with increasing amount of work):

We have set aside something like two developers over the summer (@sadeghiafshin and one helper starting in a month or so), and the University of Bonn has stopped all other work on OSCOSS to work on just this, so we should do this properly. The consequence of not doing it properly would be that we won't be able to merge it into Fidus Writer upstream, and then it will turn into maintenance hell.

If there is an intermediate HTML format that dokieli can read, that is standardized and guaranteed to work, then we can export to that instead. It will not make our exporter simpler, but the advantage of exporting to a third-party HTML format would of course be that we would cover that standard at the same time.

> • Include dokieli's CSS and JS along with FW's preferred HTML in its export

Ok, but that way, would we know for sure that it is working? FW stores its data in a standardized JSON format and only serializes it to HTML to show/edit it in the browser. We can either reuse that serializer or create our own tree walker. Creating such a tree walker is entirely possible (we have done it for three other formats already).

> • Include dokieli's CSS, JS, and reuse its existing HTML patterns in FW's export

This is what would seem like a proper solution to me. Is there a stable version of dokieli that we could use as a basis for this?

> If there is a gap in FW/dokieli implementation somewhere, we can try to close that.

On the Fidus Writer side: most likely. We still do not capture a lot of semantic information, especially data about the authors. Fixing this in Fidus Writer would be of general interest, but it would likely also be the greatest amount of work, because all other parts would need to be expanded as well: other export filters, the editor, and the native file format.

afshinsadeghi commented 7 years ago

I think it would be great to have a template that is readable by dokieli and will not change soon, for example a template that supports titles, subtitles, normal text, references and comments.

johanneswilm commented 7 years ago

@csarven Ok, we have talked about it. It sounds like this is the best solution:

> Include dokieli's CSS, JS, and reuse its existing HTML patterns in FW's export

Could you prepare such a zip file with the CSS and JS for us, or explain to us how to do it? And could we have a template for the HTML file with citations, figures, abstract, etc. in the format that dokieli prefers? Then we will use that as the standard and follow it.

csarven commented 7 years ago

Sounds good! This is something I'd like to have a documentation for in dokieli as well: https://github.com/linkeddata/dokieli/issues/201 .

Will keep you posted.

csarven commented 7 years ago

Started documentation: https://dokie.li/docs . I'll expand on the design decisions, patterns etc.

afshinsadeghi commented 7 years ago

Thanks, @csarven. I am waiting for more, especially the template part. I will start with the template you shared above from next week. Will the final version be very different?

csarven commented 7 years ago

It is in the general direction. I'll get the canonical pattern done next week.

afshinsadeghi commented 7 years ago

Continuing with extending fw/document/static/js/es6_modules/exporter/html/base.js, adding RDF tags via jQuery(htmlCode).find().

csarven commented 7 years ago

Note that https://github.com/linkeddata/dokieli/issues/200 is resolved.

afshinsadeghi commented 7 years ago

Great, I will update my dev version.

afshinsadeghi commented 6 years ago

done