m1ci opened this issue 9 years ago
We thought about the dc:identifier property during a call about NIF conversion. We agreed that it could be useful for XLIFF roundtripping: it keeps track of the related translation unit. It has no relevant meaning when converting HTML files.
This content is converted back to HTML by using the NIF file that holds the markup in its context. This is the markup NIF file generated from that HTML (I added it to the documentation as well):
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://freme-project.eu/doc1/#char=0,121>
a nif:RFC5147String , nif:Context , nif:String ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "121"^^xsd:nonNegativeInteger ;
nif:isString "<!DOCTYPE html>\r\n<html><head>\r\n\t<title>Roundtripping</title>\r\n</head>\r\n<body>\r\n<p>Welcome to Dublin</p>\r\n\r\n</body></html>"@en .
<http://freme-project.eu/#char=14,31>
a nif:RFC5147String , nif:String ;
nif:anchorOf "Welcome to Dublin"@en ;
nif:beginIndex "14"^^xsd:nonNegativeInteger ;
nif:endIndex "31"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://freme-project.eu/#char=0,31> ;
nif:wasConvertedFrom <http://freme-project.eu/doc1/#char=82,99> ;
dc:identifier "2" .
<http://freme-project.eu/#char=0,13>
a nif:RFC5147String , nif:String ;
nif:anchorOf "Roundtripping"@en ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "13"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://freme-project.eu/#char=0,31> ;
nif:wasConvertedFrom <http://freme-project.eu/doc1/#char=39,52> ;
dc:identifier "1" .
<http://freme-project.eu/#char=0,31>
a nif:RFC5147String , nif:Context , nif:String ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "31"^^xsd:nonNegativeInteger ;
nif:isString "Roundtripping Welcome to Dublin"@en .
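To make the offsets in this example easier to check, here is a minimal Java sketch that recomputes the nif:wasConvertedFrom links from the two nif:isString values. It is not how the FREME converter works: it naively assumes every anchor text occurs exactly once in each context and locates it with String.indexOf. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OffsetUris {

    // Base URIs currently used for the two contexts (as stated in this thread).
    static final String MARKUP_BASE = "http://freme-project.eu/doc1/";
    static final String PLAIN_BASE = "http://freme-project.eu/";

    // nif:isString values of the markup context and the plain-text context above.
    static final String MARKUP = "<!DOCTYPE html>\r\n<html><head>\r\n\t<title>Roundtripping</title>\r\n"
            + "</head>\r\n<body>\r\n<p>Welcome to Dublin</p>\r\n\r\n</body></html>";
    static final String PLAIN = "Roundtripping Welcome to Dublin";

    // Offset-based fragment: begin is inclusive, end is exclusive.
    static String charUri(String base, int begin, int end) {
        return base + "#char=" + begin + "," + end;
    }

    // URI of the FIRST occurrence of anchor in context (naive: assumes unique anchors).
    static String uriFor(String base, String context, String anchor) {
        int begin = context.indexOf(anchor);
        if (begin < 0) {
            throw new IllegalArgumentException("anchor not found: " + anchor);
        }
        return charUri(base, begin, begin + anchor.length());
    }

    public static void main(String[] args) {
        // Maps each plain-text String URI to the markup String it was converted from.
        Map<String, String> wasConvertedFrom = new LinkedHashMap<>();
        for (String anchor : new String[] {"Roundtripping", "Welcome to Dublin"}) {
            wasConvertedFrom.put(uriFor(PLAIN_BASE, PLAIN, anchor), uriFor(MARKUP_BASE, MARKUP, anchor));
        }
        wasConvertedFrom.forEach((plain, markup) ->
                System.out.println(plain + " nif:wasConvertedFrom " + markup));
    }
}
```

Running it prints `http://freme-project.eu/#char=0,13 nif:wasConvertedFrom http://freme-project.eu/doc1/#char=39,52` and `http://freme-project.eu/#char=14,31 nif:wasConvertedFrom http://freme-project.eu/doc1/#char=82,99`, matching the triples above. Note that the escaped `\r\n` and `\t` each count as single characters, which is why the markup context ends at offset 121.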
Could you explain what you mean by the last question, please?
> We thought about the dc:identifier property during a call about NIF conversion. We agreed that it could be useful for XLIFF roundtripping: it keeps track of the related translation unit. It has no relevant meaning when converting HTML files.
OK, thanks for the reminder.
> This content is converted back to HTML by using the NIF file that holds the markup in its context.
The NIF context containing the source markup is not returned. Why?
> Could you explain what you mean by the last question, please?
Same as the question above: why is the NIF context containing the markup not returned?
> Same as the question above: why is the NIF context containing the markup not returned?
We could include it in the NIF response. I thought we created two separate NIF documents, and that's why I did not merge them before returning the result to the user. Also, it is unclear to me which URI we should use for this information.
We agreed to produce two different NIF files because we needed two contexts: one including the markup and one containing only plain text. The reason is that FREME e-Services cannot deal with a NIF file that has two contexts. Moreover, since the context including the markup is only needed to perform the round-tripping (it is not relevant for the end user), it is not returned by the service; it is saved temporarily on the local machine.
Regarding the URI, thank you for the reminder. We should think of a strategy for generating unique URIs, so that we can be sure we merge the correct files when doing round-tripping. It's already possible to choose a URI externally and pass it to the conversion method. In any case, at the moment http://freme-project.eu/ is the base URI for the plain-text context, while http://freme-project.eu/doc1/ is the base URI for the markup context. This is a temporary solution and I think it should be changed.
> The reason is that FREME e-Services cannot deal with a NIF file that has two contexts. Moreover, since the context including the markup is only needed to perform the round-tripping (it is not relevant for the end user), it is not returned by the service; it is saved temporarily on the local machine.
OK, makes sense.
> Regarding the URI, thank you for the reminder. We should think of a strategy for generating unique URIs, so that we can be sure we merge the correct files when doing round-tripping.
Indeed. I think we should use hash values generated from the content. In NIF, the URIs for Strings can be either 1) "Offset Based Strings", which is what we are using now, or 2) "Context-Hash-Based Strings", which remain more robust to document changes. See the guidelines on how they are constructed: http://jens-lehmann.org/files/2012/ekaw_nif.pdf (page 4).
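A minimal Java sketch of a context-hash fragment, following our reading of the scheme in the linked paper: `hash_<context length>_<string length>_<MD5 of left context + "(" + string + ")" + right context>_<first 20 characters of the string, URL-encoded>`. The exact concatenation rules and how the context length should be chosen are assumptions to verify against page 4 of the paper; the context length of 4 in `main` and all names here are hypothetical.

```java
import java.math.BigInteger;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContextHashUri {

    // Lowercase hex MD5 of the UTF-8 bytes of s, zero-padded to 32 digits.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Fragment for context.substring(begin, end), using up to contextLength
    // characters on each side (clipped at the start and end of the context).
    static String hashFragment(String context, int begin, int end, int contextLength) {
        String string = context.substring(begin, end);
        String left = context.substring(Math.max(0, begin - contextLength), begin);
        String right = context.substring(end, Math.min(context.length(), end + contextLength));
        String digest = md5Hex(left + "(" + string + ")" + right);
        String prefix = string.substring(0, Math.min(20, string.length()));
        // URLEncoder uses form encoding ("+" for spaces); switch to %20.
        String encoded = URLEncoder.encode(prefix, StandardCharsets.UTF_8).replace("+", "%20");
        return "hash_" + contextLength + "_" + string.length() + "_" + digest + "_" + encoded;
    }

    public static void main(String[] args) {
        String plain = "Roundtripping Welcome to Dublin";
        // "Welcome to Dublin" occupies offsets 14..31 in the plain-text context.
        System.out.println("http://freme-project.eu/#" + hashFragment(plain, 14, 31, 4));
    }
}
```

Because the fragment depends only on the string and a few surrounding characters, edits elsewhere in the document leave it unchanged, whereas an offset-based URI shifts as soon as anything before the string changes.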
> It's already possible to choose a URI externally and pass it to the conversion method.
In some scenarios this can be an option. But for round-tripping it is probably better if the URIs are generated on the server side. Let's see what others think.
We could also use Java's unique ID generator, java.util.UUID (see the Java API Doc and a short tutorial). We already use this technique to generate tokens: FREME tokens are actually Java UUIDs.
The advantage of UUIDs over hash values is that they are truly unique. When someone sends plain text from two different sources but with the same content to FREME, hash values would give them the same URLs, which IMO is problematic.
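A minimal Java sketch of the collision argument. Here UUID.nameUUIDFromBytes (a name-based, MD5-derived UUID) stands in for "a hash value generated from the content"; that substitution is an illustrative choice, not something the thread specifies, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class HashVsUuid {

    // Content-derived ID (version 3, MD5-based): identical text gives an identical ID.
    static UUID contentId(String text) {
        return UUID.nameUUIDFromBytes(text.getBytes(StandardCharsets.UTF_8));
    }

    // Random ID (version 4): a fresh value on every call, independent of the content.
    static UUID randomId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        String fromSourceA = "Welcome to Dublin";
        String fromSourceB = "Welcome to Dublin"; // same content, different sender
        // true: the content-derived IDs collide.
        System.out.println(contentId(fromSourceA).equals(contentId(fromSourceB)));
        // false with overwhelming probability: random IDs differ.
        System.out.println(randomId().equals(randomId()));
    }
}
```

This is the trade-off under discussion: a content hash cannot tell the two senders apart, while a random UUID can, at the cost of not being reproducible from the content.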
No matter whether we use hash values or UUIDs, we could use these unique URIs in two areas:
1. http://freme-project.eu/ns/067e6162-3b6f-4ae2-a171-2470b63dff00 (right now we just generate http://freme-project.eu URIs in this case, and a unique URI generated by the server seems better IMO)
2. http://freme-project.eu/ns/067e6162-3b6f-4ae2-a171-2470b63dff00/markup
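A minimal Java sketch of deriving both URIs from one server-generated ID per conversion request. It assumes, as our reading of the two URIs above, that the first replaces the fixed http://freme-project.eu base of the plain-text context and the `/markup` one identifies the markup context; the method names are hypothetical.

```java
import java.util.UUID;

public class FremeContextUris {

    // Namespace taken from the example URIs in this comment.
    static final String NS = "http://freme-project.eu/ns/";

    // Would replace the fixed http://freme-project.eu base of the plain-text context.
    static String plainTextContextUri(UUID requestId) {
        return NS + requestId;
    }

    // Derived from the same ID, so both contexts of one request stay linked.
    static String markupContextUri(UUID requestId) {
        return plainTextContextUri(requestId) + "/markup";
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID(); // one fresh ID per conversion request
        System.out.println(plainTextContextUri(id));
        System.out.println(markupContextUri(id));

        UUID example = UUID.fromString("067e6162-3b6f-4ae2-a171-2470b63dff00");
        // prints http://freme-project.eu/ns/067e6162-3b6f-4ae2-a171-2470b63dff00/markup
        System.out.println(markupContextUri(example));
    }
}
```

Deriving the markup URI from the plain-text one means a round-tripping step that receives only the plain-text NIF could locate the matching markup context without extra bookkeeping.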
I suggest moving the discussion about unique URIs to a new issue in the technical discussion and making it a feature of a future version of FREME, e.g. FREME 0.5. Or do you think this is a bug that needs to be fixed right now?
> We should think of a strategy for generating unique URIs, so that we can be sure we merge the correct files when doing round-tripping.
In the current implementation of round-tripping we do merge the correct files. In fact, we generate the URI http://freme-project.eu for all resources sent through e-Internationalization. We keep resources that do not belong together apart not via the NIF URIs, but because they are generated in different HTTP requests.
Thanks for the proposal, Jan. Personally, I don't like the idea of using UUIDs, mainly because they are not compatible with the NIF spec. We should stick to the NIF spec.
I suggest using "hash-based" URIs with a unique base prefix.
Example: http://freme-project.eu/doc1/#hash_0_30_067e61623b6f4ae2a1712470b63dff00
where http://freme-project.eu/doc1/ is the unique part proposed by the client or the server, and #hash_0_30_067e61623b6f4ae2a1712470b63dff00 is the hash value representing the content. For more on constructing Context-Hash-based URIs, see http://jens-lehmann.org/files/2012/ekaw_nif.pdf (page 4).
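A minimal Java sketch of combining a unique document prefix with a content-derived hash fragment. The fragment is passed in ready-made. When the client proposes no prefix, the sketch mints one on the server from a random UUID; that fallback, the namespace and the method names are all hypothetical, since the comment does not say how the server would generate its part.

```java
import java.util.UUID;

public class PrefixedHashUri {

    // Hypothetical namespace for prefixes minted by the server.
    static final String SERVER_NS = "http://freme-project.eu/";

    // Client-proposed prefix if present; otherwise a fresh server-side one.
    static String documentPrefix(String clientPrefix) {
        if (clientPrefix != null && !clientPrefix.isEmpty()) {
            return clientPrefix.endsWith("/") ? clientPrefix : clientPrefix + "/";
        }
        // Illustrative choice only: a random UUID as the unique server-side part.
        return SERVER_NS + UUID.randomUUID() + "/";
    }

    // Unique document prefix + content-derived hash fragment.
    static String stringUri(String clientPrefix, String hashFragment) {
        return documentPrefix(clientPrefix) + "#" + hashFragment;
    }

    public static void main(String[] args) {
        // Reproduces the example URI from the comment above.
        System.out.println(stringUri("http://freme-project.eu/doc1/",
                "hash_0_30_067e61623b6f4ae2a1712470b63dff00"));
        // Without a client prefix, the server supplies a unique one.
        System.out.println(stringUri(null, "hash_0_30_067e61623b6f4ae2a1712470b63dff00"));
    }
}
```

In this arrangement only the prefix would be random; the fragment keeps its content-derived, NIF-style form.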
For an HTML input, you create the following NIF:
What is dc:identifier? Is it a pointer to one of your back-end mapping rules?