MathHubInfo / Legacy-planetary

Legacy: Planetary System is a math-enabled Web 3.0 information portal.
http://trac.mathweb.org/planetary/

one more check: is the metadata generated by Planetary/LaTeXML correct? #297

Closed by holtzermann17 11 years ago

holtzermann17 commented 11 years ago

The first order of business is to get the metadata parsed again (https://trac.mathweb.org/LaTeXML/ticket/1674). I've suppressed the about="." attributes in the listing below, to match what we're actually expecting.

After that, let's check that it actually says what we want it to say. Current metadata for an example article:

<div class="rdf" property="dct:identifier" content="IncludePictures"/>
<div class="rdf" property="dct:created" datatype="xsd:date"
  content="2013-01-2017:58:57"/>
<div class="rdf" property="dct:modified" datatype="xsd:date"
  content="2013-01-2017:58:57"/>
<div class="rdf" resource="pmuser:1" property="pm:owner"/>
<div class="rdf" resource="pmuser:1" property="pm:modifier"/>
<div class="rdf" property="dct:title" content="Includepictures?"/>
<div class="rdf" property="dct:hasVersion" content="1"/>
<div class="rdf" resource="pmuser:1" property="dct:creator"/>
<div class="rdf" property="pm:privacy" content="1"/>
<div class="rdf" property="dct:type" content="Definition"/>

At first glance that's pretty reasonable, but I think we should look at a few more examples to be sure. (One thing: the space has been removed from the title for some reason: "Includepictures?" should be "Include pictures?".)

For one thing, this example article doesn't have "parent", "related", or "synonym" data -- and the owner, modifier, and creator are all unlord. We should look at a few other examples that are more complicated, and try to get things right with those. (NB. I'm not sure we ever deal with long lists of co-authors in the correct fashion.)

Here's a more complicated (concocted) example (from http://alpha.planetmath.org/jordansinequality):

<div class="rdf" property="dct:identifier" content="JordansInequality"/>
<div class="rdf" property="dct:created" datatype="xsd:date"
  content="2013-01-2112:36:20"/>
<div class="rdf" property="dct:modified" datatype="xsd:date"
  content="2013-01-2112:36:20"/>
<div class="rdf" resource="pmuser:127" property="pm:owner"/>
<div class="rdf" resource="pmuser:1" property="pm:modifier"/>
<div class="rdf" property="dct:title" content="Jordan’sinequality"/>
<div class="rdf" property="dct:hasVersion" content="1"/>
<div class="rdf" resource="pmuser:1" property="dct:creator"/>
<div class="rdf" property="pm:privacy" content="1"/>
<div class="rdf" property="dct:type" content="Theorem"/>
<div class="rdf" property="pm:comment" content="trytomakeaMNWE"/>
<div class="rdf" resource="msc:00A99" property="dct:subject"/>
<div class="rdf" resource="msc:26D05" property="dct:subject"/>
<div class="rdf" about="pmconcept:Jordaninequality"
  resource="pmconcept:JordansInequality" property="pm:synonym"/>
<div class="rdf" resource="pmarticle:ComparisonOfSinThetaAndThetaNearTheta0"
  property="pm:related"/>
<div class="rdf" property="pm:defines" content="pmconcept:jordanian"/>

Here we start to see some errors and mis-steps.

This is the LaTeX source that it comes from:

\pmcanonicalname{JordansInequality}
\pmcreated{2013-01-21 12:34:09}
\pmmodified{2013-01-21 12:34:09}
\pmowner{Koro}{127}
\pmmodifier{Koro}{1}
\pmtitle{Jordan's inequality}
\pmrecord{1}{30013}
\pmauthor{Koro}{1}
\pmprivacy{1}
\pmtype{Theorem}
\pmcomment{try to make a MNWE}
\pmclassification{msc}{00A99}
\pmclassification{msc}{26D05}
\pmsynonym{Jordan inequality}{JordansInequality}
%\pmkeywords{Jordan}
%\pmkeywords{inequality}
\pmrelated{ComparisonOfSinThetaAndThetaNearTheta0}
\pmdefines{jordanian}
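
The space-eating mismatches between this source and the generated RDFa can be spotted mechanically. Here is a minimal sketch in Python, assuming we already have each source value paired with its RDFa content value; the helper names `normalize_quotes` and `lost_whitespace` are made up for illustration and are not part of Planetary or LaTeXML:

```python
def normalize_quotes(text):
    """Treat the typographic apostrophe LaTeXML emits like the ASCII one."""
    return text.replace("\u2019", "'")


def lost_whitespace(source_value, rdfa_value):
    """Return True if the two values differ only in their whitespace."""
    src = normalize_quotes(source_value)
    out = normalize_quotes(rdfa_value)
    if src == out:
        return False  # identical, nothing was lost
    # Equal once all whitespace is dropped: only the spacing changed.
    return "".join(src.split()) == "".join(out.split())


# Pairs taken from the example above: (value in LaTeX source, value in RDFa).
pairs = {
    "title": ("Jordan's inequality", "Jordan\u2019sinequality"),
    "comment": ("try to make a MNWE", "trytomakeaMNWE"),
    "type": ("Theorem", "Theorem"),
}

affected = [name for name, (src, out) in pairs.items() if lost_whitespace(src, out)]
```

With the pairs above, `affected` holds the title and the comment but not the type. The check only flags spacing differences, so the mismatched timestamps would need a separate comparison.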

Some of this can be fixed on the Drupal side, some will have to be fixed on the LaTeXML side. To clarify the division of labor: looking at the PHP code in planetmath_edit_article that defines the basic metadata to send over, we see a number of errors:

  $metadata .= "\\pmowner{".$node->name."}{".$node->uid."}\n";
  $metadata .= "\\pmmodifier{".$node->name."}{".$user->uid."}\n";
  $metadata .= "\\pmtitle{".$node->title."}\n";
  // this will have to change when we have a proper versioning system in place
  $metadata .= "\\pmrecord{1}{".$node->nid."}\n";
  $metadata .= "\\pmauthor{".$node->name."}{".$user->uid."}\n";

Drupal fixes: $user->uid should be used only for the modifier, and it should be paired with the modifier's own name, not the name from the node. The list of authors should come from the OG implementation, with each author added as a separate \pmauthor entry. The record number should be the actual version number, not a hard-coded "1".
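
To make the intended behaviour concrete, here is a rough sketch of the corrected preamble assembly, written in Python rather than PHP so it can be run on its own. The `User` class and `build_metadata` function are hypothetical stand-ins for the Drupal node and user objects, and the author list is passed in directly instead of being fetched from OG:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    uid: int
    name: str


def build_metadata(title: str, nid: int, version: int,
                   owner: User, modifier: User, authors: List[User]) -> str:
    """Assemble the \\pm... preamble lines with each role using its own name and uid."""
    lines = [
        f"\\pmowner{{{owner.name}}}{{{owner.uid}}}",
        # The modifier is the user making the edit, not the node's owner.
        f"\\pmmodifier{{{modifier.name}}}{{{modifier.uid}}}",
        f"\\pmtitle{{{title}}}",
        # Real version number instead of a hard-coded 1.
        f"\\pmrecord{{{version}}}{{{nid}}}",
    ]
    # One \pmauthor line per author.
    lines.extend(f"\\pmauthor{{{a.name}}}{{{a.uid}}}" for a in authors)
    return "\n".join(lines) + "\n"


koro = User(uid=127, name="Koro")
unlord = User(uid=1, name="unlord")
preamble = build_metadata("Jordan's inequality", 30013, 1,
                          owner=koro, modifier=unlord, authors=[koro])
```

For the example article this yields \pmowner{Koro}{127}, \pmmodifier{unlord}{1} and \pmauthor{Koro}{127}, so the name and uid in each macro always belong to the same user.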

LaTeXML fixes: use strings instead of canonical names, don't delete spaces.

As a general question, I don't know if we want to distinguish between the "owner" and the "creator", or how we would go about doing that in RDF. Let's just go with "owner is creator" for now, even though that could fail to match up to reality for lots of reasons in practice.

dginev commented 11 years ago

As to the URLs being used for the RDFa namespaces pmarticle, pmuser and pmconcept - let's try to be as consistent as possible with the "real" URLs. I will leave all three to point to the base URL (alpha.planetmath.org).

This makes sense, since Drupal serves pages in a similar way (all pages are served at the base URL), and it even makes sense for pmconcept, as each article defines its own concept. Except that some articles may define more than one concept ... OK, I will make an exception for pmconcept and make it point to base/concept.

I am not sure whether in the long run it would make more sense to make Drupal serve articles at /article, users at /user and concepts at /concept, to have a cleaner setup. But for now I will stick with what you have already.
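
As an illustration of the prefix choices described above, here is a small Python sketch that expands the three prefixes into full URLs. The exact URL strings (scheme, trailing slashes) are assumptions based on "base URL" and "base/concept", and `expand_curie` is a hypothetical helper, not something from LaTeXML or Drupal:

```python
# Assumed base URL, taken from the alpha.planetmath.org links in this thread.
BASE = "http://alpha.planetmath.org/"

# pmarticle and pmuser point at the base URL; pmconcept is the exception.
PREFIXES = {
    "pmarticle": BASE,
    "pmuser": BASE,
    "pmconcept": BASE + "concept/",
}


def expand_curie(curie: str) -> str:
    """Expand a prefix:local name into a full URL using PREFIXES."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return PREFIXES[prefix] + local


concept_url = expand_curie("pmconcept:JordansInequality")
user_url = expand_curie("pmuser:127")
```

Under these assumptions the concept resolves under /concept/ while users and articles share the base URL, which is the collision the /article, /user, /concept layout would avoid.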

dginev commented 11 years ago

Ok, I read your comment in NNexus issue 5 that we should be using canonical names to ease the migration between Noosphere and Planetary, so I went ahead and switched all userURI macros to use the canonical username rather than the database uuid, which is what was used until now.

holtzermann17 commented 11 years ago

Any guess about why the spaces are removed? E.g. "Jordan’sinequality"?

dginev commented 11 years ago

I was trying to avoid responsibility too much and didn't realize my editor has all metadata in its body, rather than the preamble. In classic TeX any macros expanded in the preamble of the document are expanded in a "skip space" mode, similarly to expanding $math$, which results in the spaces being eaten away.

I have asked Bruce for advice on the cleanest way to avoid this, although I know a few so-so methods myself. I should have a fix committed by the end of tomorrow. Thanks for the reminder!

dginev commented 11 years ago

As promised, the spacing problem has been resolved, checked in and redeployed at latexml.mathweb.org.

It was a bug in the new lxRDFa native binding for LaTeXML. You have my and Bruce's thanks for bringing it to our attention.

holtzermann17 commented 11 years ago

Note @dginev, in planetmath-specials.sty.ltxml, we need to define pmprivacy as follows, in order for queries to deal with it properly:

DefMacro('\pmprivacy{}','\pmmeta@literal@adhoc[xsd:integer]{privacy}{#1}');

(This is copied to LaTeXML ticket #1681.)

dginev commented 11 years ago

This is now done, reopen if anything else comes to mind.

dginev commented 11 years ago

Oh, I don't have permissions to close, you'll have to do that yourself (and maybe give me permissions for the future?)

cdavid commented 11 years ago

Gave you permissions now.