MaRDI4NFDI / python-zbMathRest2Oai

Read data from the zbMATH Open API https://api.zbmath.org/docs and feed it to the OAI-PMH server https://oai.portal.mardi4nfdi.de/oai/
GNU General Public License v3.0

XSLT #20

Closed: Mazztok45 closed this issue 9 months ago.

Mazztok45 commented 10 months ago

This issue tracks the transformation of:

https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/blob/main/tests/data/plain.xml

To:

https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/blob/main/tests/data/reference.xml

With the help of these resources on XSL:

https://www.w3schools.com/xml/xsl_intro.asp
https://developer.mozilla.org/en-US/docs/Web/XSLT/Element/stylesheet
https://stackoverflow.com/questions/1344158/xslt-with-xml-source-that-has-a-default-namespace-set-to-xmlns

The XSL file must produce the structure of the target file reference.xml.

Feel free to reach out if you require additional information.

physikerwelt commented 10 months ago

See https://github.com/MaRDI4NFDI/fiz-oai-docker/blob/master/examples/Radar2OAI_DC_v09.xsl for the RADAR-to-DataCite transformation.

physikerwelt commented 10 months ago

As an example:

<root>
  <result>
    <biographic_references/>
    <contributors>
      <author_references/>
      <authors>
        <aliases/>
        <checked>1</checked>
        <codes>maynard.james</codes>
        <name>Maynard, James</name>
      </authors>
    </contributors>
  </result>
</root>

should be transformed to

<oai_zb_preview:zbmath xmlns:oai_zb_preview="https://zbmath.org/OAI/2.0/oai_zb_preview/"
                       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                       xmlns:zbmath="https://zbmath.org/zbmath/elements/1.0/">
    <zbmath:author>Maynard, James</zbmath:author>
    <zbmath:author_ids>
        <zbmath:author_id>maynard.james</zbmath:author_id>
    </zbmath:author_ids>
</oai_zb_preview:zbmath>

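
For comparison, here is a minimal Python sketch of the same mapping that uses only the standard library's `xml.etree.ElementTree` instead of XSLT (an illustration, not the stylesheet this issue asks for). It collects every `authors` entry under `result/contributors` and emits the namespaced `zbmath:author` and `zbmath:author_id` elements shown above. The helper name `to_zb_preview` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the target example above.
OAI_ZB_PREVIEW = "https://zbmath.org/OAI/2.0/oai_zb_preview/"
ZBMATH = "https://zbmath.org/zbmath/elements/1.0/"

# Register prefixes so the output uses oai_zb_preview:/zbmath: rather than ns0:/ns1:.
ET.register_namespace("oai_zb_preview", OAI_ZB_PREVIEW)
ET.register_namespace("zbmath", ZBMATH)

SAMPLE = """<root>
  <result>
    <biographic_references/>
    <contributors>
      <author_references/>
      <authors>
        <aliases/>
        <checked>1</checked>
        <codes>maynard.james</codes>
        <name>Maynard, James</name>
      </authors>
    </contributors>
  </result>
</root>"""


def to_zb_preview(plain_xml: str) -> ET.Element:
    """Hypothetical helper: map the plain API XML onto the oai_zb_preview layout."""
    root = ET.fromstring(plain_xml)
    # Paths are relative to <root>, mirroring root/result/contributors/authors.
    authors = root.findall("result/contributors/authors")
    record = ET.Element(f"{{{OAI_ZB_PREVIEW}}}zbmath")
    # One zbmath:author per author, in document order.
    for author in authors:
        ET.SubElement(record, f"{{{ZBMATH}}}author").text = author.findtext("name")
    # All author codes are grouped under a single zbmath:author_ids element.
    author_ids = ET.SubElement(record, f"{{{ZBMATH}}}author_ids")
    for author in authors:
        ET.SubElement(author_ids, f"{{{ZBMATH}}}author_id").text = author.findtext("codes")
    return record


if __name__ == "__main__":
    print(ET.tostring(to_zb_preview(SAMPLE), encoding="unicode"))
```

ElementTree only declares namespaces that are actually used, so the unused `xmlns:xsi` declaration from the target example does not appear in this sketch's output.
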
physikerwelt commented 10 months ago

Example transformation

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Match the document node and build the preview record from it -->
  <xsl:template match="/">
    <oai_zb_preview>
      <!-- One author element per author name -->
      <xsl:for-each select="root/result/contributors/authors/name">
        <author><xsl:value-of select="."/></author>
      </xsl:for-each>
      <!-- All author codes grouped under a single author_ids element -->
      <author_ids>
        <xsl:for-each select="root/result/contributors/authors/codes">
          <author_id><xsl:value-of select="."/></author_id>
        </xsl:for-each>
      </author_ids>
    </oai_zb_preview>
  </xsl:template>
</xsl:stylesheet>

tested via

https://www.freeformatter.com/xsl-transformer.html#before-output


Mazztok45 commented 9 months ago

With the commit:

https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/commit/e647997c5cd5289a959d9f7ad5371f5a8909d950

I consider this issue solved.

physikerwelt commented 9 months ago

I don't.

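1) I would put the XSLT file in a separate directory or repository.
2) I would add example input and output data.
3) Document how the file can be used.
4) Describe how the file can be uploaded to the server, optimally together with the script (https://github.com/ER-FIZKarlsruhe/fiz-oai-docker/blob/778d8e3b0eea4aa10ab32c619a8beb7e2896f0cf/examples/createCrosswalks.sh).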
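
Example input and output data can double as a regression check: transform the stored input and compare the result with the stored expected output, ignoring indentation. A minimal sketch, assuming Python 3.8+ for `xml.etree.ElementTree.canonicalize` (C14N 2.0); the helper name `same_xml` is hypothetical, and in practice the two strings would come from files such as the transformation result and tests/data/reference.xml.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def same_xml(actual: str, expected: str) -> bool:
    """Hypothetical helper: compare two XML documents after C14N 2.0 canonicalization.

    strip_text=True trims whitespace around text content, so indentation and
    line breaks between elements do not count as differences; C14N itself
    normalizes attribute order and empty-element syntax.
    """
    return canonicalize(actual, strip_text=True) == canonicalize(expected, strip_text=True)


if __name__ == "__main__":
    expected = """<zbmath:author_ids xmlns:zbmath="https://zbmath.org/zbmath/elements/1.0/">
    <zbmath:author_id>maynard.james</zbmath:author_id>
</zbmath:author_ids>"""
    actual = (
        '<zbmath:author_ids xmlns:zbmath="https://zbmath.org/zbmath/elements/1.0/">'
        "<zbmath:author_id>maynard.james</zbmath:author_id></zbmath:author_ids>"
    )
    print(same_xml(actual, expected))
```

Note that `strip_text=True` also trims leading and trailing spaces inside text values, so a stricter check would drop that option and normalize indentation another way.
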

physikerwelt commented 9 months ago

PS: I implemented a small helper to apply XSLT transformations from the command line: https://github.com/physikerwelt/xstlprocJ

Mazztok45 commented 9 months ago

> I don't.
>
> 1. I would put the XSLT file in a separate directory or repository.
> 2. I would add example input and output data.
> 3. Document how the file can be used.
> 4. Describe how the file can be uploaded to the server, optimally together with the [script](https://github.com/ER-FIZKarlsruhe/fiz-oai-docker/blob/778d8e3b0eea4aa10ab32c619a8beb7e2896f0cf/examples/createCrosswalks.sh).

Regarding the last point, should I contribute directly to https://github.com/ER-FIZKarlsruhe/fiz-oai-docker and improve createCrosswalks.sh there?

physikerwelt commented 9 months ago

You basically extend this command in two ways:

1) Add basic auth (https://stackoverflow.com/a/53630834), but don't commit the password to the repo.

2) Set the URL in:

curl --noproxy '*' -X POST -i \
  -H 'Content-Type: application/json' \
  '@@OAI_EXTERNAL_BACKEND_URL@@/crosswalk' \
  --data '{"name":"Radar2OAI_DC_v09","formatFrom":"radar","formatTo":"oai_dc","xsltStylesheet":'"$XSLT_RADAR_DC"'}'

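
The same request can be sketched in Python with only the standard library. This is an illustration under assumptions, not the project's script: `build_crosswalk_request` is a hypothetical helper, the base URL and user are placeholders for `@@OAI_EXTERNAL_BACKEND_URL@@` and the real credentials, and the password is read from an environment variable (the name `OAI_PASSWORD` is made up here) so it is never committed. Building the body with `json.dumps` escapes the quotes and line breaks inside the XSLT.

```python
import base64
import json
import os
import urllib.request


def build_crosswalk_request(base_url: str, user: str, password: str, xslt: str) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the POST to <base_url>/crosswalk."""
    # Field names and values mirror the curl example above.
    payload = {
        "name": "Radar2OAI_DC_v09",
        "formatFrom": "radar",
        "formatTo": "oai_dc",
        "xsltStylesheet": xslt,  # json.dumps escapes quotes and newlines in the stylesheet
    }
    # HTTP basic auth: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/crosswalk",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    req = build_crosswalk_request(
        "https://oai.example.org/backend",  # placeholder, not the real backend URL
        "oai-user",  # placeholder user name
        os.environ.get("OAI_PASSWORD", ""),  # secret comes from the environment
        '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>',
    )
    # urllib.request.urlopen(req) would send it; omitted on purpose.
    print(req.get_method(), req.full_url)
```

Keeping the request construction separate from sending makes it easy to check the body and headers before anything reaches the server.
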
physikerwelt commented 9 months ago

You can find the URL and user here: https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/blob/5df44ad2c0cba7bfcc949708b88bb345f8496ac5/src/zbmath_rest2oai/writeOai.py#L30-L31

physikerwelt commented 9 months ago

Can this be closed now?

physikerwelt commented 9 months ago

I tested the script xslt.sh, and it currently returns the following error:

<!doctype html><html lang="en"><head><title>HTTP Status 415 – Unsupported Media Type</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 415 – Unsupported Media Type</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> Unsupported Media Type</p><p><b>Description</b> The origin server is refusing to service the request because the payload is in a format not supported by this method on the target resource.</p><hr class="line" /><h3>Apache Tomcat/9.0.82</h3></body></html>

Please start from the existing script to create crosswalks.

Mazztok45 commented 9 months ago

> can this be closed now?

Not yet.

Mazztok45 commented 9 months ago

> I tested the script, xslt.sh and it currently creates the following error:
>
> [...]
>
> Please start from the existing script to create crosswalks.

I did not get the same message, only a JSON parsing problem.
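
One common way to get a JSON parsing error in this kind of script is splicing the raw XSLT text into the JSON body, so the stylesheet's double quotes and line breaks end up unescaped. Whether that is what happened here is not established in this thread; the sketch below only illustrates the failure mode with Python's standard `json` module and a tiny made-up stylesheet. All names in it are hypothetical.

```python
import json

# Tiny made-up stylesheet: it contains double quotes and a newline.
XSLT = '<xsl:stylesheet version="1.0"\n  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'


def naive_body(xslt: str) -> str:
    # Mimics shell-style splicing: the stylesheet is pasted in verbatim.
    return '{"name":"example","xsltStylesheet":"' + xslt + '"}'


def escaped_body(xslt: str) -> str:
    # json.dumps escapes quotes as \" and newlines as \n.
    return json.dumps({"name": "example", "xsltStylesheet": xslt})


def parses(body: str) -> bool:
    """Return True if the body is valid JSON."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print("naive:", parses(naive_body(XSLT)))
    print("escaped:", parses(escaped_body(XSLT)))
```

In a shell script the equivalent fix is to let a JSON-aware tool produce the escaped string instead of concatenating quotes by hand.
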

Mazztok45 commented 9 months ago

My last commit fixes the error I was getting. Could you please try it? When I run the script again, it reports that the XSLT file is already available on the server.

physikerwelt commented 9 months ago

I'll check and close.

Mazztok45 commented 9 months ago

My last two commits modified createFormat.sh to create the new format "zbmath_rest_api" and xslt.sh to create the new crosswalk using this format. The last step is configuring the server to use the XSLT file to display the target XML format. Should we open a new issue for that, or consider it still part of this issue? Do you see any other challenges concerning this issue?

physikerwelt commented 9 months ago

Tested that for:

https://oai.portal.mardi4nfdi.de/oai/OAIHandler?verb=GetRecord&metadataPrefix=oai_zb_preview&identifier=370149
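The record below was fetched with a plain OAI-PMH `GetRecord` request. A small sketch, using only `urllib.parse`, that assembles such a URL from the handler address and the three parameters visible in the link above (`verb`, `metadataPrefix`, `identifier`); the helper name `get_record_url` is hypothetical.

```python
from urllib.parse import urlencode

# Handler address taken from the link above.
OAI_HANDLER = "https://oai.portal.mardi4nfdi.de/oai/OAIHandler"


def get_record_url(identifier: str, metadata_prefix: str = "oai_zb_preview") -> str:
    """Hypothetical helper: build an OAI-PMH GetRecord URL for one record."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # urlencode keeps the dict order and percent-encodes reserved characters.
    return f"{OAI_HANDLER}?{urlencode(params)}"


if __name__ == "__main__":
    print(get_record_url("370149"))
```

Using `urlencode` rather than string concatenation matters once identifiers contain characters such as `:`, which it encodes as `%3A`.
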


<oai_zb_preview>
   <author>Chao, C. K.</author>
   <author>Chang, R. C.</author>
   <author_ids>
      <author_id>chao.ching-kong</author_id>
      <author_id>chang.ruei-chin</author_id>
      <author_id>chang.ruei-chuan</author_id>
      <author_id>chang.rong-chi</author_id>
      <author_id>chang.ray-c</author_id>
   </author_ids>
   <classifications>
      <classification>74R99</classification>
      <classification>74A15</classification>
      <classification>74E10</classification>
      <classification>80A20</classification>
   </classifications>
   <zbmath:review_language></zbmath:review_language>
   <zbmath:review_sign></zbmath:review_sign>
   <zbmath:review_text>A solution is given for the steady-state heat conduction problem of the interface crack between dissimilar anisotropic media. Based on the Hilbert problem formulation and a special technique of analytical continuation, exact expressions are obtained for the temperature and temperature gradients for both the heat flux prescribed and temperature prescribed boundary conditions.</zbmath:review_text>
   <zbmath:review_type>review</zbmath:review_type>
   <reviewer>
      <zbmath:reviewer></zbmath:reviewer>
      <zbmath:reviewer_id></zbmath:reviewer_id>
   </reviewer>
   <zbmath:zbl_id>0771.73062</zbmath:zbl_id>
   <zbmath:keywords>
      <zbmath:keyword>Hilbert problem</zbmath:keyword>
      <zbmath:keyword>analytical continuation</zbmath:keyword>
      <zbmath:keyword>temperature gradients</zbmath:keyword>
   </zbmath:keywords>
</oai_zb_preview>