MaRDI4NFDI / python-zbMathRest2Oai

Read data from the zbMATH Open API https://api.zbmath.org/docs and feed it to the OAI-PMH server https://oai.portal.mardi4nfdi.de/oai/
GNU General Public License v3.0

XSLT transformation for articles metadata #21

Closed Mazztok45 closed 9 months ago

Mazztok45 commented 9 months ago

Describe the issue

This issue aims to help with processing the transformation from:

https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/blob/main/test/data/plain.xml

To:

https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/blob/main/test/data/reference.xml

This XSLT file must be improved so that we can fully achieve this transformation: https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/blob/main/xslt/xslt-article-transformation.xslt

Progress:

Missing fields:

physikerwelt commented 9 months ago

@Mazztok45 / @Shirazos7 I made a progress list, so we can check what section is done and which is still missing

Mazztok45 commented 9 months ago

Very useful, thanks. I added a section for missing fields.

Shirazos7 commented 9 months ago

I will work on the rest and also edit the existing nodes to fit the requirements.

physikerwelt commented 9 months ago

@Mazztok45 I don't get the idea of missing fields. Rights are a constant string. What do you mean that links is a "missing" field?

Mazztok45 commented 9 months ago

> @Mazztok45 I don't get the idea of missing fields. Rights are a constant string. What do you mean that links is a "missing" field?

These fields are in the reference.xml file (the target) but not the plain.xml file (the source). So, regarding our previous discussions on missing elements, we should indicate these elements are missing.

physikerwelt commented 9 months ago

Links should be in there, maybe under a different name. For rights, you are right; however, it does not need to be in there, as it is a constant value.

Mazztok45 commented 9 months ago

Links are not presented in the same way. For example, in reference.xml:

`https://arxiv.org/abs/1311.4600`

while in plain.xml:

`1311.4600 arxiv`

Should we build and display the URI ourselves with string parsing?

> however, it does not need to be in there, as it is a constant value

I agree, so we will use if conditions.

physikerwelt commented 9 months ago

The current API uses this code


def expand_url_ids(url):
    # Map a prefixed identifier to a (scheme, URL) pair.
    # Every slice below skips the prefix plus two further characters.
    lc = url.lower()
    if lc.startswith("doi:"):
        return "doi", "https://doi.org/%s" % url[6:]
    elif lc.startswith("http:") or lc.startswith("https:"):
        return "http", url
    elif lc.startswith("ftp:"):
        return "ftp", url
    elif lc.startswith("euclid:"):
        return "euclid", "https://projecteuclid.org/euclid.%s" % url[9:]
    elif lc.startswith("crelle:"):
        return (
            "crelle",
            "https://www.digizeitschriften.de/dms/resolveppn/?PPN=%s" % url[9:],
        )
    elif lc.startswith("emis:"):
        return "emis", "http://www.emis.de/%s" % url[7:]
    elif lc.startswith("eudml:"):
        return "eudml", "https://eudml.org/doc/%s" % url[8:]
    elif lc.startswith("arxiv:"):
        return "arxiv", "https://arxiv.org/abs/%s" % url[8:]
    elif lc.startswith("vixra:"):
        return "vixra", "http://www.vixra.org/abs/%s" % url[8:]
    elif lc.startswith("numdam:"):
        return "numdam", "http://www.numdam.org/item?id=%s" % url[9:]
    elif lc.startswith("gallica:"):
        return "gallica", "http://gallica.bnf.fr/ark:/%s" % url[10:]
    elif lc.startswith("lni:"):
        return "lni", "http://subs.emis.de/%s" % url[6:]
    elif lc.startswith("mathnetru:"):
        return "mathnetru", "http://mathnet.ru/%s" % url[12:]
    # Unknown prefix: no scheme, identifier returned unchanged.
    return None, url

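Since plain.xml seems to carry the link type and the identifier as two separate values rather than one prefixed string, the same mapping could also be written as a lookup table. This is only a sketch: `URL_PREFIXES` and `build_link` are hypothetical names, the URL prefixes are copied from `expand_url_ids`, and the (type, identifier) input shape is an assumption based on the arXiv example from plain.xml.

```python
# Hypothetical helper; URL prefixes are copied from expand_url_ids,
# the (type, identifier) input shape is assumed from plain.xml.
URL_PREFIXES = {
    "doi": "https://doi.org/",
    "euclid": "https://projecteuclid.org/euclid.",
    "crelle": "https://www.digizeitschriften.de/dms/resolveppn/?PPN=",
    "emis": "http://www.emis.de/",
    "eudml": "https://eudml.org/doc/",
    "arxiv": "https://arxiv.org/abs/",
    "vixra": "http://www.vixra.org/abs/",
    "numdam": "http://www.numdam.org/item?id=",
    "gallica": "http://gallica.bnf.fr/ark:/",
    "lni": "http://subs.emis.de/",
    "mathnetru": "http://mathnet.ru/",
}


def build_link(link_type, identifier):
    """Return the full URL for a (type, identifier) pair.

    Unknown types fall back to the bare identifier, similar to the
    final branch of expand_url_ids.
    """
    prefix = URL_PREFIXES.get(link_type.lower())
    if prefix is None:
        return identifier
    return prefix + identifier


print(build_link("arxiv", "1311.4600"))  # https://arxiv.org/abs/1311.4600
```

A table like this maps one-to-one onto a list of `xsl:when` branches, which may make it easier to see whether a case is still missing in the stylesheet.
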
physikerwelt commented 9 months ago

Ask ChatGPT how to translate this into an XSLT template ;-)

physikerwelt commented 9 months ago

ETA: Tuesday

physikerwelt commented 9 months ago

Today is Tuesday, and I do not see any commits for this.

Mazztok45 commented 9 months ago

We face challenges integrating the XSL code that can handle concatenation for the "links" node. We plan to fix it tomorrow.

physikerwelt commented 9 months ago

Noted. If you can't make it, assign it back to me.

physikerwelt commented 9 months ago

ChatGPT says https://chat.openai.com/share/44218c4d-c2ad-49c4-95e8-4f4e3c7285e5


<!-- Paste your XSLT code here -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:zbmath="http://www.zbmath.org">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/links">
        <zbmath:links>
            <zbmath:link>
                <xsl:call-template name="expand-url">
                    <xsl:with-param name="type" select="type"/>
                    <xsl:with-param name="identifier" select="identifier"/>
                </xsl:call-template>
            </zbmath:link>
        </zbmath:links>
    </xsl:template>

    <xsl:template name="expand-url">
        <xsl:param name="type"/>
        <xsl:param name="identifier"/>
        <xsl:choose>
            <xsl:when test="$type = 'doi'">
                <xsl:value-of select="concat('https://doi.org/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'arxiv'">
                <xsl:value-of select="concat('https://arxiv.org/abs/', $identifier)"/>
            </xsl:when>
            <!-- Add more cases here based on your Python code -->
            <xsl:otherwise>
                <xsl:value-of select="$identifier"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
physikerwelt commented 9 months ago

After another prompt it got the IMHO correct solution:


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:zbmath="http://www.zbmath.org" exclude-result-prefixes="zbmath">
    <xsl:output method="xml" indent="yes"/>

    <!-- Template to match the root (or any container element that contains your links) -->
    <xsl:template match="/">
        <zbmath:links>
            <!-- Apply templates to each links element -->
            <xsl:apply-templates select="//links"/>
        </zbmath:links>
    </xsl:template>

    <!-- Template to handle each links element -->
    <xsl:template match="links">
        <zbmath:link>
            <xsl:call-template name="expand-url">
                <xsl:with-param name="type" select="type"/>
                <xsl:with-param name="identifier" select="identifier"/>
            </xsl:call-template>
        </zbmath:link>
    </xsl:template>

    <!-- Template to construct URL based on type -->
    <xsl:template name="expand-url">
        <xsl:param name="type"/>
        <xsl:param name="identifier"/>
        <xsl:choose>
            <!-- Add conditions for each type with URL construction logic -->
            <xsl:when test="$type = 'doi'">
                <xsl:value-of select="concat('https://doi.org/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'arxiv'">
                <xsl:value-of select="concat('https://arxiv.org/abs/', $identifier)"/>
            </xsl:when>
            <!-- Add more cases as needed based on the Python code logic -->
            <xsl:otherwise>
                <xsl:value-of select="$identifier"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
physikerwelt commented 9 months ago

It even filled out the "more cases section".


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:zbmath="http://www.zbmath.org" exclude-result-prefixes="zbmath">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <zbmath:links>
            <xsl:apply-templates select="//links"/>
        </zbmath:links>
    </xsl:template>

    <xsl:template match="links">
        <zbmath:link>
            <xsl:call-template name="expand-url">
                <xsl:with-param name="type" select="type"/>
                <xsl:with-param name="identifier" select="identifier"/>
            </xsl:call-template>
        </zbmath:link>
    </xsl:template>

    <xsl:template name="expand-url">
        <xsl:param name="type"/>
        <xsl:param name="identifier"/>
        <xsl:choose>
            <xsl:when test="$type = 'doi'">
                <xsl:value-of select="concat('https://doi.org/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'arxiv'">
                <xsl:value-of select="concat('https://arxiv.org/abs/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'euclid'">
                <xsl:value-of select="concat('https://projecteuclid.org/euclid.', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'crelle'">
                <xsl:value-of select="concat('https://www.digizeitschriften.de/dms/resolveppn/?PPN=', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'emis'">
                <xsl:value-of select="concat('http://www.emis.de/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'eudml'">
                <xsl:value-of select="concat('https://eudml.org/doc/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'vixra'">
                <xsl:value-of select="concat('http://www.vixra.org/abs/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'numdam'">
                <xsl:value-of select="concat('http://www.numdam.org/item?id=', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'gallica'">
                <xsl:value-of select="concat('http://gallica.bnf.fr/ark:/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'lni'">
                <xsl:value-of select="concat('http://subs.emis.de/', $identifier)"/>
            </xsl:when>
            <xsl:when test="$type = 'mathnetru'">
                <xsl:value-of select="concat('http://mathnet.ru/', $identifier)"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$identifier"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
Mazztok45 commented 9 months ago

@Shirazos7 Can you please integrate the ChatGPT code elements from Moritz into the XSLT file you are working on?

physikerwelt commented 9 months ago

ChatGPT also outlined how to do this.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:oai_zb_preview="https://zbmath.org/OAI/2.0/oai_zb_preview/" 
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                xmlns:zbmath="https://zbmath.org/zbmath/elements/1.0/"
                exclude-result-prefixes="xsi oai_zb_preview">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <oai_zb_preview>
            <xsl:for-each select="root/result/contributors/authors/name">
                <author><xsl:value-of select="."/></author>
            </xsl:for-each>
            <author_ids>
                <xsl:for-each select="root/result/contributors/authors/codes">
                    <author_id><xsl:value-of select="."/></author_id>
                </xsl:for-each>
            </author_ids>
            <zbmath:classifications>
                <xsl:for-each select="root/result/msc/code">
                    <zbmath:classification><xsl:value-of select="."/></zbmath:classification>
                </xsl:for-each>
            </zbmath:classifications>
            <zbmath:review_language>
                <xsl:value-of select="root/result/editorial_contributions/language/languages"/>
            </zbmath:review_language>
            <zbmath:pagination>
                <xsl:value-of select="root/result/pages"/>
            </zbmath:pagination>
            <zbmath:publication_year>
                <xsl:value-of select="root/result/year"/>
            </zbmath:publication_year>
            <zbmath:source>
                <xsl:value-of select="root/result/source"/>
            </zbmath:source>
            <zbmath:spelling>
                <xsl:value-of select="root/result/name"/>
            </zbmath:spelling>
            <zbmath:zbl_id>
                <xsl:value-of select="root/result/identifier"/>
            </zbmath:zbl_id>
            <zbmath:review_sign>
                <xsl:value-of select="root/result/editorial_contributions/reviewer/sign"/>
            </zbmath:review_sign>
            <zbmath:review_text>
                <xsl:value-of select="root/result/editorial_contributions/text"/>
            </zbmath:review_text>
            <zbmath:review_type>
                <xsl:value-of select="root/result/editorial_contributions/contribution_type"/>
            </zbmath:review_type>
            <zbmath:reviewer>
                <zbmath:reviewer>
                    <xsl:value-of select="root/result/editorial_contributions/reviewer/reviewer_id"/>
                </zbmath:reviewer>
                <zbmath:reviewer_id>
                    <xsl:value-of select="root/result/editorial_contributions/reviewer/author_code"/>
                </zbmath:reviewer_id>
            </zbmath:reviewer>
            <zbmath:keywords>
                <xsl:for-each select="root/result/keywords">
                    <zbmath:keyword><xsl:value-of select="."/></zbmath:keyword>
                </xsl:for-each>
            </zbmath:keywords>
            <zbmath:serial>
                <zbmath:serial_publisher>
                    <xsl:value-of select="root/result/publisher"/>
                </zbmath:serial_publisher>
                <zbmath:serial_title>
                    <xsl:value-of select="root/result/title"/>
                </zbmath:serial_title>
            </zbmath:serial>
            <zbmath:references>
                <zbmath_reference>
                    <zbmath:text>
                        <xsl:value-of select="root/result/references/text"/>
                    </zbmath:text>
                    <zbmath:ref_id>
                        <xsl:value-of select="root/result/references/document_id"/>
                    </zbmath:ref_id>
                    <zbmath:ref_classifications>
                        <xsl:for-each select="root/result/references/msc">
                            <zbmath:ref_classification>
                                <xsl:value-of select="."/>
                            </zbmath:ref_classification>
                        </xsl:for-each>
                    </zbmath:ref_classifications>
                </zbmath_reference>
            </zbmath:references>
            <zbmath:doi>
                <xsl:value-of select="root/result/identifier"/>
            </zbmath:doi>
            <!-- Integrated Links Handling -->
            <zbmath:links>
                <xsl:for-each select="root/result/links">
                    <zbmath:link>
                        <xsl:choose>
                            <xsl:when test="type = 'doi'">
                                <xsl:value-of select="concat('https://doi.org/', identifier)"/>
                            </xsl:when>
                            <xsl:when test="type = 'arxiv'">
                                <xsl:value-of select="concat('https://arxiv.org/abs/', identifier)"/>
                            </xsl:when>
                            <!-- Add additional cases here -->
                        </xsl:choose>
                    </zbmath:link>
                </xsl:for-each>
            </zbmath:links>
            <zbmath:reference>
                <zbmath:text>
                    <xsl:value-of select="root/result/text"/>
                </zbmath:text>
                <zbmath:ref_id>
                    <xsl:value-of select="root/result/document_id"/>
                </zbmath:ref_id>
                <zbmath:ref_classification>
                    <xsl:value-of select="root/result/msc"/>
                </zbmath:ref_classification>
            </zbmath:reference>
        </oai_zb_preview>
    </xsl:template>
</xsl:stylesheet>

Maybe it is even simpler if you copy and test yourself?
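For the testing part, a rough way to compare a transformation result with the target file is to parse both and compare the trees while ignoring indentation. The sketch below uses only the standard library; `normalize` and `same_xml` are hypothetical helper names, and the two sample strings are made up for illustration. In practice the inputs would be the XSLT output and the content of test/data/reference.xml.

```python
import xml.etree.ElementTree as ET


def normalize(element):
    """Return a comparable tuple: tag, attributes, stripped text, children."""
    return (
        element.tag,
        sorted(element.attrib.items()),
        (element.text or "").strip(),
        [normalize(child) for child in element],
    )


def same_xml(actual_xml, expected_xml):
    """Compare two XML strings, ignoring indentation and attribute order."""
    actual = normalize(ET.fromstring(actual_xml))
    expected = normalize(ET.fromstring(expected_xml))
    return actual == expected


# Illustrative input only, not taken from the repository's test data.
actual = "<links><link>https://arxiv.org/abs/1311.4600</link></links>"
expected = "<links>\n    <link>https://arxiv.org/abs/1311.4600</link>\n</links>"
print(same_xml(actual, expected))  # True
```

Note that this sketch ignores text that follows a closing tag (the element tail), so it is not suitable for mixed content.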

Shirazos7 commented 9 months ago

@physikerwelt Hello Moritz, thanks for your help. I am working on it in the meantime. I have almost the same code as well, but I will test all the possibilities to see which form gives us the best result.

Shirazos7 commented 9 months ago

I made a commit a few minutes ago, which I think should satisfy our target for the transformation of the metadata from plain.xml to reference.xml with the XSLT. I think the issue can be closed after you test it. https://github.com/MaRDI4NFDI/python-zbMathRest2Oai/commit/ab9e1b21f7e04b1b850741a03015bdccf84f9fe6

@Mazztok45
@physikerwelt

physikerwelt commented 9 months ago

Yes, it's not worse than ChatGPT's solution. However, neither ChatGPT nor you managed to put the other bits here:

                           <!-- Add additional cases here -->

https://chat.openai.com/share/6a2b8f1f-334e-4340-982b-18af3f0541e9

I am now really interested in understanding how to write the prompt or issue in such a way that the complete transformation is generated. Maybe I should try GitHub Copilot to get better and faster editing of the XSLT file.
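Independent of who writes the stylesheet, a small check could list which link types still lack a branch. The following is only a sketch under assumptions: `EXPECTED_TYPES` is taken from the prefixes in `expand_url_ids` (without the pass-through http and ftp cases), `handled_types` and `missing_types` are hypothetical names, and the regular expression only recognises tests written as `type = '...'` or `$type = '...'`.

```python
import re

# Link types from expand_url_ids that need a constructed URL.
EXPECTED_TYPES = {
    "doi", "euclid", "crelle", "emis", "eudml", "arxiv",
    "vixra", "numdam", "gallica", "lni", "mathnetru",
}


def handled_types(xslt_text):
    """Collect link types that appear in xsl:when style tests."""
    return set(re.findall(r"""test="\$?type\s*=\s*'([^']+)'""", xslt_text))


def missing_types(xslt_text):
    """Return the expected link types that have no test in the stylesheet."""
    return sorted(EXPECTED_TYPES - handled_types(xslt_text))


# Shortened sample with only the two cases from the integrated stylesheet.
sample = """
<xsl:when test="type = 'doi'">...</xsl:when>
<xsl:when test="type = 'arxiv'">...</xsl:when>
"""
print(missing_types(sample))
```

Run against the real XSLT file, an empty list would mean that every type known to the Python function has a matching branch.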

Shirazos7 commented 9 months ago

@physikerwelt Sorry Moritz, my bad. I was focused on getting the matching result and forgot to add the other cases (links) that we could have in other articles. I think it won't be a problem to integrate them into the existing code, next to the existing arxiv and doi cases.

physikerwelt commented 9 months ago

@Shirazos7 I don't think so. I think it is on us to formulate the tasks so that they have an adequate level of difficulty.

physikerwelt commented 9 months ago

I'll merge this now.

Shirazos7 commented 9 months ago

Okay. I hope it will work properly :)

physikerwelt commented 9 months ago

I just updated the reference file. The data was not updated.