adsabs / export_service

Export service to output ADS records with various formats including BibTex, AASTex, and multiple tagged and xml options
MIT License

JATS format #233

Open golnazads opened 1 year ago

golnazads commented 1 year ago

Implement JATS export format.

Main Reference: https://jats.taylorandfrancis.com/jats-guide/topics/references/

It seems that the JATS format is like BibTeX in that it needs to export different fields depending on doctype. According to this reference there are five formats, listed below.

@aaccomazzi we have put a limitation on the export service by deciding that export does not do any field extraction and relies on the database to provide the fielded data. This limitation prevents exporting the formats as specified, including:
1- Conference format requires conference name/location.
2- Book with editor and Book require publisher name/location. The Book format also specifies edition.
3- Collaboration has its own tag.
4- Report also specifies location and link.

There are five JATS formats. Not sure how to map the rest of the ADS doctypes (bookreview, editorial, obituary, misc, erratum, phdthesis, pressrelease, circular, software, newsletter, proposal, mastersthesis, talk, inbook, abstract) to these.

If we ignore conference name/location and publisher name/location, then we can have one format for all doctypes that includes: publication-type, person-group (including editor if available), year, title, and, if available, journal, volume, issue, first page, last page, and doi. Thoughts?
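The single-format idea above could be sketched roughly as follows. This is only an illustration under assumptions: the record keys (`authors`, `editors`, `journal`, `first_page`, and so on) are hypothetical names, not the actual solr field names, and punctuation between elements is left out.

```python
import xml.etree.ElementTree as ET

# Optional fields in output order: (hypothetical record key, JATS element name).
OPTIONAL_FIELDS = [
    ("year", "year"),
    ("title", "article-title"),
    ("journal", "source"),
    ("volume", "volume"),
    ("issue", "issue"),
    ("first_page", "fpage"),
    ("last_page", "lpage"),
]


def add_person_group(parent, people, group_type):
    """Append a <person-group> built from (surname, given_names) pairs; skip if empty."""
    if not people:
        return
    group = ET.SubElement(parent, "person-group", {"person-group-type": group_type})
    for surname, given in people:
        name = ET.SubElement(group, "string-name")
        ET.SubElement(name, "surname").text = surname
        if given:
            ET.SubElement(name, "given-names").text = given


def build_mixed_citation(record, publication_type):
    """Build one <mixed-citation>, emitting only the fields the record has."""
    citation = ET.Element("mixed-citation", {"publication-type": publication_type})
    add_person_group(citation, record.get("authors"), "author")
    add_person_group(citation, record.get("editors"), "editor")
    for key, tag in OPTIONAL_FIELDS:
        value = record.get(key)
        if value:
            ET.SubElement(citation, tag).text = str(value)
    if record.get("doi"):
        ET.SubElement(citation, "pub-id", {"pub-id-type": "doi"}).text = record["doi"]
    return citation
```

Because every field is optional, the same function would serve all doctypes; only `publication_type` changes.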

1-Journal Article (doctype: article, eprint) \<ref id="CIT0002"> \<label>2.\</label> \<mixed-citation publication-type="journal"> \<person-group person-group-type="author">\<string-name>\<surname>lastname\</surname>, \<given-names>first initial\</given-names>\</string-name> more authors\</person-group> (\<year>year\</year>) \<article-title>title\</article-title>. \<source>\<italic>journal\</italic>\</source>, \<volume>volume\</volume> (\<issue>issue if available\</issue>):\<fpage>first page\</fpage>&#x2013;\<lpage>last page if available\</lpage>. doi:\<pub-id pub-id-type="doi">doi\</pub-id> \</mixed-citation> \</ref>

2- Conference (doctype: proceedings) \<ref id="CIT0005"> \<label>5.\</label> \<mixed-citation publication-type="confproc"> \<person-group person-group-type="author">\<string-name>\<surname>lastname\</surname>, \<given-names>first initial\</given-names>\</string-name> other authors\</person-group> (\<year>year\</year>) \<article-title>title\</article-title>, \<conf-name>need to extract from pub_raw?\</conf-name>, \<conf-loc>again in pub_raw\</conf-loc>, \<month>is available in ADS\</month> \<year>year\</year>. \</mixed-citation>\</ref>

3- Book with editor (doctype: inproceedings) \<ref id="CIT0128"> \<label>128.\</label> \<mixed-citation publication-type="book"> \<person-group person-group-type="author">\<string-name>\<surname>lastname\</surname>, \<given-names>first initial\</given-names>\</string-name>, other authors\</person-group> (\<year>2009\</year>) \<source>\<italic>title\</italic>\</source>; \<person-group person-group-type="editor">\<string-name>\<surname>lastname\</surname>\</string-name>, \<etal>et al.\</etal>, \<role>Eds.\</role>\</person-group>; \<publisher-name>in pub_raw\</publisher-name>: \<publisher-loc>in pub_raw\</publisher-loc>, \<fpage>first page\</fpage>&#x2013;\<lpage>last page if available\</lpage>. \</mixed-citation> \</ref>

4- Book, second edition (doctype: book) \<ref id="CIT0016"> \<mixed-citation publication-type="book"> \<person-group person-group-type="author">\<collab>could be collaboration\</collab>\</person-group>. (\<year>2009\</year>). \<source>\<italic>title\</italic>\</source> (\<edition>is this available in ADS\</edition> ed.). \<publisher-loc>in pub_raw\</publisher-loc>: \<publisher-name>in pub_raw\</publisher-name>. \</mixed-citation> \</ref>

5- Report (doctype: techreport) \<ref id="CIT0023"> \<mixed-citation publication-type="report"> \<person-group person-group-type="author">\<collab>could be collaboration\</collab>\</person-group>. (\<year>2008\</year>). \<source>title\</source>. \<publisher-loc>in pub_raw\</publisher-loc>: \<publisher-name>in pub_raw\</publisher-name>. Retrieved from \<ext-link ext-link-type="uri" xlink:href="is this available in ADS" >link here too if available in ADS\</ext-link> \</mixed-citation> \</ref>

Other References:
1- https://jats.nlm.nih.gov/
2- https://github.com/mfenner/pandoc-jats
3- https://dtd.nlm.nih.gov/archiving/tag-library/3.0/index.html?attr=article-type to map ADS doctype to JATS doctype.
4- For keywords https://jats.nlm.nih.gov/publishing/tag-library/1.1d2/chapter/tag-keywords.html
5- https://jats.nlm.nih.gov/archiving/tag-library/0.4/n-3bw0.html
6- https://github.com/mfenner/pandoc-jats/blob/master/jats.csl
7- https://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/dobs.html#dob-refs
8- https://typeset.io/resources/jats-xml-everything-a-publisher-needs-to-know/
9- https://jats4r.org/software-citations/#examples

golnazads commented 1 year ago

@aaccomazzi I mapped doctype to JATS as listed here https://jats.taylorandfrancis.com/jats-guide/topics/references/#type-of-publication


JATS | ADS
-- | --
book | book, inproceedings, inbook
confproc | proceedings
journal | article, abstract, eprint
thesis | phdthesis, mastersthesis
software | software
report | techreport
review | bookreview
other | circular, editorial, erratum, misc, newsletter, obituary, pressrelease, proposal, talk

The remaining JATS types are data, legal-case, legislation, letter, newspaper, patent, standard, web, and working-paper, in case you think any of these is a more suitable match for a doctype. Also, I got the list of doctypes from an export that I had saved a few years ago; have any new doctypes been introduced recently? Collection is down, so I cannot check right now.
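For illustration, the mapping table above could be expressed as a simple lookup with `other` as the fallback. The names `JATS_PUBLICATION_TYPE` and `jats_publication_type` are hypothetical and not taken from the export_service code.

```python
# ADS doctype -> JATS publication-type, following the table in this comment.
JATS_PUBLICATION_TYPE = {
    "book": "book",
    "inproceedings": "book",
    "inbook": "book",
    "proceedings": "confproc",
    "article": "journal",
    "abstract": "journal",
    "eprint": "journal",
    "phdthesis": "thesis",
    "mastersthesis": "thesis",
    "software": "software",
    "techreport": "report",
    "bookreview": "review",
}


def jats_publication_type(doctype):
    """Return the JATS publication-type for an ADS doctype.

    Doctypes not listed explicitly (circular, editorial, erratum, misc,
    newsletter, obituary, pressrelease, proposal, talk) fall back to "other".
    """
    return JATS_PUBLICATION_TYPE.get(doctype, "other")
```

A fallback keeps the export working if a doctype is added later, at the cost of silently labelling it `other`.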

aaccomazzi commented 1 year ago

It all sounds good, @golnazads. Doctypes are listed here: https://github.com/adsabs/ingest_data_model/blob/main/adsingestschema/Publication.json I think you have them all in your list.

If some of the fields are required for publication types such as conferences, does it mean that we can't create the proper output? What if we leave the XML element empty?

golnazads commented 1 year ago

Thank you for pointing to the list. You have removed some doctypes (i.e., bookreview, obituary, talk), and have renamed misc?

Let me do a bit of manipulation on pub_raw to see if I can extract the conference name/location. I am not going to add it to export; it is just an experiment to see how straightforward it is.


aaccomazzi commented 1 year ago

Thank you for pointing this out, I completely forgot! I think the publication types list in the data model came from this list: https://github.com/adsabs/adspy/blob/master/PubType.py#L47 But I see that there are other pubtypes, so that list is not exhaustive.

@seasidesparrow can you please update the model to reflect the complete list? It would be good to check SOLR and see what all the instances are in this field.

golnazads commented 1 year ago

I think pub_raw can be manipulated to extract the conference name/location. Looking at a few entries, there seem to be two formats that are pretty close:
1- conference name including quoted string, comma, all caps with digits, comma, possible location city, comma, country
2- all caps with digits, conference name including quoted string, comma, possible location city, comma, country
Some examples are below.

I can go with this for now, for the JATS export only. Thoughts?

International Conference "Scientific and Technological Development of the Agro-Industrial Complex for the Purposes of Sustainable Development" (STDAIC-2022), Bishkek, Kyrgyzstan, Edited by Shergaziev, U.A.
The First International Interdisciplinary Scientific and Practical Conference Man in the Arctic (IIRPCMIA 2021), Saint Petersbourg, Russia, Edited by Makhovikov, A.
International Scientific and Practical Conference "Development and Modern Problems of Aquaculture" (AQUACULTURE 2022), Divnomorskoe village, Krasnodar region, Russia, Edited by Rudoy, D.V.
3rd International Conference on Energetics, Civil and Agricultural Engineering (ICECAE 2022), Tashkent, Uzbekistan, Edited by Tursunov, O.
RICAP-22, 8th Roma International Conference on Astroparticle Physics, Roma, Italy, Edited by Capone, A.
The 3rd International Conference on Natural Resources and Life Sciences (NRLS) 2020, Virtual, Edited by Setyobudi, R.H.
IV International Scientific Conference "Construction Mechanics, Hydraulics and Water Resources Engineering" (CONMECHYDRO 2022), Tashkent, Uzbekistan, Edited by Bazarov, D.
IV International Scientific Conference "Construction Mechanics, Hydraulics and Water Resources Engineering" (CONMECHYDRO 2022) Tashkent, Uzbekistan, Edited by Bazarov, D.
Virtual Conference on Condensed Matter Physics (ISCMP2023), abstract
The 20th International Conference on Strangeness in Quark Matter (SQM 2022), Busan, Republic of Korea, Edited by Kim, Y.
Journal of Physics: Conference Series, Volume 2473, Issue 1, id.012022, 6 pp.
IOP Conference Series: Earth and Environmental Science, Volume 1162, Issue 1, id.012015, 8 pp.
AIP Conference Proceedings, Volume 2776, Issue 1, id.100008, 14 pp.
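A rough sketch of how the two formats could be pulled out of pub_raw with regular expressions is shown here. The function name `extract_conference` and both patterns are assumptions made for illustration, not code from export_service. It is deliberately conservative: strings without an acronym carrying digits (for example the journal-style entries above) return `None`, and edge cases such as the `(ISCMP2023), abstract` entry would still need extra handling.

```python
import re

# Format 1: "<name> (<ACRONYM with digits>)" followed by an optional location.
PAREN_ACRONYM = re.compile(r"^(?P<name>.*\([A-Z][A-Z\- ]*\d{2,4}\))(?P<loc>.*)$")
# Format 2: "<ACRONYM with digits>, <name>, <location>".
LEADING_ACRONYM = re.compile(r"^(?P<acronym>[A-Z]+[- ]?\d{2,4}),\s*(?P<rest>.+)$")


def extract_conference(pub_raw):
    """Return (conference_name, location) or None if no pattern matches."""
    # Drop the trailing ", Edited by ..." part before matching.
    text = re.split(r",\s*Edited by\b", pub_raw, maxsplit=1)[0].strip()

    match = PAREN_ACRONYM.match(text)
    if match:
        location = match.group("loc").strip(" ,") or None
        return match.group("name").strip(), location

    match = LEADING_ACRONYM.match(text)
    if match:
        # Assumes the conference name itself contains no comma.
        parts = match.group("rest").split(",", 1)
        name = f"{match.group('acronym')}, {parts[0].strip()}"
        location = parts[1].strip() if len(parts) > 1 else None
        return name, location

    return None
```

Returning `None` rather than guessing lets the caller leave the conference elements out when the string does not fit either format.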


golnazads commented 1 year ago

@aaccomazzi here is the output of JATS for all 22 different ADS doctypes.

{'msg': 'Retrieved 22 abstracts, starting with number 1.', 'export': '\<?xml version=\'1.0\' encoding=\'utf8\'?> \<records retrieved="22" start="1" selected="22" citations="308"> \<ref id="CIT001"> \<label>1.\</label> \<mixed-citation publication-type="review"> (\<year>2018\</year>) \<source>Book reviews\</source> \<volume>73\</volume>( \<issue>1\</issue>):\<fpage>35\</fpage> #x2013;\<lpage>35\</lpage> doi:\<pub-id pub-id-type="doi">10.1002/wea.3072\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT002"> \<label>2.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Fal\'ko\</surname> \<given-names>V.\</given-names> \</string-name> \<string-name> \<surname>Thomas\</surname> \<given-names>C.\</given-names> \</string-name> \</person-group> (\<year>2018\</year>) \<source>2D Materials: maintaining editorial quality\</source> \<volume>5\</volume>( \<issue>1\</issue>):\<fpage>010201\</fpage> doi:\<pub-id pub-id-type="doi">10.1088/2053-1583/aa9403\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT003"> \<label>3.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Parkin\</surname> \<given-names>S.\</given-names> \</string-name> \<string-name> \<surname>Chantrell\</surname> \<given-names>R.\</given-names> \</string-name> \<string-name> \<surname>Chang\</surname> \<given-names>C.\</given-names> \</string-name> \</person-group> (\<year>2018\</year>) \<source>Obituary: In Memoriam Professor Dr. 
Shoucheng Zhang, Consulting Editor\</source> \<volume>8\</volume>( \<issue>4\</issue>):\<fpage>1877001\</fpage> doi:\<pub-id pub-id-type="doi">10.1142/S2010324718770015\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT004"> \<label>4.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Dessauges-Zavadsky\</surname> \<given-names>M.\</given-names> \</string-name> \<string-name> \<surname>Pfenniger\</surname> \<given-names>D.\</given-names> \</string-name> \</person-group> (\<year>2018\</year>) \<source>Millimeter Astronomy\</source> \<volume>38\</volume> doi:\<pub-id pub-id-type="doi">10.1007/978-3-662-57546-8\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT005"> \<label>5.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Pustilnik\</surname> \<given-names>M.\</given-names> \</string-name> \<string-name> \<surname>van Heck\</surname> \<given-names>B.\</given-names> \</string-name> \<string-name> \<surname>Lutchyn\</surname> \<given-names>R.\</given-names> \</string-name> \<string-name> \<surname>Glazman\</surname> \<given-names>L.\</given-names> \</string-name> \</person-group> (\<year>2018\</year>) \<source>Erratum: Quantum Criticality in Resonant Andreev Conduction [Phys. Rev. Lett. 119, 116802 (2017)]\</source> \<volume>120\</volume>( \<issue>2\</issue>):\<fpage>029901\</fpage> doi:\<pub-id pub-id-type="doi">10.1103/PhysRevLett.120.029901\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT006"> \<label>6.\</label> \<mixed-citation publication-type="thesis"> \<person-group person-group-type="author"> \<string-name> \<surname>Carton\</surname> \<given-names>D.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>Ph.D. 
thesis\</source> doi:\<pub-id pub-id-type="doi">10.5281/zenodo.581221\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT007"> \<label>7.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Kohler\</surname> \<given-names>S.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>A 3D View of a Supernova Remnant\</source> \<fpage>2388\</fpage> \</mixed-citation> \</ref> \<ref id="CIT008"> \<label>8.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Green\</surname> \<given-names>D.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>Potential New Meteor Shower from Comet C/2015 D4 (Borisov)\</source> \<volume>4403\</volume> \<fpage>2\</fpage>. \</mixed-citation> \</ref> \<ref id="CIT009"> \<label>9.\</label> \<mixed-citation publication-type="software"> \<person-group person-group-type="author"> \<string-name> \<surname>Casey\</surname> \<given-names>A.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>sick: Spectroscopic inference crank\</source> \<fpage>ascl:1706.009\</fpage> \</mixed-citation> \</ref> \<ref id="CIT010"> \<label>10.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Siltala\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Jetsu\</surname> \<given-names>L.\</given-names> \</string-name> \<string-name> \<surname>Hackman\</surname> \<given-names>T.\</given-names> \</string-name> \<string-name> \<surname>Henry\</surname> \<given-names>G.\</given-names> \</string-name> \<string-name> \<surname>Immonen\</surname> \<given-names>L.\</given-names> \</string-name> \<string-name> \<surname>Kajatkari\</surname> \<given-names>P.\</given-names> \</string-name> \<string-name> \<surname>Lankinen\</surname> \<given-names>J.\</given-names> 
\</string-name> \<string-name> \<surname>Lehtinen\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Monira\</surname> \<given-names>S.\</given-names> \</string-name> \<string-name> \<surname>Nikbakhsh\</surname> \<given-names>S.\</given-names> \</string-name> \<string-name> \<surname>Viitanen\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Viuho\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Willamo\</surname> \<given-names>T.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>VizieR Online Data Catalog: BM CVn V-band differential light curve (Siltala+, 2017)\</source> \<fpage>J/AN/338/453\</fpage> \</mixed-citation> \</ref> \<ref id="CIT011"> \<label>11.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Waagen\</surname> \<given-names>E.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>V694 Mon (MWC 560) spectroscopy requested\</source> \<volume>429\</volume> \<fpage>1\</fpage>. \</mixed-citation> \</ref> \<ref id="CIT012"> \<label>12.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Yan\</surname> \<given-names>L.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>Confirm the Nature of a TDE Candidate in ULIRG F01004-2237 Using Spitzer mid-IR Light Curves\</source> \<fpage>13168\</fpage>. \</mixed-citation> \</ref> \<ref id="CIT013"> \<label>13.\</label> \<mixed-citation publication-type="thesis"> \<person-group person-group-type="author"> \<string-name> \<surname>Azankpo\</surname> \<given-names>S.\</given-names> \</string-name> \</person-group> (\<year>2017\</year>) \<source>Ph.D. thesis\</source> \<fpage>2\</fpage>. 
\</mixed-citation> \</ref> \<ref id="CIT014"> \<label>14.\</label> \<mixed-citation publication-type="report"> \<person-group person-group-type="author"> \<string-name> \<surname>Rotaru\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Pteancu\</surname> \<given-names>M.\</given-names> \</string-name> \<string-name> \<surname>Zaharia\</surname> \<given-names>C.\</given-names> \</string-name> \</person-group> (\<year>2016\</year>) \<article-title>The penumbral Moon\'s eclipse form 16 september 2016\</article-title>. \</mixed-citation> \</ref> \<ref id="CIT015"> \<label>15.\</label> \<mixed-citation publication-type="other"> \<person-group person-group-type="author"> \<string-name> \<surname>Velasco\</surname> \<given-names>S.\</given-names> \</string-name> \</person-group> (\<year>2016\</year>) \<source>Living on the edge: Adaptive Optics+Lucky Imaging\</source> \<fpage>872\</fpage>. \</mixed-citation> \</ref> \<ref id="CIT016"> \<label>16.\</label> \<mixed-citation publication-type="book"> \<person-group person-group-type="author"> \<string-name> \<surname>Liu\</surname> \<given-names>C.\</given-names> \</string-name> \<string-name> \<surname>Alekseyev\</surname> \<given-names>V.\</given-names> \</string-name> \<string-name> \<surname>Allwardt\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Bankovich\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Cade-Menun\</surname> \<given-names>B.\</given-names> \</string-name> \<string-name> \<surname>Davis\</surname> \<given-names>R.\</given-names> \</string-name> \<string-name> \<surname>Du\</surname> \<given-names>L.\</given-names> \</string-name> \<string-name> \<surname>Garcia\</surname> \<given-names>K.\</given-names> \</string-name> \<string-name> \<surname>Herschlag\</surname> \<given-names>D.\</given-names> \</string-name> \<string-name> \<surname>Khosla\</surname> \<given-names>C.\</given-names> \</string-name> 
\<string-name> \<surname>Kraut\</surname> \<given-names>D.\</given-names> \</string-name> \<string-name> \<surname>Li\</surname> \<given-names>Q.\</given-names> \</string-name> \<string-name> \<surname>Null\</surname> \<given-names>B.\</given-names> \</string-name> \<string-name> \<surname>Puglisi\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Sigala\</surname> \<given-names>P.\</given-names> \</string-name> \<string-name> \<surname>Stebbins\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Varani\</surname> \<given-names>L.\</given-names> \</string-name> \</person-group> (\<year>2009\</year>) \<title> \<italic>The Diversity of Nuclear Magnetic Resonance Spectroscopy\</italic> \</title>; \<publisher-name>Springer Netherlands\</publisher-name> \<person-group person-group-type="editor"> \<string-name> \<surname>Puglisi\</surname> \<given-names>J.\</given-names> \</string-name> \</person-group> \<role>Eds.\</role> \<fpage>65\</fpage> doi:\<pub-id pub-id-type="doi">10.1007/978-90-481-2368-1_5\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT017"> \<label>17.\</label> \<mixed-citation publication-type="journal"> \<person-group person-group-type="author"> \<string-name> \<surname>Mahabal\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Drake\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Djorgovski\</surname> \<given-names>S.\</given-names> \</string-name> \<string-name> \<surname>Donalek\</surname> \<given-names>C.\</given-names> \</string-name> \<string-name> \<surname>Glikman\</surname> \<given-names>E.\</given-names> \</string-name> \<string-name> \<surname>Graham\</surname> \<given-names>M.\</given-names> \</string-name> \<string-name> \<surname>Williams\</surname> \<given-names>R.\</given-names> \</string-name> \<string-name> \<surname>Baltay\</surname> \<given-names>C.\</given-names> \</string-name> \<string-name> 
\<surname>Rabinowitz\</surname> \<given-names>D.\</given-names> \</string-name> \<string-name> \<surname>PQ Team Caltech\</surname> \</string-name> \<string-name> \<surname>Yale\</surname> \</string-name> \<string-name> \<surname>NCSA\</surname> \</string-name> \<string-name> \<surname>Indiana\</surname> \</string-name> \<string-name> \<surname /> \<given-names>..\</given-names> \</string-name> \</person-group> (\<year>2007\</year>) \<article-title>Time Domain Exploration with the Palomar-QUEST Sky Survey\</article-title>. \<source> \<italic>American Astronomical Society Meeting Abstracts #210\</italic> \</source> \<volume>210\</volume> \<fpage>21.04\</fpage> \</mixed-citation> \</ref> \<ref id="CIT018"> \<label>18.\</label> \<mixed-citation publication-type="journal"> \<person-group person-group-type="author"> \<string-name> \<surname>.\</surname> \<given-names>S.\</given-names> \</string-name> \<string-name> \<surname>.\</surname> \<given-names>E.\</given-names> \</string-name> \</person-group> (\<year>2007\</year>) \<article-title>Analysis of Thermal Losses in the Flat-Plate Collector of a Thermosyphon Solar Water Heater\</article-title>. \<source> \<italic>Research Journal of Physics\</italic> \</source> \<volume>1\</volume>( \<issue>1\</issue>):\<fpage>35\</fpage> #x2013;\<lpage>41\</lpage> doi:\<pub-id pub-id-type="doi">10.3923/rjp.2007.35.41\</pub-id> \</mixed-citation> \</ref> \<ref id="CIT019"> \<label>19.\</label> \<mixed-citation publication-type="book"> \<person-group person-group-type="author"> \<string-name> \<surname>Miller\</surname> \<given-names>J.\</given-names> \</string-name> \</person-group> (\<year>1995\</year>) \<title> \<italic>Spacecraft navigation requirements\</italic> \</title> \<fpage>390\</fpage> #x2013;\<lpage>405\</lpage>. 
\</mixed-citation> \</ref> \<ref id="CIT020"> \<label>20.\</label> \<mixed-citation publication-type="book"> \<person-group person-group-type="author"> \<string-name> \<surname>Nayfeh\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Balachandran\</surname> \<given-names>B.\</given-names> \</string-name> \</person-group> (\<year>1995\</year>) \<title> \<italic>Applied nonlinear dynamics: analytical, computational and experimental methods\</italic> \</title> \</mixed-citation> \</ref> \<ref id="CIT021"> \<label>21.\</label> \<mixed-citation publication-type="journal"> \<person-group person-group-type="author"> \<string-name> \<surname>Ginsparg\</surname> \<given-names>P.\</given-names> \</string-name> \</person-group> (\<year>1988\</year>) \<article-title>Applied Conformal Field Theory\</article-title>. \<source> \<italic>arXiv e-prints\</italic> \</source> \<fpage>hep-th/9108028\</fpage> \</mixed-citation> \</ref> \<ref id="CIT022"> \<label>22.\</label> \<mixed-citation publication-type="confproc"> \<person-group person-group-type="author"> \<string-name> \<surname>Khatib\</surname> \<given-names>A.\</given-names> \</string-name> \<string-name> \<surname>Ellis\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>French\</surname> \<given-names>J.\</given-names> \</string-name> \<string-name> \<surname>Null\</surname> \<given-names>G.\</given-names> \</string-name> \<string-name> \<surname>Yunck\</surname> \<given-names>T.\</given-names> \</string-name> \<string-name> \<surname>Wu\</surname> \<given-names>S.\</given-names> \</string-name> \</person-group> (\<year>1983\</year>) \<article-title>Autonomous navigation using lunar beacons\</article-title>, \</mixed-citation> \</ref> \</records>'}

aaccomazzi commented 1 year ago

Hi Golnaz, thanks but what we need is to encode this info in the article-level metadata (see for instance example here: https://typeset.io/resources/jats-xml-everything-a-publisher-needs-to-know/). What you have above is the reference list instead. Thanks.

golnazads commented 1 year ago

Sorry. My understanding was that, just like the other formats, we are exporting citations. The details I wrote only point to that.

I shall check out the link you provided and wait until we can chat about it further. If I understand your message correctly, and we are going to export more than citations, then I propose creating a new service.


aaccomazzi commented 1 year ago

The pandoc jats document you linked above generates the right article metadata (https://github.com/mfenner/pandoc-jats)

golnazads commented 1 year ago

Yes, I found this when we were talking, and then afterward I found References :: JATS Guide (taylorandfrancis.com) https://jats.taylorandfrancis.com/jats-guide/topics/references/, which I thought was more relevant to exporting citations, so I created the issue using the info there.

Now looking at the pandoc document, does ADS actually contain these metadata? The challenge is knowing which fields in solr go with which fields in pandoc.


golnazads commented 1 year ago

@aaccomazzi
1- Please see the mapping between doctype and article type here https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/xmlFormat.py#L626. JATS article types for the Journal Publishing format are listed here https://jats.nlm.nih.gov/publishing/tag-library/1.3/attribute/article-type.html. Not sure if I got them all right.
2- The Journal Publishing specification does not say that the article tag, which is the outer tag of JATS xml, can be repeated. But to support exporting multiple bibcodes, I have gone with repeating the article tag. Please let me know if that is OK.
3- For the journal-id tag I am displaying bibstem. The specification page for Journal Publishing here https://jats.nlm.nih.gov/publishing/tag-library/1.3/element/journal-id.html, says it holds an external identifier, typically assigned to a journal by a publisher, archive, or library to provide a unique identifier for the journal.
4- For publisher, I have a list of popular publishers that I gave to Carolyn. I have included them here for JATS xml only, and if any of them appears in pub_raw, I display it.
5- The endpoint is jatsxml and should be available in dev, if you would please check it out.
6- Below is the schematic. Note that the two tags body and back, which contain fulltext info and are optional, are ignored.

\<?xml version='1.0' encoding='utf8'?> \<!--also similar to the other xml formats added the line below--> \<!--followed the same path formats as the other xml formats for these, and included the counts--> \<records xmlns="http://ads.harvard.edu/schema/abs/1.1/jats" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ads.harvard.edu/schema/abs/1.0/jats http://ads.harvard.edu/schema/abs/1.0/jats.xsd" retrieved="2" start="1" selected="2"> \<!--article is the outer tag and is repeated for multiple records, see comment #2--> \<!--for mapping doctype to jats type, see the comment #1--> \<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="map doctype from solr to article_type"> \<front> \<journal-meta> \<!--have assigned bibstem here please see comment #3--> \<journal-id journal-id-type="publisher">from solr bibstem\</journal-id> \<issn>from solr issn if available\</issn> \<publisher> \<publisher-name>for now extract from solr's pub_raw, see comment #4\</publisher-name> \</publisher> \</journal-meta> \<article-meta> \<article-id pub-id-type="doi">from solr doi if available\</article-id> \<title-group> \<article-title>from solr title\</article-title> \</title-group> \<contrib-group> \<contrib contrib-type="author"> \<name> \<surname>from solr last name\</surname> \<given-names>from solr given name\</given-names> \</name> \<aff>from solr aff if available\</aff> \<contrib-id contrib-id-type="orcid">from solr orcid id if available\</contrib-id> \</contrib> \<!--next author if any--> \</contrib-group> \<volume>from solr volume if available\</volume> \<issue>from solr issue if available\</issue> \<abstract>from solr abstract\</abstract> \<fpage>from solr page if available\</fpage> \<lpage>from solr last page if available\</lpage> \</article-meta> \</front> \<!-- two other tags, body and back are optional and so I ignored them --> \</article> \<!--next record starts here--> \</records>
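As one illustration of the schematic, the contrib-group part could be built along these lines. The input keys (`surname`, `given_names`, `aff`, `orcid`) are hypothetical stand-ins for whatever the solr record provides, and the child order simply follows the schematic rather than a validated DTD order.

```python
import xml.etree.ElementTree as ET


def build_contrib_group(authors):
    """Build <contrib-group> with one <contrib> per author dict.

    Affiliation and ORCID are optional and only emitted when present.
    """
    group = ET.Element("contrib-group")
    for author in authors:
        contrib = ET.SubElement(group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = author["surname"]
        if author.get("given_names"):
            ET.SubElement(name, "given-names").text = author["given_names"]
        if author.get("aff"):
            ET.SubElement(contrib, "aff").text = author["aff"]
        if author.get("orcid"):
            ET.SubElement(
                contrib, "contrib-id", {"contrib-id-type": "orcid"}
            ).text = author["orcid"]
    return group
```

The other blocks of the schematic (journal-meta, title-group, volume, issue, pages) would follow the same pattern of emitting an element only when its source field is available.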

aaccomazzi commented 1 year ago

Thanks, I have reviewed the update and I have some suggestions:

  1. Output the bibcode of each record as a persistent id in the following way: <article-id pub-id-type="bibcode">2022ApJS..260....5C</article-id>
  2. The first two lines of the output XML should be the following for compatibility with JATS:
    <!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "https://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
  3. we should drop the <records> wrapper when a single record is output, since this element is not part of the JATS DTD
  4. when multiple records are output, we can keep the <records> element in there as the third line in the output

There's more testing that I should do after this is implemented.
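The wrapper rule in suggestions 3 and 4 could be sketched as below. This is an assumption-laden illustration: `serialize_articles` and `make_article` are hypothetical helpers, and the XML declaration is assumed to be the first line, copied from the schematic in the earlier comment.

```python
import xml.etree.ElementTree as ET

# Assumed first line, taken from the schematic earlier in this thread.
XML_DECLARATION = "<?xml version='1.0' encoding='utf8'?>"
DOCTYPE = (
    '<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing '
    'DTD v1.2 20190208//EN" '
    '"https://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">'
)


def make_article(bibcode):
    """Minimal <article> carrying the bibcode as a persistent id (suggestion 1)."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    ET.SubElement(meta, "article-id", {"pub-id-type": "bibcode"}).text = bibcode
    return article


def serialize_articles(articles):
    """Emit the declaration and DOCTYPE, wrapping in <records> only for several records."""
    if len(articles) == 1:
        root = articles[0]  # single record: no <records> wrapper
    else:
        root = ET.Element("records", {"retrieved": str(len(articles))})
        root.extend(articles)
    body = ET.tostring(root, encoding="unicode")
    return "\n".join([XML_DECLARATION, DOCTYPE, body])


print(serialize_articles([make_article("2022ApJS..260....5C")]))
```

With this shape the single-record output has `article` as its root element, matching the DOCTYPE, while multi-record output keeps `records` on the third line.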

golnazads commented 1 year ago

@aaccomazzi made the above modifications, released a new version for dev, notified Sergi to deploy.