IAU-ADES / ADES-Master

ADES implementation based on a master XML file

01-Sep-2023 - Modified python code in Python/bin

The Python scripts now have the .py suffix in their names

   # The initial 'python' statement has been removed at the beginning of each script
   # More tests have been added in the new_tests directory (the tests can be run using 'pytest')
   # The scripts in Python/bin can now be called as scripts or as routines inside your python code

25-Apr-2022 - Incremented ADES version from v2017 to v2022

03-Feb-2022 - Added a few new fields and other minor revisions

Added shapeOcc, obsSubID and trkMPC elements.

   # obsID can be up to 25 alphanumeric characters
   # Minor typographical and layout corrections

15-Jan-2019 - Some changes to the schema were made to reflect historical data

PermIDType for permID needs to accept '1I' and any more 'I' objects

   # ProvIDType should restrict to P-L, T-1, T-2 and T-3 only and not allow T-L or P-3
   # CatType for astCat and photCat needs to accept the '.' character (e.g., GSC1.2)
   # ObsIDType for obsID should allow up to 25 characters
   # TrkIDType for trkID should allow the hyphen `-`
   # TrkSubType for trkSub should allow the hyphen `-`
   # (Not for submissions) TrkSubType for trkSub should allow these characters: "/",
   "\", "(", ")", "@", "?", ".", "+"
   # (Not for submissions) ProvIDType for provID should allow pre-1925 values of the
   form "A902 AA"
   # TimePrecType for precTime should allow additional values (prec not in submissions)
        41667 (integer hours)
        4167 (tenths of an hour)
        694 (integer minutes)
        69 (tenths of a minute)
   # Expand length of remarks to 300 characters
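
The four extra precTime values above look like the listed time precisions expressed in millionths of a day and rounded to the nearest integer. That unit is inferred from the numbers, not stated in this README, and the helper name below is made up for illustration. A minimal sketch of the arithmetic:

```python
# Sketch: reproduce the extra precTime values, assuming (not stated in this
# README) that precTime is expressed in millionths of a day.
MICRODAYS_PER_DAY = 1_000_000
HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60


def prec_in_microdays(fraction_of_day):
    """Hypothetical helper: round a fraction of a day to integer microdays."""
    return round(fraction_of_day * MICRODAYS_PER_DAY)


prec_values = {
    "integer hours": prec_in_microdays(1 / HOURS_PER_DAY),
    "tenths of an hour": prec_in_microdays(0.1 / HOURS_PER_DAY),
    "integer minutes": prec_in_microdays(1 / MINUTES_PER_DAY),
    "tenths of a minute": prec_in_microdays(0.1 / MINUTES_PER_DAY),
}

for label, value in prec_values.items():
    print(f"{value:>6} ({label})")
```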

13-Jul-2018 - Minor fixes were applied to the documentation and schema. See ADES_Description.pdf for details.

CONTENTS:

 xml/ The adesmaster.xml file lives here. This is not the place for example xml files

           adesmaster.xml

           The adesmaster.xml file is transformed by various .xslt
           files into .xsd files and .tex files for ps and pdf documentation

 xslt/util/  location for xslt files used by the /bin files as helpers
      Currently only has adestables.xslt

      adestables.xslt

 xslt/xsd/   location for xslt files used to create xsd files.  I didn't
             include the xsd files themselves since they can be made
             with applyxslt.py.  They'd go in a top-level xsd/
             directory anyway

      distribhumanxsd.xslt #currently not used
      distribxsd.xslt #currently not used
      generalhumanxsd.xslt #currently not used
      generalxsd.xslt
      submithumanxsd.xslt #currently not used
      submitxsd.xslt

 xslt/latex/  Location for xslt files used to translate adesmaster.xml
              into latex input.

      docades.xslt
      docelementstable.xslt
      docgrouptypestable.xslt
      docsimpletypestable.xslt

 tests/    Location of test files.   It has its own README.
           The runtests script must be run when in the 
           tests/ directory -- it creates some extra dirs
           and knows about the sub-directories.

 xsd/    Contains generated xsd files and makexsdfiles

      makexsdfiles generates xsd files if run in this directory

      Currently only submit.xsd and general.xsd are needed

doc/ contains pdf and ps files documenting ADES tables
      ades.ps  # generated ades documentation file
      ades.pdf # generated ades documentation file
      docsrc contains code to build these in latex.  It uses
             xslt to generate the tex files from adesmaster.  
             You'll need to edit the makedoc file to point to 
             your tex installation.

             ./makedoc will generate ades.ps and ades.pdf
                       in this directory.   Copy those to doc/
                       to update the documentation if adesmaster.xml
                       or the xslt files have changed.

             ./cleanum removes the evidence since the latex temp
                       files should not be in github.

 There are example programs demonstrating how to read
 and write xml files using lxml in 

 Fortran/readxmlfox.f90
 Fortran/writexmlfox.f90
 C/src/readxmlc.c
 C/src/writexmlc.
 Python/bin/readxmlpy
 Python/bin/writexmlpy

 These all use the xml library.  

 Python: install lxml
 C: make sure libxml2 is available
 Fortran: install FoX

 The Python and FoX libraries use libxml2

INSTALLATION and PREREQUISITES:

  Untar this tarball.

  Python: Ensure you have a correctly installed
  python 2 or 3 and know its path.  You can have both.  

     You'll have to install the python lxml module for 
     your python separately; the best way to do that is 
     to build from source using a compatible C compiler.  
     See google for instructions, which change regularly.

     Alternatively, install Python package requirements
     using pip: 
     $ python -m pip install -r ./Python/requirements.txt 

  C: Ensure you have a correctly installed C/C++ compiler 
  and you know its path.  

     You will need libxml2.a and libxml2.so, which normally 
     come installed as part of the compiler installation.  If 
     not, you'll need to obtain and install this library.

  Fortran: Ensure you have a correctly installed Fortran
  compiler and you know its path

     You will need to install FoX, a Fortran XML library 
     (or something similar).  This is available (it has 
     a FreeBSD-like license) from:

     https://github.com/andreww/fox

     You'll retrieve fox-master.zip.  Unzip that into
     the Fortran directory

BUILD C Examples:

  To build the C programs, go to the C/ directory, configure
  to build Makefile.config, and then cd into src and type 'make'.
  The README file in C/ has more details.  If you're on Mac OS X,
  you'll need to read it since the instructions are different.

BUILD Fortran Examples:

  First, build FoX.  Go to the fox-master directory and
  run ./configure, which may pick up the wrong 
  Fortran compiler.  If it does, edit the "configure" file and
  edit the two lines containing "gfortran" so that your
  Fortran compiler is *first* in the list.  Then make
  sure your Fortran compiler is in your PATH and run
  ./configure again.

  Then run "make" and "make check" to build FoX.  Documentation
  for FoX is in FoX/DoX as html.

  After that, go to the Fortran directory and run "make" to 
  build writexmlf90 and readxmlf90 using FoX.

USAGE:

The following are the main executables available from Python. All of these work in python 2 and 3 although they pick /usr/bin/env python if run as commands.

These require the Python lxml library, available both for Python 2 and 3

adestest/Python/bin/

   psvtoxml <psvfile> <xmlfile>  # converts psv file to xml file
   xmltopsv <xmlfile> <psvfile>  # converts xml file to psv file

   # the mpc80col converters are incomplete.  They do not translate
   # header records or Satellite observations.
   mpc80coltoxml <mpc80colfile> <xml file>
   xmltompc80col <xmlfile> <mpc80colfile>  

   valall <xml file>     # validates against all possible formats
                         #    using both human-readable and non-   
                         #    human-readable xslt-generated xsd files
   valsubmit <xml file>  # validates against submit format
   valgeneral <xml file> # validates against general format

   applyxslt      # <xml file> <xslt file>  > <output file>
      # example to create the submit schema
      Python/bin/applyxslt xml/adesmaster.xml xslt/xsd/submitxsd.xslt > submit.xsd

   writexml       # example script to write xml file

There is code in C for all of the above except mpc80coltoxml and xmltompc80col, in adestest/C/src. To build it, run "./configure" and then "cd src; make".
If you are on a Mac, source the forMacOS... file first before running configure.

    mpc80coltoxml and xmltompc80col are not yet in C, but the above
    programs all work the same way.

TEST CASES:

  The "adestest/tests" directory contains numerous correct and incorrect 
  test cases.  To run them, "cd tests" and run 

  ./runtests prog_python2   # to test python 2
  ./runtests prog_python3   # to test python 3, if python3 is in your path
  ./runtests prog_c         # to test in C, if you built the C

  Also, the tests/mpc/ directory has some mpc 80-column examples.  The
  test cases for these are not yet finished

DOCUMENTATION:

  adestest/doc/ contains pdf and ps files documenting ADES tables
  adestest/doc/src contains code to build these in latex.  It uses
                   xslt to generate the tex files.  You'll need to
                   edit the makedoc file to point to your tex
                   installation.

What follows is the README file from some previous distribution tests. Some of the information may be useful but some may be obsolete.

2016 Dec GMH --- older notes

This is a not-quite-ready-for-prime-time attempt at a distribution.

Known Issues:

1) xmltopsv produces different header orders on different systems for the headers whose order is not specified. This round-trips OK but shows diffs in the tests. I'm not sure what the right order should be.

2) The WINDOWS-1252 codec is broken on some systems in the library

3) Different xml libraries use ' or " for attribute quoting of the <? xml version="1.0" or '1.0' line. This is fine and legal but makes testing hard. Other legal differences are possible
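
One way to keep such legal differences out of test diffs is to compare canonical forms instead of raw text. The sketch below uses the Python standard library's C14N 2.0 support (Python 3.8 or later), which is not what the runtests script does; the sample documents and the same_xml name are made up for illustration.

```python
# Sketch: compare two XML serializations while ignoring legal differences
# such as the quote style used in the XML declaration and in attributes.
from xml.etree.ElementTree import canonicalize  # standard library, Python 3.8+

# Hypothetical outputs from two different XML libraries.
doc_double = '<?xml version="1.0" encoding="UTF-8"?>\n<ades version="2022"><obsBlock/></ades>'
doc_single = "<?xml version='1.0' encoding='UTF-8'?>\n<ades version='2022'><obsBlock/></ades>"


def same_xml(a, b):
    """Return True if a and b are equal after C14N 2.0 canonicalization."""
    return canonicalize(a) == canonicalize(b)


print(doc_double == doc_single)          # the raw text differs
print(same_xml(doc_double, doc_single))  # the canonical forms agree
```

Canonicalization drops the XML declaration entirely, so this comparison says nothing about whether the declared encoding matches.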

Specific distribution notes:

1) This uses the python lxml module, which is not part of the default python. There are numerous clever ways to try to do binary installs but the most reliable thing to do is obtain a source tarball (such as lxml-3.6.4.tar.gz) and run "python setup.py build" and make and so forth on your machine. Just Google "python packages lxml" and poke around until you find the source tarball.
This is important because all the web sites try to help you by guessing what your configuration is, and they guess wrong all the time. Find the source tarball and go from that. This is especially important if you want to make both a python 2 and python 3 installation.

2) The runtests script sources a script for picking up the executables it uses. This makes it easy to test your own executables.

Several issues remain:

The tests are incomplete. You can help by expanding them :-)

The runtests point out that between python2, python3 and C there is a disagreement about the order of fields in PSV. The ones we specify are all fine, but the order of extra ones can be arbitrary. All the files round-trip just fine, so this may only be a problem for testing.

xmlUTF8Strlen does not return the width of a unicode string but rather just the number of unicode characters (I think it handles the combining characters correctly). This means padding to achieve justification in Chinese etc. will be wrong.

NOTE: although the maximum allowed field width is 200, that means 200 unicode characters. This may even be longer than 200 unicode code points because of combining characters. Python handles memory management properly; in C you're on your own.
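
The gap between character count and display width can be seen in a few lines of Python. This is a rough sketch using only the standard unicodedata module, not code from this distribution; display_width and pad_right are hypothetical names, and the width rule (2 columns for wide or fullwidth characters, 0 for combining marks) is an approximation that ignores control and zero-width characters.

```python
# Sketch: estimate terminal display width, as opposed to the character
# count that an xmlUTF8Strlen-style function reports.
import unicodedata


def display_width(text):
    """Approximate column width: wide/fullwidth count 2, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_right(text, columns):
    """Hypothetical helper: left-justify text in a field of `columns` columns."""
    return text + " " * max(0, columns - display_width(text))


# Print each sample with its character count and its estimated width.
for sample in ("Ceres", "小行星", "e\u0301"):
    print(repr(sample), len(sample), display_width(sample))
```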

Usage: The executables in the various bin/ directories (should) have the same interface. To run tests, go to the tests directory and run

./runtests prog_python2
./runtests prog_python3
./runtests prog_c

Run these into a file since the output can be long.

prog_python2 assumes #!/usr/bin/env python is python 2.7
prog_python3 needs to point to your python3 not mine
prog_c script uses python for the encoding check. xmltopsv and psvtoxml are in C.

Note that my version of the C code seems to use single quotes instead of double quotes on the version line:
<?xml version="1.0" encoding="UTF-8"?> vs. <?xml version='1.0' encoding='UTF-8'?>

             This confuses diff.  The attributes in the
             doc are coded the same way.  Notice the EBCDIC
             and UTF-7 encodings are fine, but the quote
             differences make them look different.

Notes:

For now, all the executables start by transforming the xml/adesmaster.xml file into the internal tables using xslt/util/tableades.xslt. This is hard-coded into the executables. Eventually we may want to have the tables hard-coded into the executables instead once things stabilize.

For now, all the xsd files are generated from adesmaster.xml using xslt/xsd/xsd.xslt files. We could create external xsd files once we know what the final format will be.

Those two above items add surprisingly little to program start overhead.

Everything works by converting input files, including psv input files, into an internal xml etree and doing operations on that. We may want to use iterparse to handle large files but so far this is not an issue. I'm not sure what large means.

It's really important for performance to not have memory leaks. Memory management is tested with the C executables through some commented-out code using the "nMemoryTest" define in ades.h.


This directory has several sub-directories:

C/

./configure                  # creates Makefile.config
cd src; make clean; make     # builds and puts executables in bin
cd src; make realclean;      # removes executables from bin

README
configure.ac
configure
install.sh                   # what a mess
aclocal.m4                   # yup, a mess
forMacOSXwithout_pkg_config  # did I say a mess
Makefile.config.in
src/                         # make puts executables in bin
include/
bin/                         # same interface as Python. At least they're supposed to :=)

       Executables:
          psvtoxml  # psvtoxml
          xmltopsv  # xmltopsv
          valall    # valall
          valades   # see tests/runtests
          unittest  # this is woefully incomplete
          writexml  # writexml myfile

          The encoding flags for PSV files do not work.  They
          always assume the PSV encoding is UTF-8

Python/

bin/ python executable files and modules. The modules are not executable and are in bin because I didn't want to bother with setting pythonpath yet.

       All the python scripts are good with python2 and python3
           <script>   # runs a script with #!/usr/bin/env python
           <python2> <script>   # runs a script with python2
           <python3> <script>   # runs a script with python3

           Python/bin/xmltopsv <args>
           python xmltopsv <args>
           python3 xmltopsv <args>

       Executables:
          applyxslt
          validate
          encoding

          psvtoxml  # psvtoxml <psv file> <xml file>
          xmltopsv  # xmltopsv <xml file> <psv file>
          valall    # valall <xml file>
          valades   # see tests/runtests
          unittest  # this is woefully incomplete
          writexml  # writexml myfile

writexml myfile works in both Python and C++.
The C and Python conversions don't match, at least on my machine, because one of them says <?xml version='1.0' encoding='UTF-8'?> and the other <?xml version="1.0" encoding="UTF-8"?> Both of these are legal.

"writexml myfile UTF-7" is interesting.


Some other thoughts:

A) Use iterparse to process documents as a stream

Both the Python and C work on xml documents, which means the entire input is in memory as an xml tree (even psv input is converted to an xml tree).

Larger documents may require an iterparse structure.
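
A minimal sketch of what an iterparse structure could look like is shown here. It uses the standard library's xml.etree.ElementTree so that it runs without extra packages (lxml.etree provides an iterparse too); the sample document, its element names and the iter_records helper are illustrative assumptions, not code from this distribution.

```python
# Sketch: stream through an XML document one record at a time instead of
# holding the whole tree in memory.
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified document; element names are illustrative.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<ades version="2022">
  <obsBlock>
    <obsData>
      <optical><trkSub>abc123</trkSub><mag>21.1</mag></optical>
      <optical><trkSub>abc124</trkSub><mag>20.7</mag></optical>
    </obsData>
  </obsBlock>
</ades>
"""


def iter_records(source, tag="optical"):
    """Yield each <tag> element as a dict of child tag -> text."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the record's children once it has been handled


records = list(iter_records(io.BytesIO(SAMPLE)))
print(records)
```

For a real file, pass the file name or an open binary file in place of the BytesIO object and handle each record as it is yielded rather than collecting them in a list.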

B) User interface

Right now I don't have much for this.   The basic idea
is to use xml documents for everything and supply routines
to walk through them.   

To make a new document, build an xml document, 
validate it, and then write it either as xml or psv.

To read a document, read it into an xml tree and
use methods on the tree.
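
As a rough illustration of that workflow, the sketch below builds a tiny tree, writes it out, reads it back and looks fields up by name. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, the element names and values are illustrative only, and the validation step is left out because it needs a schema-aware library.

```python
# Sketch: build a small XML tree, serialize it, then read it back and
# access fields through the tree by name.
import xml.etree.ElementTree as ET

# Build: element names and values here are illustrative only.
root = ET.Element("ades", version="2022")
obs_block = ET.SubElement(root, "obsBlock")
obs_data = ET.SubElement(obs_block, "obsData")
optical = ET.SubElement(obs_data, "optical")
for name, value in (("trkSub", "abc123"), ("mag", "21.1")):
    ET.SubElement(optical, name).text = value

# Write: serialize to UTF-8 bytes with an XML declaration (Python 3.8+).
# Schema validation would happen before this step; it is not shown here.
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Read: parse the bytes back into a tree and collect one record's fields.
tree = ET.fromstring(xml_bytes)
first = tree.find("./obsBlock/obsData/optical")
fields = {child.tag: child.text for child in first}
print(fields)
```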

Obviously we can build a layer on top of this but I haven't
given that much work yet.  I think it is not a good idea
to make a big struct of xmlChar* pointers, since that's 
going to 

  1) be a recipe for memory leaks
  2) be slow because it's mostly going to be empty

I think going through the node interface by strings is better.
In C++ and Python that's easy.  In C and Fortran this is harder
but I think we should be dealing with the xml directly or 
indirectly (but conceptually)  in all cases.

C) Unicode handling

-> Use native UTF-8 whenever possible

Note python3 will not write UTF-8 to stdout unless the right environment variables are set. This is going to be a bigger problem in the future. While C/C++ will write bytes, having improper terminal settings can create surprises.

  Recommendation:  Transform from file to file.  View files
                   with an editor that supports utf-8 or
                   use file:// in your web browser, which 
                  is happy with utf-8.
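
A small sketch of the file-to-file approach in Python 3 is given here. Passing the encoding explicitly means the result does not depend on terminal or locale settings; the file name and the sample text are made up for illustration.

```python
# Sketch: write and read UTF-8 explicitly instead of relying on stdout
# or on locale settings.
import os
import tempfile

text = "小行星 Ceres\n"  # sample content only

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.psv")  # hypothetical file name

    # encoding= is given explicitly so the locale default is never used.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        out.write(text)

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as raw:
        data = raw.read()

round_trip = data.decode("utf-8")
print(len(text), len(data))  # character count vs. byte count
```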