01-Sep-2023 - Modified python code in Python/bin
# The initial 'python' statement has been removed at the beginning of each script
# More tests have been added in the new_tests directory (the tests can be run using 'pytest')
# The scripts in Python/bin can now be called as scripts or as routine inside your python code
25-Apr-2022 - Incremented ADES version from v2017 to v2022
03-Feb-2022 - Added a few new fields and other minor revisions
# obsID can be up to 25 alphanumeric characters
# Minor typographical and layout corrections
15-Jan-2019 - Some changes to the schema were made to reflect historical data
1I' and any more
I' objects # ProvIDType should restrict to P-L, T-1, T-2 and T-3 only and not allow T-L or P-3
# CatType for astCat and photCat needs to accept the '.' character (e.g., GSC1.2)
# ObsIDType for obsID should allow up to 25 characters
# TrkIDType for trkID should allow the hyphen `-`
# TrkSubType for trkSub should allow the hyphen `-`
# (Not for submissions) TrkSubType for trkSub should allow these characters: "/",
"\", "(", ")", "@", "?", ".", "+"
# (Not for submissions) ProvIDType for provID should allow pre-1925 values of the
form "A902 AA"
# TimePrecType for precTime should allow additional values (prec not in submissions)
41667 (integer hours)
4167 (tenths of an hour)
694 (integer minutes)
69 (tenths of a minute)
# Expand length of remarks to 300 characters
13-Jul-2018 - Minor fixes were applied to the documentation and schema. See ADES_Description.pdf for details.
CONTENTS: xml/ The adesmaster.xml file lives here. This is not the place for example xml files
adesmaster.xml
The adesmaster.xml file is transformed by various .xlst
files into .xsd files and .tex files fo ps and pdf documentation
xslt/util/ location for xslt files used by the /bin files as helpers
Currently only has adestables.xslt
adestables.xlst
xslt/xsd/ location for xslt files used to create xsd files. I didn't
include the xsd files themselves since they can be made
with applyxslt.py. They'd go in a top-level xsd/
directory anyway
distribhumanxsd.xslt #currently not used
distribxsd.xslt #currently not used
generalhumanxsd.xslt #currently not used
generalxsd.xslt
submithumanxsd.xslt #currently not used
submitxsd.xslt
xslt/latex/ Location for xslt files used to translate adesmaster.xml
into latex input.
docades.xslt
docelementstable.xslt
docgrouptypestable.xslt
docsimpletypestable.xslt
tests/ Location of test files. It has its own README.
The runtests script must be run when in the
tests/ directory -- it creates some extra dirs
and knows about the sub-directories.
xsd/ Contains generated xsd files and makexsdfiles
makexsdfiles generates xsd files if run in this directory
Currently only submit.xsd and general.xsd are needed
doc/ contains pdf and ps files documenting ADES tables
ades.ps # generated ades documenation file
ades.pdf # generated ades documenation file
docsrc contains code to build these in latex. It uses
xslt to generate the tex files from adesmaster.
You'll need to edit the makedoc file to point to
latex your tex installation.
./makedoc will generate ades.ps and ades.pdf
in this directory. Copy those to doc/
to update the documentation if adesmaster.xml
or the xslt files have changed.
./cleanum removes the evidence since the latex temp
files should not be in github.
There are example programs demontrating how to read
and write xml files using lxml in
Fortran/readxmlfox.f90
Fortran/writexmlfox.f90
C/src/readxmlc.c
C/src/writexmlc.
Python/bin/readxmlpy
Python/bin/writexmlpy
These all use the xml library.
Python: install lxml
C: make sure liblxml2 is available
Fortran: install FoX
The Python and FoX libarires use liblxml2
INSTALLATION and PREREQUISITES: Untar this tarball.
Python: Ensure you have a correctly installed
python 2 or 3 and know its path. You can have both.
You'll have to install the python lxml module for
your python separately; the best way to do that is
to build from source using a compatible C compiler.
See google for instructions, which change regularly.
Alternatively, install Python package requirements
using pip:
$ python -m pip install -r ./Python/requirements.txt
C: Ensure you have a correctly installed C/C++ compiler
and you know its path.
You will need liblxml2.a and liblxml2.so, which normally
come installed as prt of the compiler installation. If
not, you'll need to obtain and install this library
Fortran: Ensure you have a correctly installed Fortran
compiler and you know its path
You will need to install FoX, a Fortran XML library
(or something similar). This is available (it has
a FreeBSD-like license) from:
https://github.com/andreww/fox
You'll retrieve fox-master.zip. Unzip that into
the Fortran directory
BUILD C Examples:
To build the C programs, go to the C/ directory, configure
to build Makefile.config, and then cd into src and type 'make'.
The README file in C/ has more details. If you're on a MAC OS X,
you'll need to read it since the instructions are different.
BUILD Fortran Examples:
First, build FoX. Go to the fox-master directory and
run the ./configure, which may pick up the wrong
fortran. If it does, edit the "configure" file and
edit the two lines containing "gfortran" so that your
Fortran compiler is *first* in the list. The make
sure your Fortran compiler in in you PATH and run
./configure again.
The run "make" and "make check" to build FoX. Documentation
for FoX is in FoX/DoX as html.
After that, go to the Fortran directory and run "make" to
build writexmlf90 and readxmlf90 using FoX.
USAGE:
The following are the main executables available from Python. All of these work in python 2 and 3 although they pick /usr/bin/env python if run as commands.
These require the Python lxml library, available both for Python 2 and 3
adestest/Python/bin/
psvtoxml <psvfile> <xmlfile> # converts psv file to xml file
xmltopsv <xmlfile> <psvfile> # converts xml file to psv file
# the mpc80col converters are incomplete. They do not translate
# header records or Satellite observations.
mpc80coltoxml <mpc80colfile> <xml file>
xmltompc80col <xmlfile> <mpc80colfile>
valall <xml file> # validates against all possible formats
# using both human-readable and non-
# human-readable xslt-generated xsd files
valsubmit <xml file> # validates against submit format
valgeneral <xml file> # validates against general format
applyxslt # <xml file> <xslt file> > <output file>
# example to create the submit schema
Python/bin/applyxslt xml/adesmaster.xml xslt/xsd/submitxsd.xslt > submit.xsd
writexml # example script to write xml file
There is code in C for the all of the above except mpc80coltoxml and
xmltompc80col, in adestest/C/src. To build it,
run "./configure" "cd src; make".
If your are on a Mac, source the forMacOS... file first before running
configure.
mpc80coltoxml and xmltompc80col are not yet in C, but the above
programs all work the same way.
TEST CASES:
The "adestest/tests" directory contains numerous correct and incorrect
test cases. To run them, "cd tests" and run
.runtests prog_python2 # to test python 2
.runtests prog_python3 # to test python 3, if python3 is in your path
.runtests prog_c # to test in C, if you built the C
Also, the tests/mpc/ directory has some mpc 80-column examples. The
test cases for these are not yet finished
DOCUMENTATION:
adestest/doc/ contains pdf and ps files documenting ADES tables
adestest/doc/src contains code to build these in latex. It uses
xslt to generate the tex files. You'll need to
edit the makedoc file to point to your tex
installation.
These are the README file for some previous distribution tests. Some of the information may be useful but some may be obsolete.
2016 Dec GMH --- older notes This is a not-quite-ready-for-prime-time attempt at a distribution.
Known Issues: 1) xmltopsv produces different header orders on different systems for the headers whose order is not specified. This round-trips OK but shows diffs in the tests. I'm not sure what the right order should be.
2) The WINDOWS-1252 codec is broken on some systems in the library
3) Different xml libraries use ' or " for attribute quoting of the <? xml version="1.0" or '1.0' line. This is fine and legal but makes testing hard. Other legal differences are possible
4) I've decided to make the main interface the DOM and not some C struct. This is mainly because most use cases fill less than half of the struct and memory management is tricky.
I've written an example program (writexml) in both C and Python for writing a new xml file using the ElementStack interface. I don't have a design for reading yet but we need to know what we want.
5) This code words on complete documents. Using SAX/iterparse for large files is possible with pretty much the same interface.
6) The timings are dominated by program launch times for the 100-item examples. I'm not sure how much performance is needed.
7) The code needs some organization. I wanted to put out something working.
Specific distribution notes:
1) This uses the python lxml module, which is not part of
the default python. There are numerous clever ways to
try to do binary installs but the most reliable thing
to do is obtain a source tarball (such as lxml-3.6.4.tar.gx)
and run "python setup build" and make and so forth on
your machine. Just Google "python packages lxml" and
poke around untill you find the source tarball.
This is important because all the web sites try
to help you by guessing what your configuration is,
and they guess wrong all the time. Find the source tarball
and go from that. This is especially important if your
want to make both a python 2 and python 3 installation.
2) The runtests script source's a script for picking up the executables it uses. This makes it easy to test your own executables
Several issues remain:
The tests are imcomplete. You can help by expanding them :-)
The runtests point out that between python2, python3 and C there is a disagreement about the order of fields in PSV. The ones we specifiy are all fine, but the order of extra ones can be arbitrary. All the files round-trip just fine, so this may only be a problem for testing.
xmlUTF8Strlen does not return the width of a unicode string but rather just the number of unicode characters (I think it handles the combining characters correctly). This means padding to achieve justification in Chinese etc. will be wrong.
NOTE: although the maximum allowed field width is 200, that means 200 unicode characters. This may even be longer than 200 unicode code points because of combining characters. Python handles memory management properly; in C you're on your own.
Usage: The executables in the varous bin/ directories (should) have the same interface. To run tests, go to the tests directory and run ./runtests prog_python2 ./runtests prog_python3 ./runtests prog_c
Run these into a file since the output can be long.
prog_python2 assumes #!/usr/bin/env python is python 2.7 prog_python3 needs to point to your python3 not mine prog_c script uses python for the encoding check. xmltopsv and psvtoxml are in C. Note that the C code my version seems to use single quotes instead of double quotes on the version line <?xml version="1.0" encoding="UTF-8"?> vs. <?xml version='1.0' encoding='UTF-8'?>
This confuses diff. The attributes in the
doc are coded the same way. Notice the EBCDIC
and UTF-7 encodings are fine, but the quote
differences make them look different.
Notes:
For now, all the executables start by transforming the xml/adesmaster.xml file into the internal tables using xslt/util/tableades.xslt. This is hard-coded into the executables. Eventually we may want to have the tables hard-coded into the executables instead once things stabilize.
For now, all the xsd files are generated from adesmaster.xml
using xslt/xsd/
Those two above items add surprisingly little to program start overhead.
Everything works by converting input files, including input files, into an internal xml etree and doing operations on that. We may want to use iterparse to handle large files but so far this is not an issue. I'm not sure what large means.
It's really important for performance to not have memory leaks. Memory management is tested with the C executables through some commented-out code using the "nMemoryTest"
This directory has several sub-directories:
C/
./configure creates Makefile.config. cd src; make clean; make # builds and puts executables in bin cd src; make realclean; # removes executables from bin
README
configure.ac
configure
install.sh # what a mess
aclocal.m4 # yup, a mess
forMacOSXwithout_pkg_config # did I say a mess
Makefile.config.in
src/ # make puts executables in bin
include/
bin/ # same interface as Python. At least they're supposed to :=)
Executables:
psvtoxml # psvtoxml
The encoding flags for PSV files do not work. They
always assume the PSV encoding is UTF-8
Python bin/ python executable files and modules. The modules are not executable and are in bin because I didn't want to bother with setting pythonpath yet.
All the python scripts are good with python2 and python3
<script> # runs a script with #!/usr/bin/env python
<python2> <script> # runs a script with python2
<python3> <script> # runs a script with python3
Python/bin/xmltopsv <args>
python xmltopsv <args>
python3 xmltopsv <args>
Executables:
applyxlst
validate
encoding
psvtoxml # psvtoxml <psv file> <xml file>
xmltopsv # xmltopsv <xml file> <psv files>
valall # valall <xml file>
valades # see tests/runtests
unittest # this is woefully incomplete
writexml # writexml myfile
writexml myfile
The C and Python conversions don't match, at least on my
machine, because one of the says
<?xml version='1.0' encoding='UTF-8'?>
and the other
<?xml version="1.0" encoding="UTF-8"?>
Both of these are legal.
"writexml myfile UTF-7" is interesting.
Some other thoughts:
A) Use iterparse to process documents as a stream
Both the Python and C work on xml documents, which mean the entire input is in memory as an xml tree (even psv input is converted to an xml tree.
Larger documents may require an iterparse structure.
B) User interface
Right now I don't have much for this. The basic idea
is to use xml documents for everything an supply routines
to walk through them.
To make a new document, build an xml document,
validate it, and then write it either as xml or psv.
To read a document, read it into an xml tree and
use methods on the tree.
Obviously we can build a layer on top of this but I haven't
given that much work yet. I think it is not a good idea
to make a big struct of xmlChar* pointers, since that's
going to
1) be a recipe for memory leaks
2) be slow because it's mostly going to be empty
I think going through the node interface by strings is better,
In C++ and Python that's easy. In C and Fortran this is harder
but I think we should be dealing with the xml directly or
indirectly (but conceptually) in all cases.
C) Unicode handling
-> Use native UTF-8 whenever possible
Note python3 will not write UTF-8 to stdout unless the right environment variables are set. This is going to be a bigger problem in the future. While C/C++ will write bytes, having improper terminal settings can create surprises.
Recommendation: Transform from file to file. View files
with an editor that supports utf-8 or
use file:// on you web browser, which
is happy with utf-8.