LachlanStuart closed this 4 years ago
@intsco Please review the latest 3 commits. I checked every imzML file in our main upload bucket and found there were 27 imzML files that had mismatched accession/name values for the datatype of the m/z and intensity arrays. This caused the code in this PR to read corrupt data and usually crash due to reading past the end of the .ibd file. Specifically, these were due to an early version of ImzMLWriter from this library, and an early version of Xcalibur.
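To illustrate the failure mode (this is an illustration, not code from the PR): when the declared datatype disagrees with the datatype that was actually written, the byte counts no longer line up, so values decode as garbage and offsets/lengths computed with the wrong item size can point past the end of the .ibd file.

```python
import numpy as np

# Illustration only (not code from this PR): why a wrong datatype
# accession leads to corrupt reads. 100 intensities written as 64-bit
# floats occupy 800 bytes.
data = np.arange(100, dtype=np.float64)
raw = data.tobytes()

# Reading back with the correct dtype round-trips cleanly.
assert np.array_equal(np.frombuffer(raw, dtype=np.float64), data)

# Reading the same bytes as 32-bit floats (what a mismatched accession
# implies) yields twice as many elements, all nonsense -- and any
# offset/length computed with the wrong item size can point past the
# end of the .ibd file, which is the crash described above.
wrong = np.frombuffer(raw, dtype=np.float32)
assert len(wrong) == 2 * len(data)
```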
I suspect many datasets will generate warnings. It'll be spammy, but I'd prefer warnings over hard-to-debug issues... I tested affected datasets from ImzMLWriter and Xcalibur, and got this output:
```
> p = ImzMLParser('/home/lachlan/Documents/datasets/Untreated_3_434.imzML')
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:81: UserWarning: Accession MS:1000523 found with incorrect name "32-bit float" (expected "64-bit float"). This is a known issue with some imzML conversion software - updating accession to MS:1000521.
  'to %s.' % (accession, raw_name, name, fixed_accession)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:73: UserWarning: Unrecognized accession in <cvParam>: MS:xxx (name: "pyimzml").
  warn('Unrecognized accession in <cvParam>: %s (name: "%s").' % (accession, raw_name))

> p = ImzMLParser('/home/lachlan/data/old_xcalibur_dataset.imzML', parse_lib='ElementTree')
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:88: UserWarning: Accession MS:1000563 found with incorrect name "Thermo RAW file". Updating name to "Thermo RAW format".
  % (accession, raw_name, name)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:88: UserWarning: Accession MS:1000590 found with incorrect name "contact organization". Updating name to "contact affiliation".
  % (accession, raw_name, name)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:81: UserWarning: Accession MS:1000521 found with incorrect name "64-bit float" (expected "32-bit float"). This is a known issue with some imzML conversion software - updating accession to MS:1000523.
  'to %s.' % (accession, raw_name, name, fixed_accession)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:88: UserWarning: Accession IMS:1000042 found with incorrect name "max count of pixel x". Updating name to "max count of pixels x".
  % (accession, raw_name, name)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:88: UserWarning: Accession IMS:1000043 found with incorrect name "max count of pixel y". Updating name to "max count of pixels y".
  % (accession, raw_name, name)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:88: UserWarning: Accession IMS:1000046 found with incorrect name "pixel size x". Updating name to "pixel size (x)".
  % (accession, raw_name, name)
/home/lachlan/dev/pyimzML/pyimzml/ontology/ontology.py:88: UserWarning: Accession MS:1000838 found with incorrect name "sprayed". Updating name to "sprayed MALDI matrix preparation".
  % (accession, raw_name, name)
```
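The warn-and-fix behaviour above can be sketched as follows. This is a simplified illustration, not the PR's actual code: the real logic lives in `pyimzml/ontology/ontology.py` and uses the full generated ontology tables; `TERMS`, `NAME_TO_ACCESSION` and `fix_cv_param` are hypothetical names.

```python
import warnings

# Hypothetical, tiny accession -> canonical-name table; the real tables
# are generated from the .obo ontology dumps.
TERMS = {
    'MS:1000521': '32-bit float',
    'MS:1000523': '64-bit float',
}
# For the known float-width mixup, the *name* is trusted and the
# accession is corrected to whichever term the name actually belongs to.
NAME_TO_ACCESSION = {name: acc for acc, name in TERMS.items()}

def fix_cv_param(accession, raw_name):
    """Return a cleaned (accession, name) pair, warning about mismatches."""
    expected_name = TERMS.get(accession)
    if expected_name is None:
        warnings.warn('Unrecognized accession in <cvParam>: %s (name: "%s").'
                      % (accession, raw_name))
        return accession, raw_name
    if raw_name == expected_name:
        return accession, raw_name
    if raw_name in NAME_TO_ACCESSION:
        # Known datatype mixup: keep the name, fix the accession.
        fixed_accession = NAME_TO_ACCESSION[raw_name]
        warnings.warn('Accession %s found with incorrect name "%s" '
                      '(expected "%s"). Updating accession to %s.'
                      % (accession, raw_name, expected_name, fixed_accession))
        return fixed_accession, raw_name
    # Otherwise: keep the accession, fix the name.
    warnings.warn('Accession %s found with incorrect name "%s". '
                  'Updating name to "%s".' % (accession, raw_name, expected_name))
    return accession, expected_name

# A 64-bit accession whose name says 32-bit: the accession gets corrected.
print(fix_cv_param('MS:1000523', '32-bit float'))  # ('MS:1000521', '32-bit float')
```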
I also found that the ontology dumps excluded obsolete terms. This generated spurious warnings as some terms were present in the above files (such as MS:1000843 wavelength), so I re-dumped the ontologies with these obsolete terms included.
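For context, reading accession/name pairs out of an `.obo` file looks roughly like this. `parse_obo_terms` is a hypothetical sketch, not the script used to regenerate the dumps; the key point is that `is_obsolete: true` stanzas are kept rather than skipped.

```python
def parse_obo_terms(text):
    """Parse accession -> name from .obo [Term] stanzas.

    Obsolete terms (is_obsolete: true) are deliberately retained, since
    real-world imzML files still reference them.
    """
    terms = {}
    term_id = name = None
    for line in text.splitlines() + ['[Term]']:  # sentinel flushes last stanza
        line = line.strip()
        if line == '[Term]':
            if term_id:
                terms[term_id] = name
            term_id = name = None
        elif line.startswith('id: '):
            term_id = line[4:]
        elif line.startswith('name: '):
            name = line[6:]
    return terms

obo = """
[Term]
id: MS:1000843
name: wavelength
is_obsolete: true
"""
print(parse_obo_terms(obo))  # {'MS:1000843': 'wavelength'}
```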
This PR adds support for extracting metadata at two levels:

- File-level metadata, held in the `Metadata` class and exposed via `ImzMLParser.metadata`.
- Spectrum-level metadata from the `<spectrum>` elements. Due to performance, there are several options for parsing this data:
  1. (Default) Don't parse per-spectrum metadata.
  2. Parse specific accessions into per-accession lists, exposed via `ImzMLParser.spectrum_metadata_fields`. E.g. for TIC values (accession MS:1000285) it would hold `{'MS:1000285': [123.456, 123.456, ...]}`. This takes ~5% extra time per accession ID.
  3. Parse everything into a `SpectrumData` class per spectrum, exposed via the list `ImzMLParser.spectrum_full_metadata`. This takes ~2.5x as much time as the default mode.

The 2nd option is what I intend to use for METASPACE to extract TIC and injection time. I added the 3rd option for completeness, as this library exists for more than just METASPACE.

In order to provide a good interface to the data:
- `Metadata.__init__` and `SpectrumData.__init__` destructure the XML element hierarchy defined by the mzML spec into Python objects, dicts and lists. There was an extremely common pattern of holding a collection of Controlled Vocabulary and user-defined values, which I implemented as `ParamGroup`.
- The `ontology/ms.py`, etc. dumps of the `.obo` ontology files provide a mapping of accession IDs to spec-defined names and data types. This ensures that the dicts of Controlled Vocabulary values can be accessed consistently, even if implementations have fields with typos, etc.
- The raw parsed parameters are also kept on `ParamGroup`, which is a lot less "lossy". I feel this was required to handle edge cases (e.g. multiple definitions with the same accession ID) and other use cases (e.g. retrieving units).

There are several missing features of this implementation:
- Enum-like parameters are awkward to query. E.g. negative mode is represented as an `MS:1000129 negative scan` parameter with no value. Ideally this would be retrievable as a value of the `MS:1000465 scan polarity` parameter (which is never explicitly used). I.e. at the moment you have to ask "is this positive mode? is this negative mode?" etc. for every possible enum value, when it would be preferable to have one field so that you could ask "what mode is it?". Frankly, it was too time-consuming to get a useful list of these relationships out of the ontology data due to inconsistencies in how the relationships were defined.
- XML attributes aren't exposed as Python attributes. It would be nice to write `source_file.location` instead of `source_file.attrs.get('location')`, but it seemed too low value / high cost to implement. Also, unlike subelements, XML attributes aren't converted from camelCase to snake_case.
- The `units` that can be added to parameters aren't exposed in any useful high-level manner. This is a rabbit hole I'd prefer not to explore until I have a solid need for it.

See `tests/test_basic.py` for examples of usage with the new API. There is also a `Metadata.pretty()` function which dumps a human-readable JSON-ifiable dict of the data, with output like this: https://gist.github.com/LachlanStuart/343f2b42815a3c64e15308a200ab91c9

In addition to the above changes, I updated the project metadata a bit, including dropping support for Python 2.7, because the code was already using language features not available in 2.7 and nobody has complained.
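As a rough illustration of the `ParamGroup` idea described above (a toy sketch with assumed names, not the actual class): parameters are kept both as a convenient name-keyed dict, resolved through the ontology tables so typo'd names still land on the spec-defined key, and as the raw parsed list, so duplicate accessions and units aren't lost.

```python
# Toy sketch of the ParamGroup pattern (hypothetical simplification of
# the real pyimzml class).
class ParamGroup:
    def __init__(self, cv_params, ontology):
        # Raw, lossless storage: every (accession, name, value) as parsed.
        # Duplicated accession IDs survive here even though they would
        # collide in the dict below.
        self.cv_params = list(cv_params)
        # Convenient lookup keyed by the spec-defined name from the
        # ontology; ontology.get falls back to the file's own name for
        # unrecognized accessions.
        self.param_by_name = {
            ontology.get(acc, name): value
            for acc, name, value in cv_params
        }

ontology = {'MS:1000285': 'total ion current'}  # accession -> canonical name
# The file spells the name "TIC", but lookup works via the canonical name:
group = ParamGroup([('MS:1000285', 'TIC', 123.456)], ontology)
print(group.param_by_name['total ion current'])  # 123.456
```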
For testing, I ran this against ~30 imzML files from various sources as a stability check, to ensure the metadata parsing introduced no crashes. I'm not 100% confident that all the mappings were done correctly - the checks in `tests/test_basic.py` were limited by which fields/sections were available in the test datasets, and I only did spot checks on a couple of other fields from other datasets.
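For anyone wanting to run the same kind of stability check, a minimal sketch (hypothetical helper, not the script used here) that collects failures instead of stopping at the first crash:

```python
import traceback

def stability_check(paths, parse):
    """Try to parse every file, collecting failures instead of crashing.

    `parse` is whatever callable opens one file, e.g.
    `lambda p: ImzMLParser(p)` -- hypothetical wiring, not from the PR.
    """
    failures = {}
    for path in paths:
        try:
            parse(path)
        except Exception as exc:
            failures[path] = ''.join(
                traceback.format_exception_only(type(exc), exc)
            ).strip()
    return failures

# Demo with a stand-in parser that rejects one "file":
def fake_parse(path):
    if path.endswith('bad.imzML'):
        raise ValueError('mismatched accession')

print(stability_check(['a.imzML', 'bad.imzML'], fake_parse))
# {'bad.imzML': 'ValueError: mismatched accession'}
```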