Currently, the reader implements a three-layer structure:
- the XpsReader(BaseReader) class,
- a layer called "mappers" (see here for an example),
- "parsers" (example), which the mappers themselves call.
Each mapper handles one file format (like the sle format from SPECS) and contains logic that calls a parser for a specific subset of such files, e.g., depending on the software version that was used to write the file. That allows me to share functionality across different, yet very similar, versions of a format through inheritance and abstract base classes.
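A minimal sketch of how such a mapper/parser split could look. All class names (SleParser, SleParserV1, SleParserV2, SleMapper), the version strings, and the dict standing in for a file's content are hypothetical and not taken from the actual reader code:

```python
from abc import ABC, abstractmethod


class SleParser(ABC):
    """Hypothetical base class holding logic shared by all versions."""

    supported_versions = ()

    def parse(self, raw):
        # Shared steps live here; the version-specific part is delegated.
        return {"version": raw["version"], "spectra": self.read_spectra(raw)}

    @abstractmethod
    def read_spectra(self, raw):
        """Extract the spectra in a version-specific way."""


class SleParserV1(SleParser):
    supported_versions = ("1.0", "1.1")

    def read_spectra(self, raw):
        return list(raw["data"])


class SleParserV2(SleParserV1):
    # Inherits from the V1 parser and only overrides what differs.
    supported_versions = ("2.0",)

    def read_spectra(self, raw):
        scale = raw.get("scale", 1.0)
        return [value * scale for value in super().read_spectra(raw)]


class SleMapper:
    """Hypothetical mapper: one file format, several versioned parsers."""

    parsers = (SleParserV1, SleParserV2)

    def select_parser(self, version):
        # Pick the first parser that declares support for this version.
        for parser_cls in self.parsers:
            if version in parser_cls.supported_versions:
                return parser_cls()
        raise ValueError(f"No parser available for version {version!r}")

    def parse_file(self, raw):
        return self.select_parser(raw["version"]).parse(raw)
```

In this sketch, supporting a new software version would only mean adding one more parser sub-class to the mapper's parsers tuple.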
All of those sub-classes could be readers themselves, inheriting from our BaseReader class (or the MultiFormatReader developed in https://github.com/FAIRmat-NFDI/pynxtools/pull/250).
The file extension is often not unique, i.e., many vendors have a .txt export, but all the files are actually different. This could probably be handled by passing a function in the extensions dict of the MultiFormatReader that does the disambiguation. This is already being handled in a similar way. Finally, there should be a check that the file comes from the list of supported vendors.
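A rough sketch of what such a function for the .txt extension could look like. It does not import pynxtools; the plain extensions dict below only mimics the idea of mapping an extension to a callable that takes a file path and returns a dict. The vendor name vendor_b, the header signatures, and all function names are assumptions made for illustration:

```python
# Hypothetical signatures: a string expected in the first line of each
# vendor's .txt export. Real exports may need a more robust check.
TXT_SIGNATURES = {
    "specs": "# Created by: SPECS",
    "vendor_b": "VENDOR_B EXPORT",
}


def detect_vendor(first_line):
    """Return the vendor whose signature matches, or raise if none does."""
    for vendor, signature in TXT_SIGNATURES.items():
        if signature in first_line:
            return vendor
    raise ValueError("File does not come from a supported vendor")


def parse_specs_txt(file_path):
    # Placeholder for the real vendor-specific parsing.
    return {"vendor": "specs", "file": file_path}


def parse_vendor_b_txt(file_path):
    # Placeholder for the real vendor-specific parsing.
    return {"vendor": "vendor_b", "file": file_path}


TXT_HANDLERS = {
    "specs": parse_specs_txt,
    "vendor_b": parse_vendor_b_txt,
}


def handle_txt_file(file_path):
    """Inspect the file content, then dispatch to the matching handler."""
    with open(file_path, encoding="utf-8") as file:
        first_line = file.readline()
    vendor = detect_vendor(first_line)
    return TXT_HANDLERS[vendor](file_path)


# Stand-in for the extensions dict: extension -> callable(file path) -> dict.
extensions = {".txt": handle_txt_file}
```

The vendor check and the dispatch share one code path here, so a .txt file from an unsupported vendor fails early with a ValueError instead of being handed to the wrong parser.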