biigle / core

Application core of BIIGLE
https://biigle.de
GNU General Public License v3.0

File metadata parsers #555

Open mzur opened 1 year ago

mzur commented 1 year ago

We received a request asking whether BIIGLE could support more file metadata formats. These formats would require more than the simple mapping between column names that is currently done. Some formats may also be very specific and only required for a single BIIGLE instance. Here is how this could be achieved:

Implement a generic "metadata parser" interface that receives a string as input (i.e. the file content; if a file is uploaded, it is read automatically before the parser is called) and produces the internal BIIGLE metadata array as output. A parser may also throw an exception if there is anything wrong with the file. The first metadata parser classes could be the CsvParser, which would basically do what the existing CSV parsing does, and the IfdoV1Parser, which would basically do what the existing iFDO parsing does. I'm told that v2.0 of the iFDO standard will be finalized soon, so we could also add an IfdoV2Parser in the future (@tschoeni).
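To make the idea concrete, here is a minimal sketch of what such an interface and a CSV implementation could look like. The names MetadataParserInterface and SketchCsvParser, the shape of the returned array (one associative array per file) and the required "filename" column are assumptions made for illustration only; they are not existing BIIGLE code.

```php
<?php

declare(strict_types=1);

// Hypothetical interface: string in, metadata array out, exception on bad input.
interface MetadataParserInterface
{
    /**
     * @param string $content Raw content of the metadata file.
     * @return array<int, array<string, string>> One associative array per file (assumed shape).
     * @throws InvalidArgumentException If the content is malformed.
     */
    public function parse(string $content): array;
}

// Hypothetical CSV parser used only to illustrate the interface.
final class SketchCsvParser implements MetadataParserInterface
{
    public function parse(string $content): array
    {
        // Split into lines, tolerating Windows and Unix line endings.
        $lines = preg_split('/\R/', trim($content));
        if ($lines === false || $lines === ['']) {
            throw new InvalidArgumentException('The metadata file is empty.');
        }

        // The first line holds the column names.
        $header = array_map('trim', str_getcsv(array_shift($lines), ',', '"', ''));
        if (!in_array('filename', $header, true)) {
            throw new InvalidArgumentException('The "filename" column is missing.');
        }

        $rows = [];
        foreach ($lines as $index => $line) {
            $fields = str_getcsv($line, ',', '"', '');
            if (count($fields) !== count($header)) {
                // +2 because the header is line 1 and $index starts at 0.
                throw new InvalidArgumentException('Line '.($index + 2).' has the wrong number of columns.');
            }
            $rows[] = array_combine($header, $fields);
        }

        return $rows;
    }
}
```

A caller would only depend on the interface, e.g. `(new SketchCsvParser())->parse($fileContent)`, and catch the exception to report a problem with the file to the user.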

All metadata parsers are defined in a config file like this:

[
   'csv' => [
      'name' => 'Metadata CSV',
      'parser' => CsvParser::class,
   ],
   'ifdov1' => [
      'name' => 'iFDOv1 YAML',
      'parser' => IfdoV1Parser::class,
   ],
   'ifdov2' => [
      'name' => 'iFDOv2 YAML',
      'parser' => IfdoV2Parser::class,
   ],
   'myformat' => [
      'name' => 'My Format XLSX',
      'parser' => MyFormatParser::class,
   ],
]

All parsers are offered in the "import" dropdown in the "create new volume" form of BIIGLE.

If a custom format should be supported in a single BIIGLE instance only, the config array above could be extended (see myformat) and a new parser class file injected at build-time.
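As a rough sketch of how such a config array could be consumed, the following shows one way to derive the dropdown options and to instantiate the parser for a selected key. The stub classes and the helper functions dropdownOptions and resolveParser are hypothetical and exist only for this example.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-ins for real parser classes.
final class StubCsvParser
{
    public function parse(string $content): array
    {
        return ['format' => 'csv'];
    }
}

final class StubMyFormatParser
{
    public function parse(string $content): array
    {
        return ['format' => 'myformat'];
    }
}

// Same shape as the proposed config array, extended with an instance-specific format.
$parsers = [
    'csv' => ['name' => 'Metadata CSV', 'parser' => StubCsvParser::class],
    'myformat' => ['name' => 'My Format XLSX', 'parser' => StubMyFormatParser::class],
];

// Options for the "import" dropdown: config key => display name.
function dropdownOptions(array $parsers): array
{
    return array_map(fn (array $entry): string => $entry['name'], $parsers);
}

// Instantiate the parser registered under the given key.
function resolveParser(array $parsers, string $key): object
{
    if (!isset($parsers[$key])) {
        throw new InvalidArgumentException("Unknown metadata format '{$key}'.");
    }

    $class = $parsers[$key]['parser'];

    return new $class();
}
```

With this layout, supporting an instance-specific format would only require one more array entry plus the class file; the rest of the application would not need to know about the new format.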

The parsers (except maybe the native CSV parser) could all work the way the iFDO import currently works. When an import file is selected in the new volume form, the file is uploaded to an API endpoint that returns the parsed information in the BIIGLE CSV format. This information is then added to the respective form field. Another special case is the support for uploading the iFDO file itself (in addition to storing the metadata in the database). This is probably not required for other metadata imports.
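The upload step described above could be sketched roughly as follows: a parser callable turns the file content into rows, the rows are serialized to CSV, and a parser exception becomes an error response. The function names rowsToCsv and handleImport, the response array and the status codes are assumptions for illustration, not the actual BIIGLE endpoint.

```php
<?php

declare(strict_types=1);

// Serialize parsed rows (one associative array per file) to a CSV string.
function rowsToCsv(array $rows): string
{
    if ($rows === []) {
        return '';
    }

    $handle = fopen('php://temp', 'r+');
    // Column names are taken from the first row.
    $header = array_keys($rows[0]);
    fputcsv($handle, $header, ',', '"', '');

    foreach ($rows as $row) {
        // Emit fields in header order; missing values become empty cells.
        $fields = array_map(fn ($column) => $row[$column] ?? '', $header);
        fputcsv($handle, $fields, ',', '"', '');
    }

    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);

    return $csv;
}

// Hypothetical endpoint logic: parse the uploaded content and return CSV or an error.
function handleImport(callable $parse, string $content): array
{
    try {
        return ['status' => 200, 'csv' => rowsToCsv($parse($content))];
    } catch (InvalidArgumentException $e) {
        // A parser signals a malformed file by throwing.
        return ['status' => 422, 'error' => $e->getMessage()];
    }
}
```

The form would then place the returned CSV string into the metadata field, or show the error message if the parser rejected the file.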

WaiiMCap commented 1 year ago

@mzur Thank you for providing that explanation. Could you kindly explain the process of injecting a class file at build-time, please?

mzur commented 1 year ago

The build happens in the build directory of the distribution configuration. You can add the class file to this directory and then "inject" it into the base Docker image (build.dockerfile) when it is built. Here is an example of how the filesystems config is copied to the image (to replace the default config). You can add any PHP file this way (as long as it is placed at the correct location).

WaiiMCap commented 1 year ago

@mzur Could you please tell me where to locate the config file that defines the available parsers and their associated parser classes in BIIGLE?

mzur commented 1 year ago

This feature does not exist in BIIGLE yet. This issue describes an idea for how new and more flexible parsers could be implemented. Currently, BIIGLE can parse its own CSV file format and iFDO files. You can find all the relevant information about these in the first post of this issue.

mzur commented 1 year ago

@WaiiMCap are you working on this issue or is it free for someone else to pick up?

WaiiMCap commented 1 year ago

> @WaiiMCap are you working on this issue or is it free for someone else to pick up?

I'm working on it.

mzur commented 9 months ago

@WaiiMCap Any news on this? We'll start working on iFDOv2 support soon and it would integrate nicely with this.

mzur commented 6 months ago

FYI, a general framework for new metadata parsers is now taking shape in https://github.com/biigle/core/pull/709

New metadata parsers should extend the new abstract class MetadataParser. An example of adding a custom metadata parser (iFDOv2) as a PHP package will also be developed.