DILCISBoard / SIARD

SIARD (Software Independent Archiving of Relational Databases) - an open file format for the long-term archiving of relational databases

Interoperability issue of filepath reference for LOBs #61

Open solfeggietto opened 11 months ago

solfeggietto commented 11 months ago

DILCIS Board must take a stand on how to deal with the SIARD interoperability issue of filepath reference for LOBs.

Reposting from the DBPTK issue below, because this is the single most conflicting interoperability issue among the SIARD tools: DBPTK uses the lobFolder elements twice when assembling the three parts that make up the full file path of each referenced LOB file.

########################################### https://github.com/keeps/dbptk-developer/issues/476

I am posting the bug encountered in DBPTK Desktop here, as it is my understanding that all SIARD library implementations are handled by this repository, dbptk-developer (if I am wrong, let me know and I will post it elsewhere).

The only generic and valid way to implement the LOB file path is to combine three parts: (1) the database-level lobFolder, (2) the column-level lobFolder, and (3) the file attribute of the cell.

Parts 1 and 2 are not mandatory, and any existing values and combinations must be handled accordingly by all SIARD-creating software as well as by SIARD-reading or SIARD-validating software. Double-saving (storing the full path in the file attribute in addition to the values stored for lobFolder in Part 1 or 2 above) does not produce a valid SIARD file.

Referring to SIARD 2.1 specification: P_4.3-3

"...The value of the file attribute is the (URL-encoded) file URI (possibly relative to the nearest lobFolder), where the BLOB is stored."

P_5.1-1 Database level metadata lobFolder

A “file:” URI representing the base URI for relative URIs indicating the possibly external storage location of large objects. If it is missing, default value of the root folder inside the ZIP file is assumed. Relative lobFolder URIs in the column metadata are relative to this value.

P_5.6-1 Column level metadata

Name of the LOB folder given as a relative or absolute “file:” URI – possibly in the external file system. It may be used for internal as well as external storage of large objects.

Note This entry is only meaningful, if the column is a LOB column (e.g. type BLOB, CLOB or XML). If it is missing, it is assumed to equal “.”, e.g. to refer to the same folder as the lobFolder on the database level. Otherwise its value must be a – preferably relative – “file:” URI, indicating the folder, where the files of this LOB column are to be stored. If this value is a relative URI, it is assumed to be relative to the global lobFolder entry at the database level. The relative file attributes of the cells in this column are interpreted as being relative to this folder.
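
The resolution order described in the quoted passages can be sketched in a few lines of Python. This is only an illustration of how I read the specification, not code taken from any SIARD tool: the function name `resolve_lob_path` and its parameters are hypothetical, and the sketch assumes that both lobFolder values name folders and that relative values are resolved inside the ZIP root.

```python
from urllib.parse import urljoin, unquote


def _as_folder(uri):
    # urljoin treats a base without a trailing slash as a file, so add one.
    return uri if uri.endswith("/") else uri + "/"


def resolve_lob_path(db_lob_folder, column_lob_folder, file_attr):
    """Combine the three parts into one path (hypothetical helper).

    db_lob_folder     -- database-level lobFolder, or None if missing
    column_lob_folder -- column-level lobFolder, or None if missing
    file_attr         -- URL-encoded file attribute of the cell
    """
    # A missing database-level lobFolder means the root folder of the ZIP file.
    base = _as_folder(db_lob_folder) if db_lob_folder else ""
    # A missing column-level lobFolder is assumed to equal ".".
    column = _as_folder(column_lob_folder or ".")
    # The column folder is relative to the database folder,
    # and the file attribute is relative to the column folder.
    folder = urljoin(base, column)
    return unquote(urljoin(folder, file_attr))


# Both of these describe the same file without repeating any part:
print(resolve_lob_path(None, None, "content/schema1/table4/lob6/record2.bin"))
print(resolve_lob_path("content", None, "schema1/table4/lob6/record2.bin"))
# A full path in the file attribute plus a lobFolder value doubles the prefix:
print(resolve_lob_path("content", None, "content/schema1/table4/lob6/record2.bin"))
```

Under this reading, a reader that follows the specification strictly will look for `content/content/...` in the third case, which is why a full path in the file attribute combined with a lobFolder value cannot be read consistently by all tools.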

First, I am referring to the closed issue 382, "Double content for LOBs", which was reported as solved in DBPTK Developer v2.6.0: https://github.com/keeps/dbptk-developer/issues/382

"Double content" means the following. The metadata.xml lobFolder element value = content. The table column lobFolder element is not present, which is fine as it is not mandatory. The Table[n].xml row file attribute path = content/schema1/table4/lob6/record2.bin. Hence the "content" folder is referenced twice.
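
A validator could flag this pattern with a simple prefix check. The sketch below is a heuristic under my own assumptions, not part of DBPTK or of the specification: the function name `repeats_lob_folder` is hypothetical, it only handles relative paths, and an archive that really contains a nested folder of the same name would be a false positive.

```python
import posixpath


def repeats_lob_folder(lob_folder, file_attr):
    """Return True if file_attr already starts with the lobFolder value.

    Hypothetical helper: both arguments are assumed to be relative,
    already URL-decoded paths inside the SIARD ZIP file.
    """
    if not lob_folder:
        return False  # no lobFolder given, so nothing can be repeated
    folder = posixpath.normpath(lob_folder)
    if folder == ".":
        return False  # "." refers to the base folder itself
    target = posixpath.normpath(file_attr)
    # Compare whole path segments so that "content" does not match "contents".
    return target == folder or target.startswith(folder + "/")


# The values reported in this issue:
print(repeats_lob_folder("content", "content/schema1/table4/lob6/record2.bin"))
# The same LOB with a file attribute relative to lobFolder:
print(repeats_lob_folder("content", "schema1/table4/lob6/record2.bin"))
```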

andersbonielsen commented 1 month ago

The specification for the location of LOBs according to SIARD

The specification for the location of LOBs (large objects) according to the SIARD File Format specification version 2.1 (and 2.2) can appear somewhat complex, which is mostly due to its flexibility and backwards compatibility. The DILCIS Board has written a short document specifying and clarifying this issue. The specification document will later also be available in the code section of DILCISBoard/SIARD.

2024.08.01 SIARD LOB location issue.pdf