bids-standard / BEP028_BIDSprov

Organizing and coordinating BIDS extension proposal 28 : BIDS Provenance
https://bids.neuroimaging.io/bep028
Creative Commons Attribution 4.0 International

globbing to describe collections of files #1

Closed remiadon closed 4 years ago

remiadon commented 4 years ago

For now we use globbing to represent collections of files. Quoting the W3C-prov doc: collections are defined as entities providing structure on top of other entities. For file enumeration, I found it easier to use a syntax that many users are already familiar with.

Another aspect of entities in our framework is the "sha" field, which is used for quick equality checking between entities. For a single file, we simply call a sha function on the file. To fill the "sha" field for a collection of files, we can chain sha functions, i.e. apply a sha function to the concatenated results of the individual sha calls.

With many images in a directory named 'fM00223', this gives

sha1sum fM00223/*.img | cut -d " " -f 1 | sha1sum | cut -d " " -f 1  # "cut" is used to trim filenames infos returned by "sha1sum"

which yields a single value for all .img files in this directory
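
For tools that cannot shell out, the same chained hash can be sketched in Python. This is only an illustration: the function names `file_sha1` and `collection_sha1` are hypothetical and not part of any BIDS-prov tooling. It assumes files are hashed in sorted filename order (the shell's glob ordering can differ depending on locale) and that each per-file digest is followed by a newline, as in the output of "cut".

```python
import glob
import hashlib
import os


def file_sha1(path):
    """Return the hex SHA-1 digest of one file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collection_sha1(directory, pattern="*.img"):
    """Hash the per-file digests of every file matching `pattern`.

    Assumption: files are visited in sorted filename order, and each
    digest is fed to the outer hash as one newline-terminated line.
    """
    outer = hashlib.sha1()
    full_pattern = os.path.join(glob.escape(directory), pattern)
    for path in sorted(glob.glob(full_pattern)):
        outer.update((file_sha1(path) + "\n").encode("ascii"))
    return outer.hexdigest()
```

Calling `collection_sha1("fM00223")` would then return a single 40-character hex string for all .img files in that directory. Two collections compare equal only if they contain the same file contents in the same order, so renaming files in a way that changes the sort order changes the result.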

This proposal aims to ease integration with existing software (e.g. globbing is used in the SPM GUI to select files) while keeping our prov files as concise as possible.

cmaumet commented 4 years ago

@remiadon +1 on this. In practice, usage will be limited as SPM now uses 4D files instead of collections of 3D files. Maybe we should update our example to make it more representative of what happens 'in the wild'?

remiadon commented 4 years ago

@cmaumet OK, so I'll make sure my next example uses 4D files.