ResearchObject / ro-crate-ruby

A Ruby gem for creating, manipulating and reading RO-Crates.
MIT License

Writing a crate to a new directory 'flattens' the structure - ignores folders in the resulting filesystem, and also treats them differently in the metadata #1

Closed paulwalk closed 4 years ago

paulwalk commented 4 years ago

This is probably most simply illustrated with some example code. In the following:

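#!/usr/bin/env ruby
require './lib/ro_crate_ruby'

# INPUT_DATA_FOLDER_PATH and RO_CRATE_ROOT_FOLDER_PATH are assumed to be
# defined elsewhere (source folder and output folder respectively).
crate = ROCrate::Crate.new

# Walk the input folder recursively, adding every file and folder to the crate
Dir.glob("#{INPUT_DATA_FOLDER_PATH}/**/*").each do |path|
  if File.file?(path)
    crate.add_file(path)
  elsif File.directory?(path)
    crate.add_directory(path)
  end
end

# Write the crate (payload plus metadata) to the output folder
ROCrate::Writer.new(crate).write(RO_CRATE_ROOT_FOLDER_PATH)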

Running this script populates the folder at RO_CRATE_ROOT_FOLDER_PATH with a flat set of files: it does not re-create the folder structure but writes all the files into the root folder.

The ro-crate-metadata.jsonld file contains hasPart entries for both the files and the folders. The folder references keep their full path, but the file references do not, even when the file comes from a sub-folder. For example, in the following, the file called MIDATA001.108.txt comes from the folder called ./sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108:

"hasPart": [
        {
          "@id": "./sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108"
        },
        {
          "@id": "MIDATA001.108.txt"
        },

A similar pattern happens with the Entities for the folders and files:

    {
      "@id": "./sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108",
      "@type": "Dataset"
    },
    {
      "@id": "MIDATA001.108.txt",
      "@type": "File"
    },

Can you advise? Thanks!

fbacall commented 4 years ago

Hi Paul, cheers for trying this out (even though I haven't written any docs yet!)

You can specify the path within the crate like this: crate.add_file(path, path: 'the/path/here') so in your code it would be:

crate.add_file(path, path: path)

I did it this way because I thought the input path could be something weird like /users/finn/work/blabla/data/1.csv. I'll make it clear in the docs that you need to set the destination path, or change it to a proper argument (with default ".").
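One way to get a sensible destination path from an arbitrary input path is to make it relative to the input folder. This is only a sketch using Ruby's standard Pathname class; the helper name crate_relative_path is hypothetical and not part of ro-crate-ruby:

```ruby
require 'pathname'

# Hypothetical helper (an assumption for illustration, not part of the gem):
# returns `source` expressed relative to `input_root`, so it can be used as
# the destination path inside the crate.
def crate_relative_path(source, input_root)
  source_path = Pathname.new(source).expand_path
  root_path   = Pathname.new(input_root).expand_path
  source_path.relative_path_from(root_path).to_s
end

# Example with the kind of absolute input path mentioned above
puts crate_relative_path('/users/finn/work/blabla/data/1.csv',
                         '/users/finn/work/blabla')
```

The result could then presumably be passed as the destination, e.g. crate.add_file(path, path: crate_relative_path(path, INPUT_DATA_FOLDER_PATH)).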

I will probably also add a method for adding an entire directory to the crate, maintaining the relative paths of all files within.

antleaf commented 4 years ago

Thanks for getting back to me so quickly (I really should have looked more closely at the sources). That's working properly now :-)

stain commented 4 years ago

You should probably avoid "./" in the @id path (except in the Root Data Entity) - we can't assume that JSON consumers will do full URI resolution.

(Not invalid though; this is not formalized in the RO-Crate 1.0 spec.)

In addition, the folder paths should end with a /

@id MUST be a URI Path relative to the RO Crate root; SHOULD end with /

So instead of ./sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108 the folder Dataset should have the @id sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108/ and the file would then presumably be sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108/MIDATA001.108.txt
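As a rough sketch of that normalisation (drop a leading "./", add a trailing "/" to folders, leave the Root Data Entity alone), something like the following could be applied to a path before it is used as an @id. The helper normalize_crate_id is hypothetical and not part of ro-crate-ruby:

```ruby
# Hypothetical helper (an assumption for illustration, not part of the gem):
# normalises a path for use as an @id in the crate metadata.
def normalize_crate_id(path, directory: false)
  # The Root Data Entity keeps its "./" identifier
  return path if path == './'

  # Strip any leading "./" so consumers need no URI resolution
  id = path.sub(%r{\A(\./)+}, '')

  # Folder (Dataset) identifiers should end with a slash
  id = "#{id}/" if directory && !id.end_with?('/')
  id
end

folder = './sample_data/input_dataset_1/MIDATA001.108/MIDATA001.108'
puts normalize_crate_id(folder, directory: true)
puts normalize_crate_id("#{folder}/MIDATA001.108.txt")
```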