ELVIS-Project / elvis-database

:musical_score: ELVIS Database Web Application

Manifests in zipped files. #176

Closed musicus closed 8 years ago

musicus commented 8 years ago

When a user downloads a collection of pieces, the compositions are organized in a file structure. We need to include a manifest in the top level directory that provides the metadata to the compositions (and provides a layout to the file structure).

agpar commented 8 years ago

You have two options on the dev branch right now.

Download a flat zip, where the metadata of every piece is dumped into a single meta file:

meta
file_1
...
file_n
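As a rough illustration of the flat layout above, here is a minimal sketch using only the standard library. The function name `build_flat_zip` and the shape of `pieces` are assumptions made for this example; this is not the code on the dev branch.

```python
import io
import json
import zipfile


def build_flat_zip(pieces):
    """Return the bytes of a flat zip: one `meta` file plus every attached file.

    `pieces` is a hypothetical list of dicts shaped like
    {"title": str, "files": {filename: bytes}}.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        metadata = []
        for piece in pieces:
            # Everything except the raw file contents is treated as metadata.
            entry = {key: value for key, value in piece.items() if key != "files"}
            entry["files"] = sorted(piece["files"])
            metadata.append(entry)
            for name, content in piece["files"].items():
                archive.writestr(name, content)
        # A single metadata file describing every piece in the archive.
        archive.writestr("meta", json.dumps(metadata, indent=2))
    return buffer.getvalue()
```

Note that this sketch does nothing about two pieces that share a file name; the later entry would simply be added under the same name.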

Download a hierarchical zip, where every directory with files contains metadata about those files.

composer_1/
    piece_1/
        meta
        file_1
        ...
        file_n
        movement_1/
            meta
            file_1
            ...
            file_n
    ...
    piece_n
...
composer_n

Are you suggesting the second mode needs a manifest at the root dir describing all the available files? Why not download the flat zip, in that case? What does this manifest actually contain, and what does it get us (beyond supporting the idea that we might allow bulk importing someday)?

musicus commented 8 years ago

I'm OK with the flat zip, provided that no files are overwritten. It makes it a lot easier to grab all the files and add them to a music21 (humdrum, vis) stream, since there is no need to write a recursive folder traversal before adding a series of pieces. And we keep all the metadata in one file.
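To make the "no files are overwritten" condition concrete, here is one possible way to flatten paths without name clashes. The helper `unique_flat_names` and its counter-suffix naming scheme are assumptions for illustration, not what the application currently does.

```python
import posixpath


def unique_flat_names(paths):
    """Map each hierarchical path to a file name that is unique in a flat zip.

    Collisions are resolved by appending a counter before the extension
    (a hypothetical scheme chosen for this sketch).
    """
    used = set()
    mapping = {}
    for path in paths:
        base = posixpath.basename(path)
        stem, ext = posixpath.splitext(base)
        candidate = base
        counter = 1
        # Keep trying suffixed names until one is free.
        while candidate in used:
            candidate = f"{stem}_{counter}{ext}"
            counter += 1
        used.add(candidate)
        mapping[path] = candidate
    return mapping
```

The returned mapping could also be written into the metadata file, so that each flat name can be traced back to its composer, piece, and movement.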

With the second option (also, only one metadata file, maintaining folder structure elegance) it would look like this:

zip/
  metadata
  composer_1/
    piece_1/
      movement_1/
        file_1
        ...
  composer_n/
  ...

You decide, but we do need the metadata of all pieces included in the zip, in one file.

AFFogarty commented 8 years ago

This structure would also need to handle cases where the file is attached directly to the piece, not the movement:

zip/
    meta
    composer/
        piece/
            file_1
            movement/
                file_2
                ...

agpar commented 8 years ago

Andrew, if I understand what you're saying, it already does.

Reiner, why do you think that change would be superior to how it's set up right now?

AFFogarty commented 8 years ago

One advantage is that you don't have to search the directory tree to get all the metadata.
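For comparison, this is roughly the search a consumer has to write when the metadata lives only in per-directory files. It is a sketch that assumes each `meta` file contains JSON; the function name `collect_metadata` is made up for this example.

```python
import json
import os


def collect_metadata(root):
    """Gather every per-directory `meta` file under `root` into one dict.

    Keys are directory paths relative to `root`; values are the parsed
    contents of that directory's `meta` file (assumed to be JSON).
    """
    collected = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "meta" in filenames:
            relative = os.path.relpath(dirpath, root)
            with open(os.path.join(dirpath, "meta"), encoding="utf-8") as handle:
                collected[relative] = json.load(handle)
    return collected
```

A single metadata file at the root would replace all of this with one `json.load` call.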

musicus commented 8 years ago

Simplicity. All metadata in one place.

agpar commented 8 years ago

We have two modes so that the cart can be easy to browse (for a human) and easy to import into analysis software, depending on the user's intentions.

The flattened mode accomplishes the latter. Dumping all metadata and files into a single directory makes it easy to point some software at the dir and click 'go'.

The hierarchical mode attempts to accomplish the former. A hierarchy is easier for a human to browse. It is easier to look at the metadata of a file that you are interested in when the metadata is in the same directory as the file (the alternative is moving up 2-3 directories and running a search inside a large file for the metadata you are interested in).

Accumulating the metadata in the root directory of the hierarchical zip fundamentally confuses what the purpose of the hierarchical mode actually is.

However, there is a clear compromise, which is to just do both. I'm surprised neither of you have suggested that yet.

AFFogarty commented 8 years ago

So are there going to be two options in the GUI?

musicus commented 8 years ago

All metadata in one place is the simplest. But if it takes placing individual meta data files into each directory, so we can have a singular meta data file in the bottom directory, then make it so.

ahankinson commented 8 years ago

You should support the BagIt format. Rodan does already.

https://github.com/ahankinson/pybagit https://en.wikipedia.org/wiki/BagIt http://www.digitalpreservation.gov/documents/bagitspec.pdf
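For readers unfamiliar with the format, a bag is essentially a `bagit.txt` declaration, a `data/` payload directory, and a checksum manifest. The sketch below builds that layout by hand with the standard library; it deliberately does not use pybagit (its API is not shown in this thread), it covers only the required elements, and the function name `build_bag` is an assumption for this example.

```python
import hashlib
import io
import zipfile


def build_bag(files, bag_name="bag"):
    """Return the bytes of a zip holding a minimal BagIt-style bag.

    `files` is a hypothetical mapping of relative path -> bytes.
    """
    buffer = io.BytesIO()
    manifest_lines = []
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The bag declaration: format version and tag-file encoding.
        archive.writestr(
            f"{bag_name}/bagit.txt",
            "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        )
        for path, content in sorted(files.items()):
            payload_path = f"data/{path}"
            archive.writestr(f"{bag_name}/{payload_path}", content)
            # One manifest line per payload file: checksum, then path.
            digest = hashlib.sha256(content).hexdigest()
            manifest_lines.append(f"{digest}  {payload_path}")
        archive.writestr(
            f"{bag_name}/manifest-sha256.txt",
            "\n".join(manifest_lines) + "\n",
        )
    return buffer.getvalue()
```

The piece metadata discussed in this issue would still need a home inside the bag, for example as a payload file or an extra tag file; that choice is left open here.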

musicus commented 8 years ago

:+1:

AFFogarty commented 8 years ago

Should we break BagIt support out into a separate issue?

agpar commented 8 years ago

Yes.

Right now all I'm doing is dumping a copy of the JSON you would get if you requested a piece through Django into a file. Supporting the BagIt format would be a large and complex feature of its own.

(way less complex if pybagit is easy to run on Python 3)

musicus commented 8 years ago

Yes. Not really that complex...

AFFogarty commented 8 years ago

So, what's left to do for this issue?

musicus commented 8 years ago

meta data top directory, meta data with files (without BagIt for now), unless this already happened...

zip/
    meta (everything)
    composer/
        meta (composer)
        piece/
            meta (composer/piece)
            file_1
            movement/
                meta (composer/piece/movement)
                file_2
                ...

We still gotta have the meta data, no matter what.
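A rough sketch of how that "do both" layout could be produced: every directory gets its own `meta`, and the root `meta` aggregates all of them. The function name `build_hierarchical_zip` and the shape of `entries` are assumptions for illustration, not the existing back-end code.

```python
import io
import json
import zipfile


def build_hierarchical_zip(entries):
    """Write per-directory `meta` files plus one aggregate `meta` at the root.

    `entries` is a hypothetical mapping of directory path (for example
    "composer/piece/movement") to {"meta": dict, "files": {name: bytes}}.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        everything = {}
        for directory, entry in entries.items():
            everything[directory] = entry["meta"]
            # Metadata next to the files it describes, for human browsing.
            archive.writestr(f"{directory}/meta", json.dumps(entry["meta"], indent=2))
            for name, content in entry.get("files", {}).items():
                archive.writestr(f"{directory}/{name}", content)
        # The same metadata again, collected in one place at the root.
        archive.writestr("meta", json.dumps(everything, indent=2))
    return buffer.getvalue()
```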

AFFogarty commented 8 years ago

Ok, in 1a6c1e8738350e05efba79bed1e62a431a32274d I committed some HTML that prompts the user for the type of directory structure that they want. @lexpar, it should be easy for you to hook this up to your control on the back-end.