LibraryOfCongress / bagger

The Bagger application packages data files according to the BagIt specification.

Change fields (in Json profile, display screen and Bag-info) from set to ordered list #19

Closed johnscancella closed 8 years ago

johnscancella commented 8 years ago

The current set does not allow the desired ordering of metadata entry fields (in the JSON profile, display screen and Bag-info). It is available here: https://github.com/LibraryOfCongress/bagger/blob/master/bagger/src/main/java/gov/loc/repository/bagger/ui/BagInfoForm.java#L86

Desired improvement: an ordered list of metadata fields, so that the keyed information in Bag-Info and on the display screen makes sense for accessions of digital records. With such an improvement, profiles will generate understandable metadata in the order originally intended when the project profile was created. This will also make Bagger more usable and customizable. Additionally, importing SIPs into preservation or records systems will be easier, because a predictable metadata order helps when mapping fields between Bagger (Bag-info) and the new system's metadata structure.

johnscancella commented 8 years ago

note: this was created for @houzanme1.

johnscancella commented 8 years ago

Looking at the BagIt spec for metadata, it does specifically state that the order should be preserved, which Bagger does not currently comply with.

acdha commented 8 years ago

@johnscancella: can this be as simple as changing https://github.com/LibraryOfCongress/bagger/blob/master/bagger/src/main/java/gov/loc/repository/bagger/ui/BagInfoForm.java#L44 to use a LinkedHashMap to display the fields in insertion order?
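A minimal sketch of the difference being suggested here: `HashMap` makes no guarantee about iteration order, while `LinkedHashMap` iterates keys in the order they were first inserted. The class and method names below are illustrative only and are not taken from Bagger's code; the field labels are standard Bag-info labels used as sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldOrderDemo {
    // Inserts the given field names into a LinkedHashMap and returns the keys
    // in iteration order, which for LinkedHashMap is insertion order.
    public static List<String> orderedKeys(String... names) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : names) {
            fields.put(name, "");
        }
        return new ArrayList<>(fields.keySet());
    }

    public static void main(String[] args) {
        System.out.println(orderedKeys("Source-Organization", "Contact-Name", "Bagging-Date"));
    }
}
```

Swapping `LinkedHashMap` for `HashMap` in this sketch would still compile, but the returned order would no longer be guaranteed to match the order of insertion.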

johnscancella commented 8 years ago

For the ordering, yes; however, the spec also says there may be duplicates (though in practical usage that may never happen). I was just about to test LinkedHashMap (great minds think alike, I guess).

johnscancella commented 8 years ago

@houzanme1 the push I just made should solve the ordering issue for you, but it still doesn't fully conform to the BagIt spec, since it doesn't allow multiples of the same field.

If you are feeling adventurous you can try building and using the latest code until the next release is pushed out. Just follow the build process from the README
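To illustrate the remaining gap, here is a rough sketch of one way to keep both insertion order and repeated labels: store the fields as a list of key/value entries rather than as a map. This is an assumption about a possible approach, not what Bagger does; the class name `BagInfoEntries` and its methods are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BagInfoEntries {
    // A list keeps insertion order and, unlike a map, allows the same label twice.
    private final List<Map.Entry<String, String>> entries = new ArrayList<>();

    public void add(String label, String value) {
        entries.add(new SimpleEntry<>(label, value));
    }

    // Renders one "Label: value" line per entry, in the order entries were added.
    public String render() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : entries) {
            out.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BagInfoEntries info = new BagInfoEntries();
        info.add("Source-Organization", "Example Archive");
        info.add("Contact-Name", "First Contact");
        info.add("Contact-Name", "Second Contact"); // repeated label is kept
        System.out.print(info.render());
    }
}
```

The trade-off is that lookups by label become a linear scan, which is usually acceptable for the handful of fields in a Bag-info file.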

houzanme1 commented 8 years ago

@johnscancella and @acdha : Well guys, that was faster than I hoped for! Thanks to you both for acting on this. Will try and let you know if I succeed in making it work on my end.

Can't wait for the next release. BTW, will send a cleaner profile before that release if you tell me approximately when it is due out. Thanks! Tibaut

johnscancella commented 8 years ago

@houzanme1 @dbrunton or @kzwa can give a better answer as to when the next release is

houzanme1 commented 8 years ago

Does the fix assume I have GitHub for Windows to build and use the latest code with Gradle? (before the next release)

johnscancella commented 8 years ago

@houzanme1 I am not sure I understand. GitHub for Windows is just a Git client and has nothing to do with the code. Git is just the source code repository format; any Git-compatible tool should be able to get the code from GitHub.

Once you have the code on your computer, use Gradle to build.

Or you can download release 2.5 candidate 2, which has already been built.

houzanme1 commented 8 years ago

I just meant to say, I am limited in what programs I can install and what I can do on a restricted computer. I am super glad release 2.5 candidate 2 is available!

As for the original issue, the field order in the JSON profile still does not match Bagger's display of the same fields. I did notice a new set of fields that can be added from the "standard" list.

The profile I am submitting below (prepared for our agency) is a synthesis of what appears to be important when describing and accessioning digital records. It should make accessions easier by eliminating the need to think about which fields are important to add. Eliminating is easier during accession time :-)

Though many fields are suggested, only five (5) are required to properly identify the record and transfer/accession.

This profile can be adapted/adopted as users wish, though I have not been able to add comment lines and contact information for following up on change requests or improvements (Bagger no longer recognizes the profile when // or /* ... */ is added).

Here is the profile named IARA(Indiana)-Accession-Profile-profile.json (See attached file saved in .txt)

johnscancella commented 8 years ago

After looking at the code some more, it looks like the problem is actually upstream in org.json.JSONObject line 157 (https://github.com/stleary/JSON-java/blob/master/JSONObject.java#L157), where it uses a plain HashMap (which does not preserve order). This JSONObject is used to convert the JSON text file into a Java object.

I filed a pull request with the code owner.

johnscancella commented 8 years ago

It looks like preserving order in the JSON object has been discussed before (https://github.com/stleary/JSON-java/pull/190) and ultimately denied.

You may be able to get it to work if you change the JSON profile, but having not tested it I am not sure if it will work. See http://stackoverflow.com/a/4515863 for more details on how you could do this.

I think the real solution would be to rework/re-architect the profiles, but that would be a big change and would need more input from the general community that uses bagger.

houzanme1 commented 8 years ago

If you get that to work, you would have solved the root cause... and that would be lovely! Tibaut


houzanme1 commented 8 years ago

That's not nice of them :-(

Anyway, there are limitations and I understand that. A workaround that delivers the expected outcome is all we crave:-)

And contributions from the community are welcome, though scarce. For now, I will be content with something to get started with.

So, I am going to try some of the solutions you shared.

Any particular heads-up before I dive into it?

Thanks, Tibaut


johnscancella commented 8 years ago

Yeah, I was a little bummed about that, but I also understand that the JSON spec states that objects are unordered, so relying on key order was really an oversight by the original Bagger creator.

As for a heads up, if you have thoughts or problems feel free to post them here (after trying to work them out of course). If you change the json profiles to be an array named something, then you will probably have to update the jsonBagger.java code. Any pull requests are appreciated.

Otherwise I will keep this open so others know.

houzanme1 commented 8 years ago

@johnscancella Sounds good. Will keep you posted. Trying different profiles sounds easier; updating the Java code might take a bit longer. Thanks for leaving this open.

houzanme1 commented 8 years ago

I tried the various suggested profiles without editing the jsonBagger.java code. No success there. Let me know what happens after you alter the code. BTW, the drop-down list created by "valueList" in the profile is perfectly ordered as entered. I am impressed with it.


acdha commented 8 years ago

@johnscancella What do you think about using javax.json.stream and having it populate an ordered hash map as each event is processed?

johnscancella commented 8 years ago

@houzanme1 I implemented a compromise: if you change the profile to use an array, the order is enforced. I changed yours to use an array and included it with the rest of the code. This will be in version 2.5-RC4.
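A rough sketch of why an array helps: JSON arrays are ordered, so a profile written as an array of single-field objects can be folded into a `LinkedHashMap` in array order. The sketch below skips JSON parsing entirely and models the already-parsed array as a `List` of `Map`s; the class name, method name, and data shape are assumptions for illustration and may not match Bagger's actual profile format or code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedProfileSketch {
    // Folds a list of single-field maps (standing in for a parsed JSON array)
    // into one map whose iteration order follows the list order.
    public static Map<String, String> flatten(List<Map<String, String>> orderedItems) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map<String, String> item : orderedItems) {
            fields.putAll(item);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Map<String, String>> items = Arrays.asList(
                Collections.singletonMap("Start", "que"),
                Collections.singletonMap("Middle", "tre"),
                Collections.singletonMap("End", "qie"));
        System.out.println(flatten(items).keySet());
    }
}
```

Each array element holds exactly one field here, so the order inside an individual element never matters; only the array order does.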

johnscancella commented 8 years ago

Oh, and in case you were still wondering about comments in the profile: by definition, JSON does not allow them. See http://stackoverflow.com/a/4183018

houzanme1 commented 8 years ago

That sounds great, @johnscancella! My curiosity and excitement levels just went up. I can't wait to see 2.5-RC4!

Thanks for the heads up on comments. I thought explaining the rationale for the fields would help get good feedback. No worries then.

Let me know when RC4 comes out and I will be first to test it!

johnscancella commented 8 years ago

@houzanme1 test away https://github.com/LibraryOfCongress/bagger/releases/tag/v2.5-RC4

houzanme1 commented 8 years ago

I tried the following short snippets (arrays). None made the profiles appear for selection. It would be nice if you could confirm whether any of these work for you. Or, if you are able to create a very short one that works, please share it and I will model the larger profile on it.

Thanks Tibaut

Test 1

{
  "items": [ { "w1": "que", "w2": "tre", "w3": "qie" } ]
}

Test 2

{
  "items": [
    {"Start": {"data": "que"}},
    {"Middle": {"data": "tre"}},
    {"End": {"data": "qie"}}
  ]
}

Test 3

{
  "items": [
    {"Start": {"1": "que"}},
    {"Middle": {"2": "tre"}},
    {"End": {"3": "qie"}}
  ]
}

Test 4

{
  "items": [ { "w1": "que", "w2": "tre", "w3": "qie" } ],
  "itemOrder": ["w1", "w2", "w3"]
}

johnscancella commented 8 years ago

You have to use the key "ordered". There should be an example included with the new Bagger called ordered-other-project. I also included your profile and edited it to already have the ordering.

I will update the README so that it is clearer.

houzanme1 commented 8 years ago

I was able to load the new profile you ordered and it looks and works as pristine as ever! Very nice and satisfying work indeed!

As for the example you mentioned, I did not see it when I launched RC4. Plus, I am improving ours and have requested comments from colleagues. Hopefully I will get a really clean and beautified profile back for inclusion in future releases.

I can't say it well enough, but you have made an outstanding improvement to Bagger!

Tibaut