Closed aazaff closed 5 years ago
XML files are no longer required at the top level; only the azgs.json file is required, and it should be in the root directory.
The rest of the directory structure should be the same as documented. I haven't compared against the README in a while, but I don't think anything has changed from what is described there, other than the addition of the azgs.json file.
Since these are new collections, the collection_id will be assigned when they are uploaded.
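Putting those rules together, a new collection directory might look something like this before upload (the names here are hypothetical; see the README for the authoritative layout):

```
my_data_collection_directory/    <- arbitrary name; replaced by the assigned collection_id
├── azgs.json                    <- required metadata file, in the root
└── (remaining structure as documented in the README)
```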
To clarify, that means I can name the master directory anything (e.g., my_data_collection_directory), and the new add script will overwrite this directory name with the new collection_id?
Sweet!
Yes, it should work this way.
Keep in mind that azgs_path is built from the archive option and the new collection_id, while azgs_old_url is built from a hard-coded "http://repository.azgs.az.gov/uri_gin/azgs/dlio/" and the name of the master directory. So that might look weird.
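For illustration only (the function, field names, and separator here are assumptions, not the actual azlibrary code), the two values are assembled roughly like this, which is why they can look inconsistent side by side:

```python
# Hedged sketch of how the two link fields are derived, per the description
# above: azgs_path comes from the archive option plus the new collection_id,
# while azgs_old_url appends the *original master directory name* to a
# hard-coded base URL.
OLD_URL_BASE = "http://repository.azgs.az.gov/uri_gin/azgs/dlio/"

def build_links(archive, collection_id, master_dir):
    azgs_path = f"{archive}/{collection_id}"      # uses the newly assigned id
    azgs_old_url = OLD_URL_BASE + master_dir      # uses the old directory name
    return azgs_path, azgs_old_url

path, old_url = build_links("archive", "1234", "my_data_collection_directory")
# path and old_url reference different identifiers, hence the mismatch
```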
Here is an example of a minedata collection. I programmatically generated the azgs.json file by scraping the drupal metadata. Please let me know if it works with your script!
Note that the azgs_old_url does not follow the "http://repository.azgs.az.gov/uri_gin/azgs/dlio/" format that we used with the other bulk dataset.
Close, but not quite.
Is there a reason every string value is in an array?
BTW, I miswrote earlier. The AZGS Old link is fabricated by azlibCreate. Since you are not using that, this won't be a problem here.
Ugh, I thought that might be a problem. It is something the tool I'm using to write out the JSON is doing automatically... I'm sure there's some setting or parameter I can change to fix it.
What tool is it? I can take a look.
It's an R script using jsonlite: https://cran.r-project.org/web/packages/jsonlite/index.html
Don’t worry, I’m sure I can figure it out quickly.
auto_unbox?
Hah! You beat me to it, yup, already tested it and it fixed it.
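For anyone reading along: without auto_unbox = TRUE, jsonlite serializes every length-one vector as a JSON array, which is what produced the array-wrapped strings. A minimal Python sketch of the symptom, plus a workaround helper in case boxed output ever slips through (the field values are made up for illustration):

```python
import json

# What the scraped metadata looked like before the auto_unbox fix:
# every scalar wrapped in a single-element array.
boxed = json.loads(
    '{"title": ["AZGS Miscellaneous Minedata Collection"], "year": [2019]}'
)

def unbox(value):
    """Recursively collapse single-element lists back to scalars."""
    if isinstance(value, list) and len(value) == 1:
        return unbox(value[0])
    if isinstance(value, dict):
        return {k: unbox(v) for k, v in value.items()}
    return value

print(unbox(boxed))
# {'title': 'AZGS Miscellaneous Minedata Collection', 'year': 2019}
```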
Great! Show me the zip when you can and I'll try it out
It's blowing up on "AZGS Miscellaneous Minedata Collection" because that collection_group is new. I'll add it and try again.
But this brings up an interesting question: Do we want to create a new collection_group on the fly when a new string is encountered?
Good question. My instinct is to say NO.
New collection_groups should be so rare that they can be added to the db manually, or at least separately from the main collections upload.
FYI, all of the collections in this new minedata set will have the same collection_group.
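So the upload script would just validate against the known groups and bail out on anything unrecognized, something like this sketch (the names and lookup are made up, not the real code, which would check the db instead of a hard-coded set):

```python
# Hedged sketch of the agreed behavior: fail fast on an unknown
# collection_group rather than creating one on the fly. New groups get
# added to the db manually, separately from the main collections upload.
KNOWN_GROUPS = {"AZGS Miscellaneous Minedata Collection"}

def validate_group(group):
    if group not in KNOWN_GROUPS:
        raise ValueError(
            f"Unknown collection_group {group!r}; "
            "add it to the db manually before uploading."
        )
    return group
```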
I forget... did those two sample ones work correctly once you added the new collection_group?
Yes
I have a new bulk dataset of 18,222 files to add to azlibrary that I have scraped from https://minedata.azgs.arizona.edu/.
These do not come with ISO 19139 XML files; instead, I am building new azgs.json metadata files as I scrape.
There are a few things that I need to work out before I can hand these off to @NoisyFlowers.
Everything else seems relatively straightforward to me.