ranicrab closed this issue 4 years ago.
@ranicrab it looks like the first pid was added to the 'object store', but was not added to the Solr search index:
https://data.piscoweb.org/catalog/d1/mn/v2/query/solr/?q=id:%22resourceMap_marine_cbs.3.1%22
So there must have been a Solr indexing problem when this pid was initially added. You can tell Metacat to 'reindex' the pid, but this has to be done by someone with admin privileges.
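For reference, checking whether a pid is in a member node's Solr index comes down to querying the `id` field with the pid in double quotes, as the link above does with `%22`. The sketch below only builds that URL and does not fetch it. The function name `solr_id_query_url` is hypothetical, and the base URL is the PISCO member node used in this thread.

```python
from urllib.parse import urlencode

# v2 API base of the PISCO member node, as it appears in the links in this thread.
MN_BASE = "https://data.piscoweb.org/catalog/d1/mn/v2"


def solr_id_query_url(pid, base=MN_BASE):
    """Build a Solr query URL that matches a single identifier exactly.

    Quoting the pid makes Solr treat it as one phrase, so ':' and '/'
    in DOI-style pids are not read as query syntax.
    """
    # Escape backslashes and double quotes that might appear inside the pid.
    escaped = pid.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'id:"{escaped}"', "fl": "id"}
    # urlencode percent-encodes ':', '"' and '/' in the query string.
    return f"{base}/query/solr/?{urlencode(params)}"


print(solr_id_query_url("resourceMap_marine_cbs.3.1"))
```

If this query comes back empty while the `meta` URL for the same pid works, the object is stored but not indexed.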
How did you upload the second pid? How did you upload the resource map associated with the second pid? The uploadDataPackage routine takes care of creating a resource map and associating it with the items in the data package for you.
@ranicrab BTW - have you considered using the MetacatUI editor for creating and updating packages? You can sign in as you have been, or at https://data.piscoweb.org/metacatui. Then search for a dataset and click 'EDIT' to begin editing the package. If you have a new data package to upload, click the user icon in the upper right and select 'Submit data'.
Hi Peter,
I have talked to Chris about using MetacatUI (for opc.dataone.org), but some things in our metadata aren't supported by the editor (method steps, keyword sets, additional intellectual rights statements, etc.). So for people creating datasets from scratch it is great, but it doesn't work for our existing datasets (or for new datasets that should reuse a lot of the same metadata).
Regarding the loads we have been trying, I have some (a lot of) additional information for you (Mike Frenock and I tested some things out today).
FROM OSU (using Java):

1) I loaded the following (EML 2.2.0):
data: CBLX00_XXXITV2XLSR02_20190321.40.1
metadata: CBLX00_XXXITV2XLSR02_20190321.50.1
resource map: resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
The following was replicated successfully in the CN:
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.40.1
https://cn.dataone.org/cn/v1/meta/resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
https://cn.dataone.org/cn/v1/object/resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
https://cn.dataone.org/cn/v1/object/resourceMap_CBLX00_XXXITV2XLSR02_20190321.40.1
However, I cannot download the package in R:
packageId <- "resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1"
pkg <- getDataPackage(d1c, identifier=packageId, lazyLoad=TRUE, limit="0MB", quiet=FALSE)
Trying urn:node:PISCO
Error in .local(x, identifier, ...) :
  Identifier resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1 not found on node urn:node:CNUCSB1 or urn:node:PISCO
In addition, the following was NOT replicated successfully in the CN:
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.50.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.40.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.50.1
2) I also loaded the following (EML 2.2.0):
data: marine_cbs.4.1
metadata: marine_cbs.3.1
resource map: resourceMap_marine_cbs.3.1
The following was replicated successfully in the CN:
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/marine_cbs.4.1
https://cn.dataone.org/cn/v1/meta/resourceMap_marine_cbs.3.1
https://cn.dataone.org/cn/v1/object/resourceMap_marine_cbs.3.1
However, I cannot download the package in R:
packageId <- "resourceMap_marine_cbs.3.1"
pkg <- getDataPackage(d1c, identifier=packageId, lazyLoad=TRUE, limit="0MB", quiet=FALSE)
Trying urn:node:PISCO
Error in .local(x, identifier, ...) :
  Identifier resourceMap_marine_cbs.3.1 not found on node urn:node:CNUCSB1 or urn:node:PISCO
In addition, the following was NOT replicated successfully in the CN:
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/marine_cbs.3.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/marine_cbs.4.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/marine_cbs.3.1
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/marine_cbs.4.1
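Checking replication by hand means opening two CN URLs (`meta` and `object`) per pid. A small helper like the hypothetical `cn_check_urls` below generates them for every pid in a package so each item is checked the same way; actually fetching them (for example, looking for an HTTP 200) is left out. Percent-encoding the pid as a single path segment is our assumption: the links above use raw pids, but encoding removes any ambiguity for DOI-style pids containing '/'.

```python
from urllib.parse import quote

# Coordinating Node v1 base URL, as used in the links in this thread.
CN_BASE = "https://cn.dataone.org/cn/v1"


def cn_check_urls(pids, base=CN_BASE):
    """Return (pid, kind, url) triples for the CN 'meta' and 'object' endpoints.

    Each pid is percent-encoded with safe="" so ':' and '/' become
    %3A and %2F (an assumption; raw pids may also work).
    """
    urls = []
    for pid in pids:
        encoded = quote(pid, safe="")
        for kind in ("meta", "object"):
            urls.append((pid, kind, f"{base}/{kind}/{encoded}"))
    return urls


for pid, kind, url in cn_check_urls([
    "doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.40.1",
    "resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1",
]):
    print(kind, url)
```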
FROM R:

1) I loaded the following (EML 2.0.1):
data: KHLX00_XXXITV2XLSR01_20190220.40.1
metadata: KHLX00_XXXITV2XLSR01_20190220.50.1
resource map: resource_map_doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1
The following was replicated successfully in the CN:
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.40.1
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.40.1
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.50.1
https://cn.dataone.org/cn/v1/meta/resource_map_doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1
https://cn.dataone.org/cn/v1/meta/resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
https://cn.dataone.org/cn/v1/object/resource_map_doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1
https://cn.dataone.org/cn/v1/object/resourceMap_CBLX00_XXXITV2XLSR02_20190321.40.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.50.1
I had thought the resource map wasn't created, but that was because I was looking for
resourceMap_KHLX00_XXXITV2XLSR01_20190220.50.1
instead of
resource_map_doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1
With the correct resource map ID I am able to download the package in R.
However, the following was NOT replicated successfully in the CN:
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.40.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.40.1
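Rather than guessing the resource map ID, one option is to ask the Solr index which resource maps reference the metadata pid. The sketch assumes the index exposes a multi-valued `resourceMap` field on each object (we believe DataONE's Solr schema does, but verify against your node) and that the query endpoint honors `wt=json`. Both function names are hypothetical, and the sample response is made up for illustration, not captured from a real server.

```python
import json
from urllib.parse import urlencode


def resource_map_query_url(base, pid):
    """URL asking Solr for the resource maps that aggregate `pid`.

    Assumes the index has a 'resourceMap' field and supports wt=json.
    """
    escaped = pid.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'id:"{escaped}"', "fl": "id,resourceMap", "wt": "json"}
    return f"{base}/query/solr/?{urlencode(params)}"


def resource_maps_from_response(body):
    """Collect every resourceMap value from a Solr JSON response body."""
    data = json.loads(body)
    maps = []
    for doc in data.get("response", {}).get("docs", []):
        maps.extend(doc.get("resourceMap", []))
    return maps


# Illustrative response body (not from a real server).
sample = json.dumps({"response": {"numFound": 1, "docs": [{
    "id": "doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1",
    "resourceMap": ["resource_map_doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1"],
}]}})
print(resource_map_query_url("https://cn.dataone.org/cn/v1",
                             "doi:10.6085/AA/KHLX00_XXXITV2XLSR01_20190220.50.1"))
print(resource_maps_from_response(sample))
```

An empty list for a pid that is known to be indexed would suggest the package's resource map itself has not been indexed yet.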
2) I also loaded the following (EML 2.2.0) metadata: marine_cbs.3.2
The following was NOT replicated successfully in the CN:
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/marine_cbs.3.2
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.40.1
https://cn.dataone.org/cn/v1/object/doi:10.6085/AA/marine_cbs.3.2
https://cn.dataone.org/cn/v1/meta/resourceMap_marine_cbs.3.2
https://cn.dataone.org/cn/v1/object/resourceMap_marine_cbs.3.2
I even tried the following, since the OSU (Java) load and the R load create different resource map IDs: https://cn.dataone.org/cn/v1/object/resource_map_doi:10.6085/AA/marine_cbs.3.2
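The two naming patterns seen in this thread can be captured in a tiny helper that lists both candidate resource map IDs for a metadata pid: the R load used `resource_map_` plus the full pid, and the OSU (Java) load used `resourceMap_` plus the part after the last '/'. The function `candidate_resource_map_ids` is hypothetical, and these are observations rather than rules; a generated ID like the one mentioned later in this thread will match neither.

```python
def candidate_resource_map_ids(metadata_pid):
    """Guess resource map IDs from the two naming patterns observed here.

    - R load:   'resource_map_' + full metadata pid
    - OSU/Java: 'resourceMap_' + pid with any 'doi:.../' prefix removed
    Not guaranteed: a client may generate an unrelated ID instead.
    """
    local_part = metadata_pid.rsplit("/", 1)[-1]
    return [f"resource_map_{metadata_pid}", f"resourceMap_{local_part}"]


print(candidate_resource_map_ids("doi:10.6085/AA/marine_cbs.3.2"))
```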
So it seems like EML 2.2.0 metadata and associated data objects are not being replicated regardless of how they are loaded. Using R, even the EML 2.0.1 data objects are not replicating.
I am going to try to fix the marine_cbs.3.X package since I think there were other issues affecting it and I didn't download the resource map before updating in R. In the meantime I will try to load a separate/new dataset in EML 2.2.0 using R to see if I have similar issues.
Let me know if there is any other information you need or if there is anything else I should try. Thanks!
--
Rani Gaddam
Research Associate and Data Manager
MARINe Research Group at UC Santa Cruz
Long Marine Lab
115 McAllister Way, Santa Cruz, CA 95060
(831) 459-1621 office
(831) 459-3383 fax
gaddam@ucsc.edu
http://pacificrockyintertidal.org
http://intertidalmap.org
@ranicrab I checked a couple of the resource map pids that you mentioned, and for some reason they are not being added to the Solr search index. The DataONE R client uses the Solr index to locate resource maps, so that is why it can't find them. There must be an error occurring between when a resource map is uploaded to Metacat and when it would be inserted into the index. So, as you mentioned, you can see that a resource map has been uploaded:
https://data.piscoweb.org/catalog/d1/mn/v2/meta/resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
But it is not in the search index:
https://data.piscoweb.org/catalog/d1/mn/v2/query/solr/?q=id:resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
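The diagnosis above combines two checks: does the `meta` URL for the pid return successfully, and does the Solr query find it? A minimal sketch of that decision, assuming the Solr query was sent with `wt=json` so the body looks like `{"response": {"numFound": N, ...}}`. Fetching the URLs is left out, and `num_found` and `index_status` are hypothetical names.

```python
import json


def num_found(solr_json_body):
    """Read numFound from a Solr JSON response (query sent with wt=json)."""
    return json.loads(solr_json_body).get("response", {}).get("numFound", 0)


def index_status(meta_http_status, solr_json_body):
    """Classify a pid from its /meta HTTP status and its Solr query result.

    A 200 from /meta with zero Solr hits is the situation in this thread:
    the object is stored in Metacat but missing from the search index.
    """
    if meta_http_status != 200:
        return "not stored"
    if num_found(solr_json_body) == 0:
        return "stored but not indexed"
    return "stored and indexed"


print(index_status(200, '{"response": {"numFound": 0, "docs": []}}'))
```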
The main way to find a problem with the indexing step is to look in the Tomcat log for Metacat. The index step can be initiated manually through the legacy Metacat API, after logging in as the Metacat admin. The command would be something like this:
https://data.piscoweb.org/catalog/metacat?action=reindex&pid=resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
The log file can be inspected right after this manual index step to see if any errors occur.
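When reindexing several pids, including DOI-style ones, it helps to build the request so that ':' and '/' in the pid are escaped. The sketch below only constructs the legacy reindex URL shown above (the `action=reindex` and `pid` parameters come from that example); the request still has to be made from a session logged in as the Metacat admin, and `reindex_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Legacy Metacat servlet on the PISCO node, from the reindex example above.
METACAT_SERVLET = "https://data.piscoweb.org/catalog/metacat"


def reindex_url(pid, servlet=METACAT_SERVLET):
    """Build the legacy Metacat reindex request for one pid.

    Only succeeds from a session logged in as the Metacat admin.
    urlencode escapes ':' and '/' so DOI-style pids survive intact.
    """
    return f"{servlet}?{urlencode({'action': 'reindex', 'pid': pid})}"


for pid in ["resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1",
            "doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.50.1"]:
    print(reindex_url(pid))
```

Checking the Tomcat log right after each request, as suggested above, ties any indexing error to a specific pid.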
Hi Peter,
I logged in to https://data.piscoweb.org/catalog/admin and tried to reindex but got an error message which I have also sent to Jing:
message Service problem while intializing MetaCat Servlet: ServiceService.registerService - Service: DatabaseService is already registered. Use ServiceService.reregister() to replace the service.
However I do think there is an EML 2.2.0 issue going on (regardless of indexing on PISCO).
These metadata files and resource maps show up in the CN (this one is EML 2.0.1):
https://cn.dataone.org/cn/v1/meta/resourceMap_CBLX00_XXXITV2XLSR01_20190321.50.1
https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR01_20190321.50.1
However only the resource map shows up for this one (this one is EML 2.2.0): https://cn.dataone.org/cn/v1/meta/resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
The metadata file does not show up: https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR02_20190321.50.1
The Solr query on the CN only shows the resource map for the EML 2.0.1 package: http://cn.dataone.org/cn/v1/query/solr/?q=id:*resourceMap*+id:*CBLX00*+id:*2019*
So the resource map for both of them ended up making it to CN but only one is searchable, and the only difference is the EML version (they are the same types of files).
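One detail worth knowing when reading the wildcard query above: in a hand-written URL, '+' decodes to a space, so the three `id:` clauses are combined with the server's default operator, which in Solr is OR unless the server is configured otherwise. Whether that affects the results here is unknown. The hypothetical helper below writes AND explicitly so every fragment must appear in the `id`.

```python
from urllib.parse import urlencode


def wildcard_and_query(fragments):
    """Build a Solr q string requiring every fragment to appear in 'id'.

    Joining clauses with a bare space leaves the combination to the
    server's default operator (often OR); explicit AND removes that
    ambiguity. Fragments are assumed to contain no Solr special characters.
    """
    return " AND ".join(f"id:*{frag}*" for frag in fragments)


q = wildcard_and_query(["resourceMap", "CBLX00", "2019"])
url = "http://cn.dataone.org/cn/v1/query/solr/?" + urlencode({"q": q, "fl": "id"})
print(q)
print(url)
```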
The previous file I loaded with R (EML 2.0.1) has a resource map here: https://cn.dataone.org/cn/v1/meta/resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
and it shows up on the search here: http://cn.dataone.org/cn/v1/query/solr/?q=id:*resource_map*+id:*KHLX*+id:*2019*
I also loaded an EML 2.2.0 file using R. It loaded, but the resource map is named with a generated ID instead of the metadata file name:
and when I try to search for it in solr I get an error message:
Similar to the other 2.2.0 files, this shows up: https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR03_20190321.40.1
but this does not: https://cn.dataone.org/cn/v1/meta/doi:10.6085/AA/CBLX00_XXXITV2XLSR03_20190321.50.1
I did run all of the metadata files through the EML Parser and they passed, but let me know if you want the actual files to look at for the 2.2.0 ones.
OK - lots to look at - hope we can get this figured out! I would also like a way to set the ID of the resource map, since the generated one is just strange.
Thanks!
-Rani
Hi - I checked all of the links this morning to see if anything had changed, but everything is still the same. Let me know if I can try/test anything else; in the meantime I may just work on my data package updates using the previous version of EML. Thanks!
@ranicrab I'll ask @csjx about the EML issue, as I'm not familiar with this issue yet. Also, it would be helpful if Mike could look into the indexing issue that I mentioned, as that relates to the resource maps not being indexed and the R client not being able to find the packages.
We did try to look into indexing but got strange errors - hopefully we will hear back from Jing soon - thanks!!
Hi again - this indexing query is showing up now: https://data.piscoweb.org/catalog/d1/mn/v2/query/solr/?q=id:resourceMap_CBLX00_XXXITV2XLSR02_20190321.50.1
I think we just tried too soon before (things were being restarted a lot and have now settled down).
@ranicrab I'm closing this issue as I believe everything you raised here has been addressed.
Hi - I can view a resource map for one of my packages here: https://data.piscoweb.org/catalog/d1/mn/v1/meta/resourceMap_marine_cbs.3.1
but when I tried to find it using R I got this message:
So instead I tried to upload a revised version of the metadata (doi:10.6085/AA/marine_cbs.3.1 to doi:10.6085/AA/marine_cbs.3.2). That loaded successfully but I cannot find the resource map so I likely shouldn't have done it that way. Please let me know the best way to proceed. Thanks!