Closed: sbarbosadataverse closed this issue 5 years ago.
Committed changes, passing to Phil
Give back to Sonia to check in dvn-build
@sbarbosadataverse as we discussed, I did my usual check to make sure the tsv change didn't require a change to the Solr schema.xml, and it didn't.
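For anyone repeating that check later, a rough sketch of what it can look like in shell: pull the field names out of the tsv's #datasetField section and confirm each one already appears in schema.xml. The schema path and the assumption that the field name sits in column 2 of the tsv are mine, not necessarily how the check was actually done here.

# Sketch only: compare field names from the tsv's #datasetField section
# against Solr's schema.xml (adjust /path/to/schema.xml for your install).
awk -F'\t' '/^#datasetField/ {f=1; next} /^#/ {f=0} f && $2 != "" {print $2}' \
  scripts/api/data/metadatablocks/customGSD.tsv |
while read -r field; do
  grep -q "name=\"$field\"" /path/to/schema.xml || echo "not in schema.xml: $field"
done

In this case the change was to controlled vocabulary values, so no new field names appear and schema.xml can stay as-is.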
We sat and previewed the change on my laptop, but we're going to wait to merge the 2310-GSD branch (currently just commit df60357) into 4.0.2 until you've heard back from Janina that the information you copied from the UI reflects the intended change. Passing this back to you in the meantime.
@sbarbosadataverse after you've confirmed with Janina that the change is ok, please pass this issue to @sekmiller, who will write up instructions for @kcondon on what to do with the updated tsv file.
Janina has additional changes to make, as suspected.
@sbarbosadataverse ok, please feel free to update the branch we started: https://github.com/IQSS/dataverse/tree/2310-GSD
Ok will do
@scolapasta I know we missed today's deadline for 4.1, but if we can get this in before the next milestone deadline in August, that would be great; they use this for their next class uploads starting soon.
@sbarbosadataverse just a heads up that we'll need to do the same thing as last time. I'll pull in your latest change to the GSD metadata block and I'll have you look at it to see if it's what you want before we merge it into 4.2. Let's coordinate a time to do this.
Sounds good. Thanks
On https://shibtest.dataverse.org I loaded customGSD.tsv from 4.1. Then I tried to re-load the version from https://github.com/IQSS/dataverse/blob/76496aa9593736d7846e1aa0222fe229198762d5/scripts/api/data/metadatablocks/customGSD.tsv but got this error:
[root@dvn-vm3 api]# curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/customGSD.tsv -H "Content-type: text/tab-separated-values"
{"status":"ERROR","message":"3"}
Here's the stack trace (I built the 4.2 war file on my laptop from commit 9ae1a64):
[2015-09-18T09:21:21.098-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.api.DatasetFieldServiceApi] [tid: _ThreadID=30 _ThreadName=http-listener-1(5)] [timeMillis: 1442582481098] [levelValue: 900] [[
Error parsing dataset fields:3
java.lang.ArrayIndexOutOfBoundsException: 3
at edu.harvard.iq.dataverse.api.DatasetFieldServiceApi.parseControlledVocabulary(DatasetFieldServiceApi.java:370)
at edu.harvard.iq.dataverse.api.DatasetFieldServiceApi.loadDatasetFields(DatasetFieldServiceApi.java:263)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:344)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:205)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at edu.harvard.iq.dataverse.api.ApiBlockingFilter.doFilter(ApiBlockingFilter.java:161)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:30)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.apache.catalina.core.ApplicationDispatcher.doInvoke(ApplicationDispatcher.java:873)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:739)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:575)
at org.apache.catalina.core.ApplicationDispatcher.doDispatch(ApplicationDispatcher.java:546)
at org.apache.catalina.core.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:428)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:378)
at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:34)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:316)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:412)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:282)
at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:459)
at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:167)
at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:201)
at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:175)
at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:561)
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
at java.lang.Thread.run(Thread.java:745)
]]
Line 370 is cvv.setIdentifier(values[3]); see https://github.com/IQSS/dataverse/blob/9ae1a6432592499b8ff7304ae76301d08f03bac6/src/main/java/edu/harvard/iq/dataverse/api/DatasetFieldServiceApi.java#L370
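To spell out my reading of that: values[] is just the current tsv row split on tabs, and in the #controlledVocabulary section the columns are (as I understand the layout) DatasetField, Value, identifier, displayOrder, with a leading empty cell under the section header. So values[3] is the identifier cell, and any row with fewer than four tab-separated cells throws ArrayIndexOutOfBoundsException: 3, and that index 3 is exactly what surfaces as {"status":"ERROR","message":"3"}. A quick way to hunt for such rows (a sketch; the four-column threshold is my assumption):

# Sketch only: flag rows in the #controlledVocabulary section with fewer
# than 4 tab-separated cells (values[0]..values[3]).
awk -F'\t' '/^#controlledVocabulary/ {cv=1; next} /^#/ {cv=0}
  cv && NF > 0 && NF < 4 {printf "line %d: %d columns: %s\n", NR, NF, $0}' customGSD.tsv

One caveat: Java's String.split drops trailing empty strings, so a row that ends in tabs can also come up short even though the cells look present in a spreadsheet.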
As I mentioned to @scolapasta yesterday, the change in 76496aa (the version of the tsv I'm trying to load) seems to affect basically the entire tsv file. It's a much bigger change than the earlier commit at df60357, which seemed to change only controlled vocabulary values.
In short, I think there's something wrong with the latest version of the tsv file. The code that parses this tsv file is picky and I don't know much about it. @scolapasta was the original author and @sekmiller added the feature to re-load an updated tsv file (I'm not sure which issue that was).
@posixeleni knows a lot about these tsv files as well. Again, I'm pretty sure we need a new one, one that doesn't cause the code to throw exceptions.
Line 370 is cvv.setIdentifier(values[3])
I spoke with @sekmiller and he indicated that the concept of an identifier for a controlled vocabulary value was added after 4.0. Judging from #947, it was added in 4.0.1 by @scolapasta and @posixeleni.
We were speculating that perhaps the problem was that the "identifier" column was empty, but after uploading the version from 4.1 to Google Docs, I don't think that's the problem, because the "identifier" column was empty back in 4.1 too:
[screenshot of the 4.1 tsv in Google Sheets]
That screenshot comes from here (4.1 version of the GSD block): https://docs.google.com/spreadsheets/d/1xQ8wi1-2NqylgzROf72A64ojrpQJAJTHdFe3mRzPHN0/edit?usp=sharing
In addition, the "journals" metadata block in 4.1 didn't have "identifier" filled in:
[screenshot of the 4.1 journals tsv]
So I'm pretty sure "identifier" is optional.
I spoke with @scolapasta and he indicated I should give this issue to @posixeleni to review the tsv file that is failing to import.
@posixeleni here's the file: https://github.com/IQSS/dataverse/blob/76496aa9593736d7846e1aa0222fe229198762d5/scripts/api/data/metadatablocks/customGSD.tsv
If there's something obvious you can fix, please feel free to push to the branch we've been using: https://github.com/IQSS/dataverse/commits/2310-GSD
If it helps, I think the line with "01321" may be a problem. It's different than the surrounding lines:
murphy:dataverse pdurbin$ grep 01321 customGSD.tsv -C3
gsdCourseName 01317 39
gsdCourseName 01318 40
gsdCourseName 01319 41
01321
gsdCourseName 01401 42
gsdCourseName 01401 43
gsdCourseName 01402 44
This is just a theory though...
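One quick way to test it: cat -A prints tabs as ^I and line ends as $, so a row with fewer tab separators than its neighbors stands out. (Just a diagnostic sketch, run from wherever the tsv lives.)

# Show the raw tab structure around the suspect "01321" line.
grep -C1 01321 customGSD.tsv | cat -A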
@pdurbin @sbarbosadataverse there's a bigger issue here than just the error coming out of this tsv. I can clean that up easily, but it appears that the GSD wants us to replace their Course Names with new course names, which, after speaking with @sekmiller, our code is not currently able to do; at the moment we can only add new values to the tsv's controlled vocabulary.
Waiting to get an ETA from @sekmiller on when we can replace controlled vocabulary values and not just add new values.
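For context on why replacing is harder than adding: saved dataset fields hold references to the stored controlled vocabulary rows, so swapping values out from under them isn't a simple tsv reload. A rough way to see which stored values datasets actually reference (a sketch; the table and column names, and the dvndb database name, are my recollection of a standard Dataverse Postgres install and should be verified):

# Sketch only: count how many dataset fields reference each stored
# controlled vocabulary value; table/column names not verified here.
sudo -u postgres psql dvndb -c "
  SELECT cvv.strvalue, COUNT(*) AS uses
  FROM controlledvocabularyvalue cvv
  JOIN datasetfield_controlledvocabularyvalue j
    ON j.controlledvocabularyvalues_id = cvv.id
  GROUP BY cvv.strvalue
  ORDER BY uses DESC;"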
@sekmiller as we discussed, you're welcome to look at adding a "preview" mode while you're in that part of the code: #2551
I'm going to close this very old issue as I think it pertains to a custom metadata block on dataverse.harvard.edu. I contemplated bringing this into a larger metadata consolidation issue (#6030) but I'm not sure what it's about (versioning metadata blocks, maybe?).
I think this started as an update to a custom metadata block on Harvard Dataverse where the people using the metadata block wanted to edit the terms in one of its controlled vocabularies (instead of or in addition to adding terms). Maybe because the names of faculty changed or had typos. But the process of updating the metadata blocks doesn't handle editing terms in the controlled vocabularies. So I'm guessing that when this edited GSD metadata block was being uploaded, Dataverse saw the new controlled vocabulary terms in the tsv file and said, "Hey! There are saved datasets that have terms in the Faculty Name field that aren't in this new tsv file. You can't do that."
Is that right? If so:
Janina changed custom field names