IQSS / dataverse

Open source research data repository software
http://dataverse.org

Update GSD custom fields #2310

Closed: sbarbosadataverse closed this issue 5 years ago

sbarbosadataverse commented 9 years ago

Janina changed custom field names

sbarbosadataverse commented 9 years ago

Committed changes, passing to Phil

sbarbosadataverse commented 9 years ago

Give back to Sonia to check in dvn-build

pdurbin commented 9 years ago

@sbarbosadataverse as we discussed, I did my usual check to make sure the tsv change didn't require a change to the Solr schema.xml, and it didn't.
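
A minimal sketch of that kind of cross-check, assuming the field names sit in column 2 of the TSV's #datasetField rows and that the Solr schema lives at conf/solr/schema.xml (the paths and column layout here are assumptions, not taken from this thread):

    # Field names declared in the TSV's #datasetField section
    awk -F'\t' '/^#/ {section=$1; next}
                section=="#datasetField" && $2!="" {print $2}' \
        scripts/api/data/metadatablocks/customGSD.tsv | sort > /tmp/tsv_fields.txt

    # Field names declared in the Solr schema (path is an assumption)
    grep -o '<field name="[^"]*"' conf/solr/schema.xml \
        | sed 's/<field name="//;s/"$//' | sort > /tmp/solr_fields.txt

    # Anything in the TSV but missing from Solr would need a schema.xml change
    comm -23 /tmp/tsv_fields.txt /tmp/solr_fields.txt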

We sat and previewed the change on my laptop, but we're going to wait to merge the 2310-GSD branch (currently just commit df60357) into 4.0.2 until you've heard back from Janina that the information you copied from the UI reflects the intended change. Passing this back to you in the meantime.

pdurbin commented 9 years ago

@sbarbosadataverse after you've confirmed with Janina that the change is ok, please pass this issue to @sekmiller who will write up instructions for @kcondon for what to do with the updated tsv file.

sbarbosadataverse commented 9 years ago

Janina has additional changes to make, as suspected.

pdurbin commented 9 years ago

@sbarbosadataverse ok, please feel free to update the branch we started: https://github.com/IQSS/dataverse/tree/2310-GSD

sbarbosadataverse commented 9 years ago

Ok, will do.

sbarbosadataverse commented 9 years ago

@scolapasta I know we missed today's deadline for 4.1, but if we can get this in before the next milestone deadline in August, that would be great -- they use this for their next class uploads, which start soon.

pdurbin commented 9 years ago

@sbarbosadataverse just a heads up that we'll need to do the same thing as last time. I'll pull in your latest change to the GSD metadata block and I'll have you look at it to see if it's what you want before we merge it into 4.2. Let's coordinate a time to do this.

sbarbosadataverse commented 9 years ago

Sounds good. Thanks!

pdurbin commented 9 years ago

On https://shibtest.dataverse.org I loaded customGSD.tsv from 4.1. Then I tried to re-load the version from https://github.com/IQSS/dataverse/blob/76496aa9593736d7846e1aa0222fe229198762d5/scripts/api/data/metadatablocks/customGSD.tsv but got this error:

[root@dvn-vm3 api]# curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @data/metadatablocks/customGSD.tsv -H "Content-type: text/tab-separated-values"
{"status":"ERROR","message":"3"}

Here's the stack trace (I built the 4.2 war file on my laptop from commit 9ae1a64):

[2015-09-18T09:21:21.098-0400] [glassfish 4.1] [WARNING] [] [edu.harvard.iq.dataverse.api.DatasetFieldServiceApi] [tid: _ThreadID=30 _ThreadName=http-listener-1(5)] [timeMillis: 1442582481098] [levelValue: 900] [[
  Error parsing dataset fields:3
java.lang.ArrayIndexOutOfBoundsException: 3
    at edu.harvard.iq.dataverse.api.DatasetFieldServiceApi.parseControlledVocabulary(DatasetFieldServiceApi.java:370)
    at edu.harvard.iq.dataverse.api.DatasetFieldServiceApi.loadDatasetFields(DatasetFieldServiceApi.java:263)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:151)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:171)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:152)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:104)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:387)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:331)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:103)
    at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:297)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:254)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1028)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:381)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:344)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
    at org.apache.catalina.core.StandardWrapper.service(StandardWrapper.java:1682)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:344)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.ocpsoft.rewrite.servlet.RewriteFilter.doFilter(RewriteFilter.java:205)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at edu.harvard.iq.dataverse.api.ApiBlockingFilter.doFilter(ApiBlockingFilter.java:161)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:30)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.apache.catalina.core.ApplicationDispatcher.doInvoke(ApplicationDispatcher.java:873)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:739)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:575)
    at org.apache.catalina.core.ApplicationDispatcher.doDispatch(ApplicationDispatcher.java:546)
    at org.apache.catalina.core.ApplicationDispatcher.dispatch(ApplicationDispatcher.java:428)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:378)
    at edu.harvard.iq.dataverse.api.ApiRouter.doFilter(ApiRouter.java:34)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:316)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:160)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:99)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:174)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:734)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:673)
    at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:412)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:282)
    at com.sun.enterprise.v3.services.impl.ContainerMapper$HttpHandlerCallable.call(ContainerMapper.java:459)
    at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:167)
    at org.glassfish.grizzly.http.server.HttpHandler.runService(HttpHandler.java:201)
    at org.glassfish.grizzly.http.server.HttpHandler.doHandle(HttpHandler.java:175)
    at org.glassfish.grizzly.http.server.HttpServerFilter.handleRead(HttpServerFilter.java:235)
    at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:561)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.run0(WorkerThreadIOStrategy.java:117)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy.access$100(WorkerThreadIOStrategy.java:56)
    at org.glassfish.grizzly.strategies.WorkerThreadIOStrategy$WorkerThreadRunnable.run(WorkerThreadIOStrategy.java:137)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:565)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:545)
    at java.lang.Thread.run(Thread.java:745)
]]

Line 370 is cvv.setIdentifier(values[3]); see https://github.com/IQSS/dataverse/blob/9ae1a6432592499b8ff7304ae76301d08f03bac6/src/main/java/edu/harvard/iq/dataverse/api/DatasetFieldServiceApi.java#L370

As I mentioned to @scolapasta yesterday, the change in 76496aa (the version of the tsv I'm trying to load) seems to affect basically the entire tsv file. It's a much bigger change than the earlier commit df60357, which seemed to change only controlled vocabulary values.

In short, I think there's something wrong with the latest version of the tsv file. The code that parses this tsv file is picky and I don't know much about it. @scolapasta was the original author and @sekmiller added the feature to re-load an updated tsv file (I'm not sure which issue that was).

@posixeleni knows a lot about these tsv files as well. Again, I'm pretty sure we need a new one, one that doesn't cause the code to throw exceptions.
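
One possible mechanism, offered here as a sketch rather than a diagnosis: Java's String.split() drops trailing empty fields by default, so a controlled vocabulary row with nothing after its Value column can split into fewer than four elements even though the tab characters are present, and values[3] would then throw exactly this ArrayIndexOutOfBoundsException: 3. A rough pre-load check for such rows, assuming the usual column layout (#controlledVocabulary, DatasetField, Value, identifier, displayOrder):

    # Flag controlled-vocabulary rows that split into fewer than four
    # fields the way Java's split() would see them (trailing empties dropped)
    awk -F'\t' '/^#/ {section=$1; next}
                section=="#controlledVocabulary" && NF > 0 {
                    n = NF
                    while (n > 0 && $n == "") n--   # emulate split() trimming
                    if (n < 4) print NR ": " $0
                }' customGSD.tsv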

pdurbin commented 9 years ago

Line 370 is cvv.setIdentifier(values[3])

I spoke with @sekmiller and he indicated that the concept of an identifier for a controlled vocabulary value was added after 4.0. Judging from #947, it was added in 4.0.1 by @scolapasta and @posixeleni.

We speculated that the problem might be that the "identifier" column was empty, but after uploading the 4.1 version to Google Docs, I don't think that's it, because the "identifier" column was empty back in 4.1 too:

[Screenshot: customGSD 4.1 tsv open in Google Sheets, 2015-09-18]

That screenshot comes from here (4.1 version of the GSD block): https://docs.google.com/spreadsheets/d/1xQ8wi1-2NqylgzROf72A64ojrpQJAJTHdFe3mRzPHN0/edit?usp=sharing

In addition, the "journals" metadata block in 4.1 didn't have "identifier" filled in either:

[Screenshot: the journals tsv at v4.1 in IQSS/dataverse, 2015-09-18]

So I'm pretty sure "identifier" is optional.
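
A quick way to double-check that from the shell instead of Google Sheets, assuming "identifier" is the fourth column of the controlled-vocabulary rows:

    # Count controlled-vocabulary rows with and without an identifier
    awk -F'\t' '/^#/ {section=$1; next}
                section=="#controlledVocabulary" && NF > 0 {
                    if ($4 == "") empty++; else filled++
                }
                END {print "empty identifiers: " empty+0 ", filled: " filled+0}' \
        customGSD.tsv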

pdurbin commented 9 years ago

I spoke with @scolapasta and he indicated I should give this issue to @posixeleni to review the tsv file that is failing to import.

@posixeleni here's the file: https://github.com/IQSS/dataverse/blob/76496aa9593736d7846e1aa0222fe229198762d5/scripts/api/data/metadatablocks/customGSD.tsv

If there's something obvious you can fix, please feel free to push to the branch we've been using: https://github.com/IQSS/dataverse/commits/2310-GSD

If it helps, I think the line with "01321" may be the problem; it's different from the surrounding lines:

murphy:dataverse pdurbin$ grep 01321 customGSD.tsv -C3
    gsdCourseName   01317       39                                          
    gsdCourseName   01318       40                                          
    gsdCourseName   01319       41                                          
        01321                                                   
    gsdCourseName   01401       42                                          
    gsdCourseName   01401       43                                          
    gsdCourseName   01402       44                                          

This is just a theory though...
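
The theory is easy to test, though: flag any controlled-vocabulary row that is missing its DatasetField name or its Value (same assumed column layout as above); the "01321" line should be the one that turns up:

    # Controlled-vocabulary rows with an empty DatasetField ($2) or Value ($3)
    awk -F'\t' '/^#/ {section=$1; next}
                section=="#controlledVocabulary" && NF > 0 && ($2 == "" || $3 == "") {
                    print NR ": " $0
                }' customGSD.tsv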

posixeleni commented 9 years ago

@pdurbin @sbarbosadataverse there's a bigger issue here than just the error coming out of this tsv. I can clean that up easily, but it appears that the GSD wants us to replace their Course Names with new course names, which, after speaking with @sekmiller, our code is not currently able to do. At the moment we can only add new values to the tsv's controlled vocabulary.

posixeleni commented 9 years ago

Waiting to get an ETA from @sekmiller on when we can replace controlled vocabulary values and not just add new values.

pdurbin commented 9 years ago

@sekmiller as we discussed, you're welcome to look at adding a "preview" mode while you're in that part of the code: #2551

djbrooke commented 5 years ago

I'm going to close this very old issue as I think it pertains to a custom metadata block on dataverse.harvard.edu. I contemplated bringing this into a larger metadata consolidation issue (#6030), but I'm not sure what it's about (versioning metadata blocks, maybe?).

jggautier commented 5 years ago

I think this started as an update to a custom metadata block on Harvard Dataverse where the people using the metadata block wanted to edit the terms in one of its controlled vocabularies (instead of or in addition to adding terms). Maybe because the names of faculty changed or had typos. But the process of updating the metadata blocks doesn't handle editing terms in the controlled vocabularies. So I'm guessing that when this edited GSD metadata block was being uploaded, Dataverse saw the new controlled vocabulary terms in the tsv file and said, "Hey! There are saved datasets that have terms in the Faculty Name field that aren't in this new tsv file. You can't do that."

Is that right? If so: