NCEAS / metacat

Data repository software that helps researchers preserve, share, and discover data
https://knb.ecoinformatics.org/software/metacat
GNU General Public License v2.0

Activate DOI attribution when not from California Digital Library #1445

Open yvanlebras opened 4 years ago

yvanlebras commented 4 years ago

Dear metacat team,

We (in fact, actually @jusana ;) ) are finalizing the setup of a Metacat production server https://data.pndb.fr/ and want to "activate" DOI attribution. Following the Metacat documentation https://knb.ecoinformatics.org/knb/docs/doi.html and after contacting EZID https://ezid.cdlib.org/learn/doi_services_faq , it appears that we can't use EZID to assign DOIs for our Metacat instance. Is there a way to configure Metacat to use DOIs from DataCite directly? Or do we need to develop dedicated code?

Wishing you a very good end of week,

Best,

Yvan

taojing2002 commented 4 years ago

Yvan:

Yeah, so far Metacat only assigns DOI through ezid. It is on our to-do list to use DOI through Datacite. But we haven't worked on it. Sorry for this.

Have a good weekend as well.

Regards, Jing

jusana commented 4 years ago

Hello,

Can't wait to see this feature ! In the meantime i'll see what I can do. Do you have any recommendations ?

Thank you all, Regards.

Julien

mbjones commented 4 years ago

@jusana and @yvanlebras We have wanted to support the DataCite API directly via an adapter in Metacat so that it would work with both EZID and DataCite, but as Jing said, we haven't implemented that yet. However, I would note that DataCite provides the DataCite EZID API for compatibility with EZID, and it might just be possible to connect directly to DataCite with their ezid API, which would require very few code changes on our end. I've been meaning to test this for a while, but if you wanted to that would be great. The Java ezid library that metacat uses is here: https://github.com/NCEAS/ezid

The only important difference seems to be that the DataCite implementation uses HTTP basic auth and does not support EZID's /login API. So some code changes would be required there, but probably minimal.

If you can cleanly configure it to use the DataCite ezid API and have it pass all of the tests (run mvn test), then I think we could make that work in Metacat. Do you have cycles for that testing? If so, we'd welcome a pull request.

jusana commented 4 years ago

Hi @mbjones, thanks a lot for your quick answer. I'll discuss that with @yvanlebras. I think we still need to check our credentials with our DataCite member partner (CNRS/INIST). On what timescale could you imagine testing that implementation on your side? I am more of a Python guy, but I could try writing this client. Thanks again, I hope to keep you informed soon. Julien

mbjones commented 4 years ago

@jusana It is not on our short list of TODOs, but if someone else put in the time to get it working and sent a pull request, we'd be happy to merge it into the codebase. It really represents a new Java ezid library release, and probably wouldn't require any changes in Metacat per se other than using that new ezid library. I think it would generally be useful to others.

yvanlebras commented 4 years ago

Thank you Jing and Matt for the detailed information. We will investigate the use of the DataCite EZID API and will keep you informed.

jusana commented 3 years ago

Hi Matt, hi all,

I finally got a DATACITE test account ;)

I tested it blindly by setting EZID up in the Metacat admin interface (Metacat 2.13, ezid 1.0.1) ... and I got a DOI created, but the metadata is empty:

image

with "(:unkn)" values

and metacatUI says : image

I tried to compile ezid 1.0.3, but when I insert my credentials the tests fail because the DataCite implementation lacks the /login route ("edu.ucsb.nceas.ezid.EZIDException: one-time login and session cookies not supported by this service").

What would you suggest to manage the switch between CDL and DataCite in the ezid library code?

thanks in advance

i've just started to get into the code a few hours ago !!

mbjones commented 3 years ago

@jusana glad you were able to look into it! On the DataCite side, I wrote earlier:

The only important difference seems to be that the DataCite implementation uses HTTP basic auth and does not support EZID's /login API. So some code changes would be required there, but probably minimal.

I think what we'd need here is new logic in the java ezid library that supports alternative authentication mechanisms, so that we don't try to use /login with DataCite, and instead used HTTP Basic Auth for every request (which sends the credentials in every request in the HTTP headers). For that, we'd need a way to differentiate which service endpoints should use /login (EZID at CDL) versus Basic Auth (at DataCite). That might be simply looking at the configured service URI and switching between them.
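The "look at the configured service URI" idea could be sketched roughly as follows. This is only an illustration, not code from the ezid library: the AuthModeSelector class, the AuthMode enum, and the rule "any host under datacite.org uses Basic Auth" are all assumptions made for the example.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of the ezid library.
final class AuthModeSelector {

    // Hypothetical enum naming the two authentication styles discussed above.
    enum AuthMode { SESSION_LOGIN, BASIC_AUTH_PER_REQUEST }

    // Picks the authentication style from the host of the configured service URI.
    // Assumption: any host equal to or under "datacite.org" is a DataCite endpoint.
    static AuthMode forServiceUri(String serviceUri) {
        String host = URI.create(serviceUri).getHost();
        if (host == null) {
            throw new IllegalArgumentException("Service URI has no host: " + serviceUri);
        }
        host = host.toLowerCase(Locale.ROOT);
        boolean isDataCite = host.equals("datacite.org") || host.endsWith(".datacite.org");
        return isDataCite ? AuthMode.BASIC_AUTH_PER_REQUEST : AuthMode.SESSION_LOGIN;
    }
}
```

Comparing the parsed host rather than searching the whole URI string avoids a false match when "datacite" merely appears in a path or query.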

I'll also note that the code for /login already uses BasicAuth for logging the user in (see https://github.com/NCEAS/ezid/blob/master/src/main/java/edu/ucsb/nceas/ezid/EZIDService.java#L164). The difference is that, with CDL, that initial /login generates a cookie which acts as the security token from that point forward. Whereas, with DataCite, the BasicAuth mechanism is used for every call (and no cookie is generated). So we would need to handle this difference. I would refactor login() to separate out the code that generates the Basic Auth header, and then use that in both the EZID login() implementation and the DataCite requests to generate the Basic Auth headers. It should be pretty easy to code up cleanly.
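As a rough sketch of the piece that would be factored out of login(), the Basic Auth header value can be built with the JDK alone. The BasicAuthHeader class name is hypothetical, and this is not how EZIDService currently builds the header; it only shows the standard "Basic " + base64(user:password) construction.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of the ezid library.
final class BasicAuthHeader {

    // Returns the value for an HTTP "Authorization" header using Basic authentication.
    static String value(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

With a helper like this, the CDL path would send the header once to /login, while the DataCite path would attach it to every request.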

mbjones commented 3 years ago

Oh, and also, we'd need to put in logic that ensures that we don't try to use the features of EZID that DataCite didn't implement, including (from the DataCite page):

I think only the first of these may be an issue and may require a little checking.

jusana commented 3 years ago

@mbjones, yes, I understand that we need to send Basic Auth headers all the time, and I agree with your logic ... "grepping" for the "datacite" sub-string in the URI would be enough ... maybe there is a safer way ???

jusana commented 3 years ago

But how come I could create DOIs with the ezid 1.0.1 version ??? That means I could log in with my credentials, don't you think ?

Regarding the EZID features missing in DataCite, would that mean preventing any requesting/sending of ARKs ? Do you see anything else ? The kernelv4 you're using in the datacite profile seems OK.

mbjones commented 3 years ago

Grepping for the host may be fine, but it does hardcode the options. Instead, we could add a new configuration parameter to choose the authentication approach (basic auth versus login).

Alternatively, we could factor out Authentication using an Adapter pattern, and then add a configuration option to tell the library to use either a DataCiteAuth adapter or a CDLAuthAdapter, or something like that.
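A minimal sketch of what such an adapter arrangement might look like is below. Every name here (AuthAdapter, DataCiteAuthAdapter, CdlAuthAdapter, AuthAdapters.fromConfig, and the "basic"/"login" option values) is hypothetical and chosen for illustration; the CDL adapter also assumes the session cookie was already obtained from /login elsewhere.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface: supplies the headers that authenticate one request.
interface AuthAdapter {
    Map<String, String> authHeaders();
}

// DataCite style: the credentials travel with every request as Basic Auth.
final class DataCiteAuthAdapter implements AuthAdapter {
    private final String headerValue;

    DataCiteAuthAdapter(String username, String password) {
        String raw = username + ":" + password;
        this.headerValue = "Basic "
                + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public Map<String, String> authHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Authorization", headerValue);
        return headers;
    }
}

// CDL style: a session cookie from an earlier /login call is replayed.
final class CdlAuthAdapter implements AuthAdapter {
    private final String sessionCookie;

    CdlAuthAdapter(String sessionCookie) {
        this.sessionCookie = sessionCookie;
    }

    @Override
    public Map<String, String> authHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cookie", sessionCookie);
        return headers;
    }
}

// Hypothetical factory driven by a configuration value ("basic" or "login").
final class AuthAdapters {
    static AuthAdapter fromConfig(String authType, String username,
                                  String password, String sessionCookie) {
        switch (authType) {
            case "basic":
                return new DataCiteAuthAdapter(username, password);
            case "login":
                return new CdlAuthAdapter(sessionCookie);
            default:
                throw new IllegalArgumentException("Unknown auth type: " + authType);
        }
    }
}
```

The request code would then only call authHeaders() and never need to know which service it is talking to.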

I'm not sure about why the 1.0.1 version worked; it should not have, and it would have to be traced carefully to see what happened.

In terms of missing features, I think you are right that missing ARK support would be the main difference. I don't see other issues.

jusana commented 3 years ago

Thanks for your quick answer. I'd also like to understand why it is "working" with 1.0.1 ... and why the identifier is reported as "not valid". Does it expect an ARK ? Or maybe it does not recognize the metadata (datacite xml, erc) returned by the DataCite API.

mbjones commented 3 years ago

While the EZID library can generate ARKs, the only thing we use it for in Metacat is to generate DOIs. So, I have no idea how you got it to login successfully to DataCite to create a DOI, but I think you will need to trace it carefully (probably by stepping through in a debugger) -- it would be best to do that on a test account, rather than creating real DOIs.

The EZID java library comes with a lot of test code that fully exercises the API, so I would probably start by setting up the library to test against the DataCite endpoint. The tests are currently hardcoded to use the EZID test account (see https://github.com/NCEAS/ezid/blob/master/src/test/java/edu/ucsb/nceas/ezid/test/EZIDServiceTest.java#L54), but this could be separated out into a configuration variable to repoint the tests at DataCite. Once you've reconfigured the tests to run against DataCite, you can run them with mvn test. That should pin down what is working and what is not in the connection to DataCite.
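Separating the test endpoint into configuration might look something like the sketch below. The TestEndpointConfig class and the property names (ezid.test.serviceUrl, ezid.test.username, ezid.test.password) are invented for this example and do not exist in the ezid test code; only the default service URL https://ezid.cdlib.org comes from this thread.

```java
import java.util.Properties;

// Hypothetical holder for the endpoint and credentials the tests should use.
final class TestEndpointConfig {
    final String serviceUrl;
    final String username;
    final String password;

    private TestEndpointConfig(String serviceUrl, String username, String password) {
        this.serviceUrl = serviceUrl;
        this.username = username;
        this.password = password;
    }

    // Reads the settings from the given properties; the property names are assumptions.
    static TestEndpointConfig from(Properties props) {
        String url = props.getProperty("ezid.test.serviceUrl", "https://ezid.cdlib.org");
        String user = require(props, "ezid.test.username");
        String pass = require(props, "ezid.test.password");
        return new TestEndpointConfig(url, user, pass);
    }

    // Fails fast with a clear message when a required setting is absent.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value;
    }
}
```

A test class could call TestEndpointConfig.from(System.getProperties()) in its setup, so that repointing the suite at DataCite needs no source change.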

jusana commented 3 years ago

Hello @mbjones, when I run the tests with the ezid 1.0.1 tag code, they all fail with the same error: "one-time login and session cookies not supported by this service" (there is no login route in the DataCite implementation of the EZID API).

But (with the same config, i.e. Metacat 2.13, ezid 1.0.1) when I use the "publish" route (via metacatUI or curl or Postman), DOIs are created in DataCite but I get this log: image

I was not able to run the tests from the Metacat code (it seems it can't build, and metacat-common is not found, with IntelliJ).

i will try to debug a live instance

regards

jusana commented 3 years ago

the error is in D1ResourceHandler

metacat 20200929-21:23:56: [ERROR]: D1ResourceHandler: Serializing exception with code 400: The provided identifier is invalid. [edu.ucsb.nceas.metacat.restservice.D1ResourceHandler:serializeException:536]

mbjones commented 3 years ago

I am willing to bet that error in Metacat is a side effect masking the underlying failure in the EZID library. If you can't get the EZID library tests to run successfully, there is no chance that Metacat will be able to work. You should also be testing against ezid 1.0.3, which fixes several bugs already, and adds some new features. ezid 1.0.1 is obsolete. The error you got from the EZID tests is exactly what we would expect from the fact that DataCite doesn't support /login, so the code changes we discussed above would be needed for the tests to pass.

jusana commented 3 years ago

hello @mbjones ,

I kept playing with the EZID lib, and figured out a few things:

Well, what do you think ? Another difference I see between the 2 versions is that the Apache HTTPClient lib is not the same (4.2.6 vs 4.5.1).

To what extent could we imagine a PR for both versions ?

Thank you very much in advance !

mbjones commented 3 years ago

That's awesome, @jusana. We don't intend to support v1.0.1 at all any further -- it has been completely replaced by 1.0.3 (which is backwards compatible with 1.0.1). So, if you were to submit a PR against 1.0.3, then you should be able to drop in the new release anywhere that 1.0.1 was used and it should work fine. So, let's concentrate on getting 1.0.3 fixed so that it works against DataCite.

jusana commented 3 years ago

ok fine and thanks for your answer

right now the authorization header value for our credentials is hard-coded, i'll make it generic and try to submit a PR

thanks a lot !

jusana commented 3 years ago

one more thing, with 1.0.3 tag version, the metadata are still sent in the v3 schema (not v4+)

image

but for our use cases it doesn't matter (please confirm @yvanlebras )

mbjones commented 3 years ago

@jusana I think it depends. It uses v3 if you send a metadata hash. But if you attach a v4 datacite XML document in that hash using the datacite key, that gets used instead. For example, you can see how the test code in EZID does it with both datacite versions 3 and 4 here: https://github.com/NCEAS/ezid/blob/master/src/test/java/edu/ucsb/nceas/ezid/test/EZIDServiceTest.java#L179

A real world example is in Metacat here: https://github.com/NCEAS/metacat/blob/master/src/edu/ucsb/nceas/metacat/dataone/DOIService.java#L273, and
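To make the "datacite key" idea concrete, here is a small sketch that only builds the metadata hash; it does not call the ezid library. The DataCiteMetadataExample class is hypothetical, the XML is an illustrative DataCite kernel-4 style fragment with placeholder values that has not been validated against the schema, and 10.5072/FK2EXAMPLE is a made-up test identifier.

```java
import java.util.HashMap;

// Hypothetical example class, not part of ezid or Metacat.
final class DataCiteMetadataExample {

    // Wraps a complete DataCite XML document under the "datacite" key,
    // as described above, instead of sending individual metadata fields.
    static HashMap<String, String> withDataCiteXml(String dataciteXml) {
        HashMap<String, String> metadata = new HashMap<>();
        metadata.put("datacite", dataciteXml);
        return metadata;
    }

    // Illustrative kernel-4 style document; all values are placeholders.
    static String sampleV4Xml() {
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<resource xmlns=\"http://datacite.org/schema/kernel-4\">",
            "  <identifier identifierType=\"DOI\">10.5072/FK2EXAMPLE</identifier>",
            "  <creators><creator><creatorName>Example, Author</creatorName></creator></creators>",
            "  <titles><title>Example dataset</title></titles>",
            "  <publisher>Example Repository</publisher>",
            "  <publicationYear>2020</publicationYear>",
            "  <resourceType resourceTypeGeneral=\"Dataset\"/>",
            "</resource>");
    }
}
```

The resulting map would then be passed wherever the library accepts a metadata hash; the linked EZIDServiceTest and DOIService code show the actual calls.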

jusana commented 3 years ago

Hello @mbjones ,

I created the PR in the master branch, so please tell me what you think

For the DataCite metadata version, it seems to be a constant set to v3.1: https://github.com/NCEAS/metacat/blob/d6d4f499f8a33acf0c7df10ec86be3d6d8f1c610/src/edu/ucsb/nceas/metacat/doi/datacite/DataCiteMetadataFactory.java#L71

Thanks.

mbjones commented 3 years ago

Thanks, @jusana, I'll take a look.

Regarding the constant, @taojing2002 confirmed that we are still using 3.1, but he has support for datacite 4.x in a branch here: https://github.com/NCEAS/metacat/tree/feature-datacite-relationship We will work on getting that into an upcoming release of Metacat so that DataCite 4.x can be used.

jusana commented 3 years ago

ok thanks @mbjones for the confirmation, but for our current needs v3.1 is totally fine. but nice to hear that v4+ is on its way (thanks @taojing2002)