eprints / orcid_support_advance

ORCID Support Advance plugin
1 stars 6 forks source link

New features #19

Closed dennmuel closed 6 years ago

dennmuel commented 6 years ago

Summary

Some new features, new metadata fields and reworked workflows.

Smaller enhancements include:

A new/enhanced way to store put codes and the orcid they belong to enables us to add

Reworked export (switching from bulk export to regular export) enables

Fixes at least parts of #6 , #12 , #13 , #14 and some comments from the mailinglist. @wfyson could you please review these changes? Any feedback, help and questions are welcome!

Drawbacks

Currently there is no way to catch errors that might occur from using prior versions of this plugin. It's not possible to get and save the putcode of items that have been posted to orcid.org without saving the putcode. There is no function, that ports existing data from eprint_orcid_put_codes to the new field eprint_creators_putcodes.

Exporting 100+ items has not been tested much; there might be bugs left. Also, having lots of items (in our case 550+ items) can make the processing really slow. So a spinner or progress bar would make for a much nicer user experience. Faster processing would be better, of course.

Import of contributor data could not be tested.

Saving the putcode in the creators table does not work, when you import items, that don't have contributors. If the item does not have further external IDs, the putcode will fail to serve as a duplicate criteria.

If the xml response from the ORCID API is not valid, the parser goes belly up. This is not catched and it shouldn't because the correct way to fix this would be a valid xml response.

Screenshots

Active duplicate filter in import rendering filtered elements. The css classes for filtered elements (duplicate and date) can be adjusted if you don't want them to be paled out. Also contains an example of subtitles now being displayed.

grafik


Active date filter in export rendering filtered elements. The duplicate filter in export just doesn't show the respective items at all. If they are displayed, they appear like normal records, because they are now updatable thanks to the saved putcode and enabled PUT-requests to the ORCID-API.

grafik


Better user feedback on export

grafik

In case of errors, the XML response from the API is logged (e.g. in apache error log). grafik


Department exported to ORCID record

grafik

wfyson commented 6 years ago

Hi @dennmuel, Thanks for this and for your detailed documentation. I''ll have a look over it all and review it and get back to you as soon as possible.

wfyson commented 6 years ago

Hi @dennmuel,

Following on from the conversation at https://github.com/eprints/orcid_support_advance/issues/12 I'd be more inclined to add the put-code as an extension to the creators table and remove put-code as it's own special field. The problem with the new approach is that we end up storing all the ORCIDs twice and we introduce two new tables to the database. Adding the put-code to the creators table, still includes adding a new table (for creators_put_code and for any other fields that may use it, e.g. editors, contributors), but at least we don't store the ORCID again.

Therefore I'd suggest we could add it in the same way the ORCID Support plugin adds the ORCID field, e.g.

#add put-code as a subfield to appropriate eprint fields
foreach my $field( @{$c->{fields}->{eprint}} )
{
        if( grep { $field->{name} eq $_ } @{$c->{orcid}->{eprint_fields}}) #$c->{orcid}->{eprint_fields} defined in z_orcid_support.pl
        {
                @{$field->{fields}} = (@{$field->{fields}}, (
                        {
                                sub_name => 'put-code',
                                type => 'text',
                                allow_null => 1,
                                show_in_html => 0, #we don't need this field to appear in the workflow
                                export_as_xml => 0, #nor do we want it appearing in exports
                        }
                ));
        }       
}

Let me know what you think or if there's any reason why you think the above is a bad idea. I'm mostly just keen on not duplicating the ORCIDs and avoiding the need to keep the creators orcids and the put-code orcids in sync.

dennmuel commented 6 years ago

Hey @wfyson , thanks for taking this on. I was just trying this approach since you were hesistant to add new subfields to the creators table in the discussion you mentioned. At the top of my head I wouldn't know a reason, why we shouldn't store it in the way you proposed. (Actually I'm relying on your expertise here, so whatever you think works best should be cool with me.)

dennmuel commented 6 years ago

Hey @wfyson , I reworked the putcode saving and updated the description of this PR.

I came across one drawback of the creators table approach: When you import items, that don't have contributors (which can only be exported to ORCID via the API!) the putcode cannot be saved. If the item does not have further external IDs, there is no way to identify a duplicate record. Any idea on how to catch that? Or can we just tolerate this?

We have a national holiday in Germany tomorrow and actually I'd be on vacation the next one and a half weeks. But since we would like to roll out the plugin in two weeks, I'd come in to work on the plugin, if you would happen to find the time to continue reviewing soon.

wfyson commented 6 years ago

Hi @dennmuel, Thanks for updating the PR and sorry to hear that you're having to work on this during your holidays. I can continue to review when possible, but may not be able to feedback on everything for your deadline. If not I'd suggest you go ahead and get done what you need for your repository and we can work out how to merge this in to the general Bazaar package at a later date.

Regarding your drawback of this put-code approach mentioned above, I'm a bit confused about how you might be importing items from ORCID that don't have any creators? Do they not all have to have at least one creator as they come from some individual's ORCID profile?

Thanks for all your work on this - it is much appreciated and apologies I can't be more helpful at this stage.

dennmuel commented 6 years ago

Hey @wfyson, thanks for your quick reply. Would be ideal, if we could review this before our deadline, but surely I don't want to put any pressure on you here. I would just like to make sure, that we start using an extended version of the plugin, that doesn't do fundamental things fundamentally different than the official release. Just trying to avoid lots of trouble down the line. So if we can achieve that, I would be more than grateful. :)

Now, back to work:

... how you might be importing items from ORCID that don't have any creators? Do they not all have to have at least one creator as they come from some individual's ORCID profile?

I thought so, too, but as it turns out and is stated (here):

Contributors can only be added using the ORCID API -- a researcher cannot self-assert their contribution.

This means, if you add an item to your ORCID record either manually or via BIBTEX import, there are no contributors listed in this item. (Unfortunately I could not succesfully claim a publication using the "Search & Link" functionality; but I'd guess it would be same as with the other methods.) I tried it with a record from our repository. When exported using the plugin, the contributors are there. When exported to BIBTEX and then imported to ORCID, there are no contributors listed. In the form for manually adding a work, there isn't even a field for contributors. I guess this is to prevent false claiming of publications.

Hopefully I could clarify the problem I see here. Maybe it's not so bad, because most items (should) have another identifier to prevent duplicates. But at least you must be extra careful as an editor, if you have a work without IDs and putcode.

wfyson commented 6 years ago

Hi @dennmuel Thanks for clearing up the issue about no contributors being available for import. I suppose to get around this we could simply assume that the user whose ORCID we're using for the import can at least be added to the creators field for the EPrint record and so we have a row where we can save the put-code.

However, I think a new problem has arisen concerning storing the put-codes in the creators field. Due to orcid.org's requirements of making the ORCID a readonly field and with the put-code field hidden via CSS, if we did want to edit a record and remove or change one of the creators for example, the old put-code would still be present. We get around this problem for the readonly ORCID field by having it perform a lookup against the user's profile, but no such option exists for the put-codes. Therefore I think we need a pre-commit trigger that detects if any of the creators ORCIDs have changed and if so uses the put-code to tell orcid.org that the record should no longer be associated with the user's orcid.org profile and then wipes that put-code from the eprint record.

Does that sound sensible to you? Let me know if any of the above doesn't make sense and if there's anything from the EPrints side of things I can help you with.

dennmuel commented 6 years ago

Adding the importing user as an author if there are no contributors makes sense to me. I'll work around this that way then.

I see your point regarding changing author information. Your solution sounds doable, allthough I wouldn't know how to implement the trigger. Just a quick idea: Couldn't we save all the putcodes associated with a user in his profile and then do a lookup? We would need to save the putcode twice, but after all the orcid is also saved once for the creator and once for the user. (Not sure if that makes sense.)

If you see any other major problems with the new features, please let me know so I might be able to tackle them all in one or two days of work. The putcode seems to be the most crucial though.

wfyson commented 6 years ago

I think the problem with saving all the putcodes associated with a user in their profile is that not all repositories have user accounts for all their researchers (e.g. if a third party CRIS-like system pushes records into the repository) or if someone deposits on someone else's behalf. It would also add to the complexity of the repository in trying to keep the list of put-codes in the user profile in sync with any changes made to the eprint records (e.g. when records are changed or deleted, etc.).

With regards to implementing a trigger, I suspect we can actually do this within the existing two EP_TRIGGER_BEFORE_COMMIT triggers found in z_orcid_support_advance.pl. These triggers loop through the creators and editors field, updating the ORCID field based on the lookup to the user profile. If we detect when an ORCID gets removed we can use the put-code associated with that creator/editor entry to update orcid.org, before then removing the put-code too. Ideally, these two triggers would become just one trigger that loops through all ORCID relevant fields using the @{$c->{orcid}->{eprint_fields}} as used earlier when adding the put-code field (apologies - I should have done that the first time I was looking at this stuff!).

dennmuel commented 6 years ago

Okay, I see. Thanks for clarifying. I often don't have a bigger picture than the one of just our installation, so all your comments are very helpful and much appreciated.

If there are no other crucial points, I'll try to work on the following and come back to by next week.

Did I get all that right or did I miss something? Thanks for all your help with this!

wfyson commented 6 years ago

That all sounds good to me - thanks so much for working on this and sorry I've not been able to contribute any code on these changes so far. If you come across any issues then please ask though and I'll try and help where I can!

dennmuel commented 6 years ago

Hey @wfyson , I have a question regarding the putcode and trigger stuff. As far as I understand, the existing triggers take the new/edited creators (or editors) and manipulate them. However, for our purposes we would need to also have the old/unedited creators, so we can compare these two arrays. Is there a way to get the creators as they were, before somone edited the eprint?

wfyson commented 6 years ago

Hi @dennmuel , The $changed variable should contain key/value pairs of all the fields that have been changed. So $changed->{creators} should return an array of all the creators before the edit. We can then poke the orcid profile of anyone who is in that array, but not in the current one.

dennmuel commented 6 years ago

Hi @wfyson,

thanks for your help. I updated the code to add the user as a contributor if there are no others.

I also played around with a simple trigger, but for the creators only so far. Could you have a look please? If everythings fine, we just copy it for the editors. I don't feel comfortable about merging these triggers with the existing (but commented out) triggers. So maybe reworking that area in general would be better some other time. Also my colleague and I were having second thoughts about deleting items on orcid.org, so we didn't implement this yet. We're most probably not going to need that feature, when we kick off next week, so this can wait from our perspective.

Gonna take on hiding the workflow fields next week. For now, if the road for the putcode stuff is more or less save, I'd go back to my holidays. :)

wfyson commented 6 years ago

Hi @dennmuel, Thanks for all your effort on this and I'll review it and get back to you as soon as possible... in the meantime, enjoy your holidays!

dennmuel commented 6 years ago

Added some javascript to hide the putcode field. Doesn't affect the details view, though. There are no element IDs I could work with.

grafik

wfyson commented 6 years ago

Hi @dennmuel I've been having a look at this and come across a probem. Firstly, my previous advice about using '$changed->{creators}' is wrong - it looks like the $changed hash uses sub fields too, so what we really need in the pre-commit trigger is $changed->{creators_orcid}.

When I put that change in place, if I then edit the creators record (i.e. remove a creator or change their email address) their ORCID is removed too (as expected) and so is their put-code. But then if I go back and re-add that creator (and their ORCID) and try to export the record to orcid.org again, I get a 409 error from ORCID telling me that the record is already present based on external identifiers.

I see in your comments in ExportToOrcid.pm's 'action_export' that ORCID don't currently return the put-code with a 409, but are planning on doing so in future. I think until they do we can't get around the issue. The circumstances described above to create the problem are perhaps a little obscure so maybe we can leave this issue in for now? Just wanted to raise your attention it in case you knew of a quick fix! If we know that they're going to start including it soon, maybe we could merge this pull request in now, but hold back on updating the Bazaar plugin until that bit's ready?

dennmuel commented 6 years ago

Hi @wfyson ,

thanks for taking a look and improving this. I see the problem with that. This particular circumstance might be a little obscure, but the same 409 error is also given, when the export function has been used prior to the plugin saving the putcode in export. So the resulting scenario of having an item POSTed to orcid.org but not having a putcode for a PUT request would be quite common, I fear.

As already stated, one fix would be to have the API return the putcode in this error case, try a PUT request and if successful save the putcode.

My colleague and I came up with an alternate idea: Read all the ORCID works of this user, check for a match (using external identifiers doi, urn or at least going through all ORCID works that have this eprints instance as a source and look for matching titles or something), get the putcode, try a PUT request with the putcode and if that request is successful, save the putcode to the creators table.

Would that be a practical solution?

wfyson commented 6 years ago

I did think that might be one approach, but it seems like a lot of work for something for which there'll be an easier solution (hopefully) quite soon. Maybe implementing that as a one off script that could update existing records would help fix the issue of records that have been exported prior to the plugin update?

dennmuel commented 6 years ago

That sounds reasonable. Should this perhaps be added to orcid_support_advance.pl so that this kicks in right after the putcode subfield is added?

ExportToOrcid.pm will need some enhancements in error handling anyway. We've come across new problem with the export, too. When you import an item (putcode gets saved), modify it and then try to export it (using a PUT request because we have a putcode), the API says, that we cannot do this, because we are not the source of this item (which is correct). In that case, we would need to retry with a POST request, so that the two versions of the item (the one imported from orcid and the one exported to orcid) both end up beeing visible in one grouping like this:

grafik

wfyson commented 6 years ago

I don't think we can add the script to orcid_support_advance.pl as it's something that only needs to be run the once after this initial update. If it's in orcid_support_advance.pl we'd need to add a bit of logic to work out whether or not we need to run the script. Whilst it's a bit inconvenient it may be best to simply have it as a script in bin/ which a developer can run as part of the installation process (which would be equivalent to running epadmin update to add the new put-code field for example).

Thanks for alerting me to the other issue too. It seems like we need some sort of provenance solution to store the source when importing the record so that we can catch this when exporting. However maybe an easier solution would be that in the event of a PUT request returning an 9010 (which I believe is the appropriate error message) we simply try again with a POST?

dennmuel commented 6 years ago

Agreed, a seperate file would be sufficient then.

EDIT: The logic needed wouldn't be far off, though: If there is no putcode subfield, go ahead and add it. If that was successfull, try to find the putcodes of previously posted items. Checking for an existing putcode subfield would be good to have either way, wouldn't it?

[...] solution would be that in the event of a PUT request returning an 9010 (which I believe is the appropriate error message) we simply try again with a POST

Yes, that was what I meant in my previous comment when talking about the "need to retry with a POST request". Provenance information for the putcode would probably have to be stored in the creators table. I don't think we want that, so I didn't even mention it before.

wfyson commented 6 years ago

Hi @dennmuel, I merged the pull request and started working on some of the issues above on a separate branch. I think the second issue described above, where a record needs POST-ing after it was orignally imported from orcid.org should be fixed with commit https://github.com/eprints/orcid_support_advance/commit/f698ab09fbd6f6d36e15408b26fab351f0d0855d, along with a separate issue that arose when the user tries to update a work which has since been removed from orcid.org. If you could check it over that would be great!

I should add that re-POST-ing a work previously imported from orcid.org will only show that work as having two sources as per your screenshot above, if orcid.org can identify the works are the same based on an external identifier, e.g. a DOI. Otherwise it will simply create a new work.

wfyson commented 6 years ago

Hi @dennmuel ,

I've made some more changes to the plugin over on the mannheim branch at https://github.com/eprints/orcid_support_advance/tree/mannheim which I believe should resolve all of the issues we encountered above. If we don't have a put-code when exporting and we can't POST, then the plugin will try to retreive the correct put-code for the work (from the correct source for the work too) and then will try and export again with a PUT.

Similarly if we try to export a previously imported record, which we can't PUT due to being a different source, the plugin will try again with a POST to add the work as a new source.

If you are in a position to try out and test my changes that would be great, before I publish them more publicly on the Bazaar. Thanks for all your help!

(And apologies for continuing the conversation on this closed pull request - it seems the best place to contact you however!)

dennmuel commented 6 years ago

Hey @wfyson, thank you very much for all the work! Sorry for not answering sooner; unfortunately I was taken sick. From the first glance your work looks good, but I haven't tested it yet. I'll come back to you, as soon as I have done that. Best regards

dennmuel commented 6 years ago

Hey @wfyson, quick tests seem like everything works as it should. Thanks very much for your work on this!