IQSS / dataverse

Open source research data repository software
http://dataverse.org

Feature Request/Idea: Stored procedure for Identifier generation per PID provider #10514

Open PatrickKibies opened 4 months ago

PatrickKibies commented 4 months ago

Overview of the Feature Request: The basic idea is to let Dataverse handle more than one custom PID generation style by allowing multiple database stored procedures, selectable per PID provider with a setting along the lines of dataverse.pid.*.generation-styleProcedure=

What kind of user is the feature intended for? (Example users roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin) This feature is intended for the Superuser, who would configure more than one custom generation style.

What inspired the request? From v6.2 on, multiple PID providers can be configured and assigned to different Dataverse Collections within one instance. Different generation styles per provider are desirable as well.

What existing behavior do you want changed? Currently one can select between the default (random string) and a single stored procedure.

Any brand new behavior do you want to add to Dataverse? No, just an extension. From my point of view, stored procedures are flexible enough to model any use case I can imagine right now, provided that several of them can be used at the same time with different PID provider configurations.

Any open or closed issues related to this feature request? None that I am aware of.

Thanks to @poikilotherm and @pdurbin for pointing me to GitHub to raise this issue, and for telling me that mentioning @qqmyers could be helpful, which I am doing here. (See Zulip discussion.)

I am looking forward to a fruitful discussion!

qqmyers commented 4 months ago

FWIW: Seems like something that would not be too hard for someone to add. Rather than allowing calls to an ~arbitrary function, it might be better to have Dataverse pass the pidProvider id when making the request. (Granted, someone would have to be able to change the Dataverse config options to cause trouble.) If the provider id were passed, the solution would be backward compatible, e.g. Dataverse would still only accept storedProcGenerated as a value (rather than user-provided text). It would mean, though, that there wouldn't be any mnemonic value to indicate what the stored procedure does (unless one indicated that in the label for the PidProvider itself, which may be enough).
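A minimal sketch of the suggestion above, written in Python purely for illustration (Dataverse itself is Java, and the real procedure would live in the database). Everything here is an assumption: the function names, the provider ids "inst-a" and "inst-b", the identifier formats, and the style name "randomString" for the default. Only storedProcGenerated is taken from the comment. The point is that the configuration value stays a fixed keyword, while a single procedure branches on the provider id it receives.

```python
import secrets
import string
from itertools import count

# Hypothetical stand-in for ONE database stored procedure that receives
# the PID provider id and branches on it internally.
_counters = {}


def stored_proc_generate(provider_id: str) -> str:
    """Simulate a stored procedure with a separate sequence per provider."""
    n = next(_counters.setdefault(provider_id, count(1)))
    if provider_id == "inst-a":  # hypothetical provider id
        return f"A{n:06d}"
    if provider_id == "inst-b":  # hypothetical provider id
        return f"b-{n}"
    return str(n)  # fallback for providers without a special format


def random_string(length: int = 6) -> str:
    """Stand-in for the default random-string style."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def generate_identifier(style: str, provider_id: str) -> str:
    """Dispatch on a fixed set of style keywords, never on user-provided text."""
    if style == "storedProcGenerated":
        return stored_proc_generate(provider_id)
    if style == "randomString":  # assumed name of the default style
        return random_string()
    raise ValueError(f"unsupported generation style: {style!r}")
```

With this shape, adding a new identifier format means editing the one procedure rather than registering a new function name in the configuration.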

PatrickKibies commented 4 months ago

That seems a viable approach. I see the advantages of going that way instead of allowing for countless different functions. Currently I have no spare time to find my way into Dataverse software development (we are full speed ahead to get our institutional Dataverse instance up and running, where speed is basically determined by non-technical issues...). So from my position, the hope is that someone else finds this feature sufficiently useful (or cool :-) to implement it, or I will have a look at it when my schedule allows. Thanks for your input!

poikilotherm commented 4 months ago

I would also like to propose evaluating whether it's in scope to have Flyway install the stored procedures. That would make testing easier, as well as deployment of the procedures in non-classic installations. It would also allow for a sanity check that the configured options and the method name match.

PatrickKibies commented 4 months ago

In the meantime I found a workaround for my current problem: the "shoulder" can be abused to prepend any string in front of the generated identifier, since the trailing "/" which would be part of a DOI shoulder is not appended automatically. So Dataverse already has everything I need for my current use case. Nonetheless, I think the possibility to have more than one custom generation style would still be nice to have for larger setups used by multiple institutions.
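The workaround above can be sketched as follows, again in Python for illustration only. The function name, the authority 10.5072, the shoulders, and the identifier are made-up example values, and the protocol:authority/shoulder+identifier layout is an assumption about how the PID is assembled. What the sketch shows is that the shoulder is concatenated as-is, with no "/" added after it, so it can act as a per-provider prefix.

```python
def build_pid(protocol: str, authority: str, shoulder: str, identifier: str) -> str:
    """Assemble a PID; the shoulder is prepended verbatim to the identifier."""
    # Assumption: only the "/" between authority and shoulder is fixed.
    return f"{protocol}:{authority}/{shoulder}{identifier}"


# A conventional DOI shoulder that carries its own trailing "/".
print(build_pid("doi", "10.5072", "FK2/", "ABC123"))

# The "hack": a shoulder without "/" becomes a plain prefix on the identifier.
print(build_pid("doi", "10.5072", "inst-a-", "ABC123"))
```

Two PID providers configured with different shoulders would then yield visibly different identifiers even though both use the same generation style.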

pdurbin commented 4 months ago

Ha, nice shoulder hack!