IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

restore testing dataverse #65

Closed wibeasley closed 2 years ago

wibeasley commented 3 years ago

Somehow the testing dataverse contents were deleted.

I'll

https://demo.dataverse.org/dataverse/dataverse-client-r

kuriwaki commented 3 years ago

@pdurbin Once a dataverse is published on demo.dataverse.org, I hope it won't get deleted by maintenance -- even if there is no action on it?

pdurbin commented 3 years ago

Hi, my understanding is that all data on demo.dataverse.org is deleted periodically unless it is marked to be skipped. Let's ask @djbrooke and @kcondon about the details.

djbrooke commented 3 years ago

@pdurbin thanks for the ping. Yeah, that's my understanding as well. Apologies if this created extra work here.

If the content can be re-created, let me know and I can work with @kcondon to understand more and get it on the not-to-be-deleted list.

kcondon commented 3 years ago

@djbrooke Yes, the assumption, after originally checking with the support and curation team, is that demo is typically used for short-term exploration of functionality and there is no guarantee of persistence. We have limited storage space on that machine so we clear it out regularly. It was being cleared on a 30-day basis, but I will check whether that had been moved up to weekly. Maybe the use cases for this service have changed? I know we'd started encouraging people writing to the API to use that box to develop or verify scripts, so maybe that takes longer? @pdurbin, what are the use cases here? The do-not-delete list was used for sample data that we wanted preserved as a minimal illustration of use. Phil, I can give you access to demo if you want to add other persistent cases. We just need to be mindful of accumulating too many, or of this becoming an unsustainable model with frequent updates. Perhaps we just need a different strategy?

Update: yes, it appears to still be deleting datasets older than 30 days. It also does some shorter-term cleanup of temp files, which are typically orphaned or byproducts of ingest processing. Phil, it's just a cron job with several discrete cleanup actions, the last being my dataset cleanup. Leonid had taken my basic script and made it more comprehensive. To exclude a dataset, add its PID to finddatasetstoclean.sql. They seem to be referring to a dataverse though? I think we only removed datasets and temp files.

0 5 1 delete_older_datasets.sh

select 'curl --header "X-Dataverse-key: xxxxxxx" -X DELETE /api/datasets/:persistentId/destroy?persistentId=doi:'||authority||'/'||identifier
from dvobject
where dtype='Dataset' and modificationtime < current_date-30 and identifier not in ('')
order by indextime asc;

One last point. I believe our message on demo used to clearly indicate the data would not be preserved, but now it says: "This Dataverse is for demo purposes only. To archive and publish to the Harvard repository, visit dataverse.harvard.edu. To learn about and publish to other repositories, visit The Dataverse Project at dataverse.org."

djbrooke commented 3 years ago

Thanks @kcondon for the details!

@wibeasley - if you have some datasets or a dataverse that you want us to keep around, just let us know and we'll get it added to that exclusion list above. Sorry again for the inconvenience.

kuriwaki commented 3 years ago

@kcondon @djbrooke We created https://demo.dataverse.org/dataverse/dataverse-client-r and would like for any datasets in this dataverse to be kept around. No worries about any previous datasets -- this is a new one created last week.

The use case is for this dataverse to be a permanent link for our continuous integration tests as well as example code that will be widely featured on the package website. It does need to be on demo.dataverse.org. In fact, it might be closer to a real example if it had a DOI. (I wonder if what you highlighted about the demo message is the reason the DOI URLs on the demo Dataverse are not published.)

pdurbin commented 3 years ago

All this sounds great. I just wanted to mention that @wibeasley and I had a video chat yesterday and touched on this. We very briefly went down the road of plugging dataverse-client-r in as perhaps a last step in our normal "spin up an EC2 instance and test with Jenkins" process, but for now it's probably easier to use the demo server. I said I don't think anyone will mind if the data storage requirements are small, and he showed me that the test files are only a few kilobytes in size, so I think we'll be fine. Thanks all for helping make regular automated testing of this code a reality!

kcondon commented 3 years ago

@kuriwaki @wibeasley All set, the two datasets have been added to the do-not-delete list. Will pursue the larger issue of use cases w/ @djbrooke separately.

kuriwaki commented 2 years ago

Thanks all - closing this, as the demo Dataverse seems stable now: https://demo.dataverse.org/dataverse/dataverse-client-r