IQSS / dataverse

Open source research data repository software
http://dataverse.org

TRSA (Trusted Remote Storage Agent) and variable-level metadata upload #5213

Open akio-sone opened 5 years ago

akio-sone commented 5 years ago

Related issues

Related Documents about TRSA

permission required

pdurbin commented 5 years ago

@akio-sone @jonc1438 I mentioned you over at https://github.com/IQSS/dataverse/issues/4821#issuecomment-442226108 but I wanted to highlight that for Make Data Count, we plan to only count downloads that go through Glassfish which means that downloads from TRSA, downloads from rsync, and downloads directly from Swift won't be counted.

jonc1438 commented 5 years ago

Phil

Thanks, that is definitely something we have thought about for TRSA. We will certainly need a way to track any remote downloads or execution.

Right now TRSA is planned to be just a registration and local management system for the remote trusted data store. The user access point will always be Dataverse, and Glassfish will know about the parts that are in Dataverse.

In our Impact Model, once you leave Dataverse, the Notary Service and the SAFE systems track data usage for downloads and access.

It would be good for Dataverse to track the count of references/forwards to outside applications, whether they be TwoRavens, DDI Explore, the ImPact Notary Service, CodeOcean, etc.

Jon

Jonathan Crabtree, Director, Cyberinfrastructure, Odum Institute, UNC Chapel Hill, www.odum.unc.edu, Jonathan_Crabtree@unc.edu, 919-962-0517 (office), 919-428-6112 (cell)

(Replying by email to Philip Durbin's comment of Thursday, November 29, 2018, 5:13 AM.)


pdurbin commented 5 years ago

@jonc1438 ah, ok, so it sounds like data from a TRSA will still go through Glassfish. This is sort of like how an S3 download works. The user interacts with Glassfish so we'll count the download even if the file ultimately gets downloaded directly from S3 (if dataverse.files.s3-download-redirect is set to true). Thanks!
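To make the counting rule concrete, here is a minimal Python sketch. It assumes, as the two comments above describe, that the only thing that decides whether a download is counted for Make Data Count is whether the request passes through Glassfish; a redirect to S3 issued by Glassfish still counts. `DownloadRequest`, `counts_for_mdc`, and the path labels are hypothetical names for illustration, not Dataverse code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadRequest:
    """Hypothetical record of how a user fetched a file."""
    via_glassfish: bool       # did the request reach the Dataverse app server?
    redirected: bool = False  # e.g. an S3 download redirect issued by Glassfish


def counts_for_mdc(req: DownloadRequest) -> bool:
    # Only requests the application server sees can be logged, so a redirect
    # still counts: the request hit Glassfish before the bytes moved elsewhere.
    return req.via_glassfish


# Informal labels for the download paths mentioned in this thread.
paths = {
    "glassfish": DownloadRequest(via_glassfish=True),
    "s3_redirect": DownloadRequest(via_glassfish=True, redirected=True),
    "trsa_via_dataverse": DownloadRequest(via_glassfish=True),  # access starts in Dataverse
    "rsync": DownloadRequest(via_glassfish=False),
    "swift_direct": DownloadRequest(via_glassfish=False),
}
counted = sorted(name for name, r in paths.items() if counts_for_mdc(r))
print(counted)  # ['glassfish', 's3_redirect', 'trsa_via_dataverse']
```

The `redirected` flag is deliberately ignored by `counts_for_mdc`: where the bytes ultimately come from does not matter, only whether Glassfish saw the request.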

pdurbin commented 5 years ago

@jonc1438 @akio-sone @donsizemore great meeting this week. Would it be helpful if I stubbed out a workflow diagram similar to the one at http://guides.dataverse.org/en/4.13/admin/make-data-count.html#architecture ? I know you have nice diagrams at http://cyberimpact.us/architecture-overview/ and http://cyberimpact.us/dataverse-trusted-remote-storage-agent-update/ but @kcondon and I were talking about a diagram that's a little lower level, about communication back and forth between the different components, like that Make Data Count diagram. Please let me know. Thanks.

djbrooke commented 5 years ago

@jonc1438 @akio-sone @donsizemore @pdurbin please coordinate this with @scolapasta. Thanks!

jonc1438 commented 5 years ago

Happy to.

Jon



(Replying by email to Philip Durbin's comment of Friday, May 3, 2019, 2:03 PM.)


scolapasta commented 5 years ago

@pdurbin @jonc1438 I think a diagram could help us wrap our thoughts around this for our future discussions. Once you get started, let me know, and I can help add my understanding.

pdurbin commented 5 years ago

I'm currently reviewing pull request #6068 by @akio-sone (I'm about 20% through it, reading from top to bottom) and I have a few comments and questions:

[Screenshot: review comments and questions on #6068, 2019-08-07]

pdurbin commented 5 years ago

I'm thinking that perhaps I should start that diagram we talked about above.

@jonc1438 @akio-sone @donsizemore I just created the following diagram when reviewing #6068 and I could use some help with it.

[Image: trsa diagram rendered from trsa.uml]

Here's the "source" for the diagram (.txt extension added so it could be uploaded to this issue): trsa.uml.txt

Here's how I create a png from it:

java -jar /tmp/plantuml.jar -tpng trsa.uml
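For anyone regenerating the image from a script, a small Python wrapper around the same command might look like the sketch below. The function names are hypothetical; it assumes `java` is on the PATH, the jar sits at `/tmp/plantuml.jar` as in the command above, and PlantUML writes its output with its default naming (same directory and base name as the source).

```python
import shutil
import subprocess
from pathlib import Path

DEFAULT_JAR = Path("/tmp/plantuml.jar")  # location used in the command above


def plantuml_cmd(source: Path, jar: Path = DEFAULT_JAR, fmt: str = "png") -> list[str]:
    # Mirrors: java -jar /tmp/plantuml.jar -tpng trsa.uml
    return ["java", "-jar", str(jar), f"-t{fmt}", str(source)]


def render(source: Path, jar: Path = DEFAULT_JAR, fmt: str = "png") -> Path:
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH")
    subprocess.run(plantuml_cmd(source, jar, fmt), check=True)
    # Assumes the default output location, e.g. trsa.uml -> trsa.png
    return source.with_suffix(f".{fmt}")
```

Usage would be `render(Path("trsa.uml"))`; building the argument list separately keeps the command easy to inspect before anything is run.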

I'm basing this on what I'm seeing in pull request #6068 rather than any diagrams I've seen elsewhere. I figure we can update the diagram as more components are added. Apologies for any misunderstandings of the various components. Please help me make corrections and please let me know if I should add this to Akio's branch.

jonc1438 commented 5 years ago

Phil,

I do think you can use this diagram, but we should remember that the pull request was meant to begin the discussion of whether this is the right technical approach. We all know it is NOT ready to merge or to be documented, so some of this is premature.

The plan was to look at the technical approach and see how it aligns with your current direction or conflicts with a current approach.

I think we can call this an approach to remote data storage for Dataverse, and we like TRSA. I think Gustavo defined this as storage that Dataverse does not control, vs. something mounted and usable by Dataverse. Basically, Dataverse only has references to access points.
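The "reference only" idea can be sketched as a data structure. This is a minimal Python illustration under stated assumptions, not the TRSA design: `RemoteFileReference` and `resolve_download` are hypothetical names, and answering a download with an HTTP redirect is an assumption modeled on the S3-redirect analogy earlier in this thread. The point it shows is that Dataverse holds metadata plus a pointer, never the bytes.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class RemoteFileReference:
    """Hypothetical record for a file held on storage Dataverse does not control.

    Dataverse keeps metadata and a pointer to the access point, not the bytes.
    """
    dataset_pid: str
    file_name: str
    access_point: str  # URL served by the remote, trusted storage

    def storage_host(self) -> str:
        return urlparse(self.access_point).netloc


def resolve_download(ref: RemoteFileReference) -> tuple[int, dict[str, str]]:
    # ASSUMPTION: since the storage is neither controlled nor mounted by
    # Dataverse, it answers with a redirect instead of streaming bytes.
    return 302, {"Location": ref.access_point}


ref = RemoteFileReference(
    dataset_pid="doi:10.5072/FK2/EXAMPLE",  # example values only
    file_name="survey.dta",
    access_point="https://trsa.example.edu/access/survey.dta",
)
status, headers = resolve_download(ref)
print(status, headers["Location"])  # 302 https://trsa.example.edu/access/survey.dta
```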

Let me know if that makes sense

Jon

Jonathan Crabtree, Assistant Director for Cyberinfrastructure, HW Odum Institute for Research in Social Science, www.odum.unc.edu, Jonathan_Crabtree@unc.edu, 919-962-0517 (office), 919-428-6112 (cell)

(Replying by email to Philip Durbin's comment of Wednesday, August 7, 2019, 3:27 PM.)


pdurbin commented 4 years ago

@jonc1438 it makes sense. Thanks! 😄 To be clear, I'm not talking about writing a lot of documentation. I'm talking about some diagrams similar to the ones you've been putting on the IMPACT blog.

I recently stumbled upon https://github.com/OdumInstitute/trsa-web/blob/jee8line/src/main/resources/doc/uml-diagrams-trsa-web.puml by @akio-sone and it looks great! It creates 9 diagrams but here's the big one that's pretty much exactly what I was asking for. It's a much better version of what I was trying to do myself above without knowing all the moving pieces. 😄

[Image: uml-diagrams-trsa-web diagram]

pacian commented 2 years ago

Hello,
We are very much interested in TRSA. Can you explain where this work stands? Denmark has selected Dataverse as its national TDR, and at some point we will be very interested in how TRSA development is progressing.

pdurbin commented 2 years ago

@pacian hi! It looks like my last comment was in 2019. You might want to check out the talk by @jonc1438 at the 2020 community meeting: https://youtu.be/LHyiA3JeiwE?t=1466

I'll let others who are closer to the TRSA project give an update on what's new since then.

Exciting news about Denmark! Thanks!

akio-sone commented 2 years ago

@pacian @pdurbin Odum's Dataverse fork has a branch named trsa-api that has a new API endpoint to receive/save the payload of metadata from a TRSA instance without invoking the ingest. The latest update is based on version 5.10.1. Since the branch includes features that do not immediately benefit Dataverse per se, we are working with @qqmyers to sieve out the essential changes from the current modifications and later merge them into the develop branch of Dataverse. As for TRSA itself, its source tree is available from Odum's GitHub site, and it is undergoing a major UI makeover that will hopefully be committed to the develop branch soon.
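The client side of "receive/save the payload of metadata from a TRSA instance without invoking the ingest" might look roughly like this minimal Python sketch. The route `/api/trsa/metadata` and the payload shape are HYPOTHETICAL placeholders, not the actual endpoint on the trsa-api branch; `X-Dataverse-key` is the header the Dataverse native API uses for API tokens, and `persistentId` mirrors how native API calls identify datasets. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_metadata_upload(server: str, api_token: str, dataset_pid: str,
                          payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying TRSA-produced metadata.

    HYPOTHETICAL: the path below is a placeholder, not the real trsa-api route.
    """
    query = urllib.parse.urlencode({"persistentId": dataset_pid})
    return urllib.request.Request(
        f"{server}/api/trsa/metadata?{query}",  # placeholder path
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical variable-level metadata computed on the TRSA side; the server
# would save it as-is rather than running ingest on the data file.
payload = {"variables": [{"name": "age", "label": "Respondent age", "type": "numeric"}]}
req = build_metadata_upload("https://dataverse.example.edu", "xxxx-token",
                            "doi:10.5072/FK2/EXAMPLE", payload)
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)` against a server that actually exposes such an endpoint.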

pacian commented 2 years ago

Thank you very much for the update. It looks like we may have several storage locations to be used as remote storage.

pdurbin commented 1 year ago

My understanding is that this "remote storage" PR that just shipped with 5.12 should help TRSA:

From the excellent "Adding Lots of Zeros to the Size of Datafiles" talk by @qqmyers in June:

[Screenshot from the talk, 2022-10-10]