PATRIC3 / patric3_website

Legacy PATRIC Website (JBoss Portal Version)

Publish high value data sets using Reference Workspaces #523

Closed rkenyon closed 8 years ago

rkenyon commented 9 years ago

We will collect one or more Reference Workspaces at PATRIC that are used by PATRIC curators to import, curate, and publish high-value omics datasets generated by other NIAID-funded projects or published in the literature. PATRIC curators will collect the datasets, upload them to the reference workspace, process them using appropriate services, and publish them at PATRIC. The content of the reference workspace will be accessible to all PATRIC users. In addition, the datasets uploaded to the reference workspaces will be displayed in various contexts throughout the site to make it easier for users to find them.

Requires workspace to allow public sharing of data.

rkenyon commented 9 years ago

11/18: Per Chris still possible. He believes API calls already exist (with exception of offset), although he needs to test and send Dustin the proper syntax. His goal would be to wrap all the workspace-related requests this week. Tom and Maulik investigating data sets.

rkenyon commented 8 years ago

11/23: No notification from Chris. Dustin will ping him and cc Ron. Maulik has some example data sets that can go in once the service is working.

cshenry commented 8 years ago

Monday is probably the best I’m going to be able to do to complete this. I just have too many things on my plate now.

rkenyon commented 8 years ago

Thanks for the update.

Ron Kenyon PATRIC Project Manager, patricbrc.org Project Director, Virginia Bioinformatics Institute Virginia Tech rkenyon@vbi.vt.edu

rkenyon commented 8 years ago

Hi Chris,

Were you able to get to this?

Thanks, Ron

cshenry commented 8 years ago

Workspace publication is in fact already done and deployed in production with full passing tests.

BTW, I am still allowing workspace owners to “unpublish” their own workspaces at this time. Argue with me about it if you want, but with this being a newly released feature, I think we’re going to see people trying it… realizing what it is… and wanting to “undo it”. Generally, I question the idea of locking people completely out of their own workspaces. If people want to unpublish their own data, then let them deal with the consequences of this decision and let us not interpose ourselves into the issue.

I’m now working on the example script giving Dustin the queries he wants. The queries are all doable today. I’m just giving Dustin the syntax.

The only outstanding thing that Dustin wanted in the workspace that isn’t done yet is the offsets (partial listing of hits on workspace queries). I think I need to make a call and say this isn’t going to get done in this release. It’s not a big deal to implement, but I have a lot of work to do on a lot of projects, and I see no reason to postpone critical work in other areas to implement offsets in the workspace that are completely unnecessary at this time (we are hardly drowning in so many workspaces as to need offsets in listing them). This will change over time as the publication capability is exercised, and so I do recommend we slate offsets for the next release… but I need to prioritize right now.
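
For readers unfamiliar with the term, "offsets" here means returning a partial listing of query hits. The following is a minimal Python sketch of that idea, not the workspace service's implementation; the function name `list_page`, its `offset` and `limit` parameters, and the workspace paths are all hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def list_page(hits: Sequence[T], offset: int = 0, limit: int = 25) -> List[T]:
    """Return at most `limit` hits, skipping the first `offset` hits."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return list(hits[offset:offset + limit])


# Hypothetical listing of 60 workspace paths (made up for illustration).
workspaces = [f"/user@patricbrc.org/ws{i}" for i in range(60)]

first_page = list_page(workspaces, offset=0, limit=25)
last_page = list_page(workspaces, offset=50, limit=25)
```

With only a handful of workspaces the whole listing fits in one response, which is the reason given above for deferring the feature.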

rkenyon commented 8 years ago

12/2: The capability is there. We can set up the workspace and manually make it public on the back end. Maulik can go ahead and create a workspace and upload the data sets we have for this to show; Dustin can help get it set up. Maulik will send Dustin the name of the workspace and Dustin will set it up. We will have a short meeting with Maulik, Dustin, Andrew, Tom, and Ron to load the data and get agreement on how it will work for this release.

cshenry commented 8 years ago

Dustin,

Here is a script demonstrating all the workspace queries you were interested in:

https://github.com/ModelSEED/ResearchScripts/blob/master/Patric/WS_query_demonstration.pl

I’ve tested these and they are working. Generally, MongoDB query syntax will work now.

Chris
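
As an illustration of what "MongoDB query syntax" looks like, here is a small Python sketch. It does not call the workspace service and is not taken from the linked Perl script; it only evaluates a toy subset of MongoDB-style filters (equality, `$in`, `$gt`, `$lt`) against in-memory records. The `matches` helper, the field names, and the sample records are all made up for illustration.

```python
from typing import Any, Dict


def matches(doc: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Check a record against a small subset of MongoDB-style filter syntax."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {"size": {"$gt": 1000}}
            for op, arg in cond.items():
                if op == "$in":
                    ok = value in arg
                elif op == "$gt":
                    ok = value is not None and value > arg
                elif op == "$lt":
                    ok = value is not None and value < arg
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != cond:
            # Plain form, e.g. {"owner": "someone"} means equality
            return False
    return True


# Hypothetical workspace object metadata (not real PATRIC records).
objects = [
    {"name": "expression.xlsx", "type": "xlsx", "size": 52000, "owner": "curator"},
    {"name": "reads.fq", "type": "reads", "size": 900000, "owner": "someone"},
    {"name": "notes.txt", "type": "txt", "size": 120, "owner": "curator"},
]

query = {
    "owner": "curator",
    "type": {"$in": ["xlsx", "txt"]},
    "size": {"$gt": 1000},
}
hits = [o["name"] for o in objects if matches(o, query)]
```

Real MongoDB filters support many more operators and have extra rules for arrays and nested documents, so the script linked above remains the reference for what the service accepts.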

cshenry commented 8 years ago

Thanks Chris. I'll take a look.

Dustin

rkenyon commented 8 years ago

02/24: Chris working on. Will require some minor UI updates.

olsonanl commented 8 years ago

The current problem with this involves the download of workspace files that live in Shock. Public workspace requests come in without a user token, so no token is available to supply to Shock for the download. It is unclear if we can enable the download with no token.

My current thinking is that we add the "wsauth" token to the access list for public workspaces, and make that token available to the download service.

Chris, it looks like the public workspace access was enabled by changing auth from required to optional; did you also add the appropriate ownership checks so that we still have proper protection of private data? It wasn't clear that such changes were added but perhaps they are implicit in other checks in your code.
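
To make the ownership-check concern concrete, here is a hedged Python sketch of how permission checks might behave once authentication is optional. It is not the workspace service's code; the `Workspace` record, its fields, and the `can_read`/`can_write` helpers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Workspace:
    owner: str
    public: bool = False
    # Hypothetical per-user permissions: "r" for read, "w" for write.
    shared_with: Dict[str, str] = field(default_factory=dict)


def can_read(ws: Workspace, user: Optional[str]) -> bool:
    """Anonymous callers (user is None) may read public workspaces only."""
    if ws.public:
        return True
    if user is None:
        return False
    return user == ws.owner or user in ws.shared_with


def can_write(ws: Workspace, user: Optional[str]) -> bool:
    """Writing always needs an authenticated owner or write-level share."""
    if user is None:
        return False
    return user == ws.owner or ws.shared_with.get(user) == "w"


public_ws = Workspace(owner="curator", public=True)
private_ws = Workspace(owner="alice", shared_with={"bob": "r"})
```

The point of the sketch is that making the token optional only widens read access to public workspaces; every other path still has to reject a missing user explicitly.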

dmachi commented 8 years ago

According to the Shock docs, it should be possible to set a publicly readable node, though I haven’t found an example of how to do that. I’m assuming that it is setting the ACL for the node to “read” for “everyone”. I did see this note, so this could also be disabled by configuration on our Shock server:

“NOTE: Although a node may be designated as publicly readable, writable, or deletable, user authentication may still be required to perform the operation depending on the Shock server's configuration.”

I’m reading the docs from https://github.com/MG-RAST/Shock/wiki/API

Dustin
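
A sketch of what such a request might look like in Python is below. It only builds the request and does not send it. The endpoint form `/node/<id>/acl/public_read`, the `PUT` method, and the `OAuth` authorization scheme are assumptions based on one reading of the Shock wiki and should be checked against the server version in use; the host, node id, and token are placeholders.

```python
import urllib.request


def public_read_request(shock_url: str, node_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request asking Shock to make a node publicly readable.

    Assumption: the ACL endpoint is PUT /node/<id>/acl/public_read and the
    server accepts an "OAuth <token>" Authorization header.
    """
    url = f"{shock_url.rstrip('/')}/node/{node_id}/acl/public_read"
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"OAuth {token}"},
    )


# Placeholder values; none of these refer to a real server or node.
req = public_read_request("http://shock.example.org/", "NODE_ID", "TOKEN")
```

Sending it would be `urllib.request.urlopen(req)`; as the note quoted above says, the server configuration may still require authentication for reads even on a public node.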

olsonanl commented 8 years ago

The problem is that our Shock server is too old to have ACL support.

I am looking into working around the problem and will raise the issue of upgrading.

olsonanl commented 8 years ago

OK, I think this may be good now.

The workaround is in and works by checking for a token on the creation of a download URL; if we do not have one, that means we have a public workspace item (that I believe should have been validated as available for public download). If this item is a Shock file, we grant read access to the "wsauth" user (currently the reviewer user) and set the wsauth token to be the one for the download service to use to access the file. I verified the xlsx file in the workspace

https://www.beta.patricbrc.org/workspace/PATRIC@patricbrc.org/home/Special%20Collections/NIAID%20Systems%20Biology%20Centers/Omics4TB

works both when I am logged in and logged out.
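
The decision logic of the workaround can be sketched in Python as follows. This is an illustration of the description above, not the deployed code: the `Item` record, the `readers` set standing in for Shock read grants, and the `download_token` function are hypothetical, and returning `None` for non-Shock items is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

WSAUTH_USER = "wsauth"  # service account named in the thread


@dataclass
class Item:
    path: str
    is_public: bool
    in_shock: bool
    # Stand-in for the users granted read access on the Shock node.
    readers: Set[str] = field(default_factory=set)


def download_token(item: Item, user_token: Optional[str], wsauth_token: str) -> Optional[str]:
    """Pick the token the download service should use for this item."""
    if user_token is not None:
        # Authenticated request: keep using the caller's own token.
        return user_token
    if not item.is_public:
        # No token and not public: this should have been rejected upstream.
        raise PermissionError(f"{item.path} is not publicly downloadable")
    if item.in_shock:
        # Public Shock file: grant wsauth read access and download as wsauth.
        item.readers.add(WSAUTH_USER)
        return wsauth_token
    # Assumption: non-Shock items need no Shock token at all.
    return None


public_file = Item(path="/public/data.xlsx", is_public=True, in_shock=True)
anonymous_token = download_token(public_file, None, "WSAUTH_TOKEN")
```

Note that the sketch re-checks `is_public` instead of trusting that a missing token implies a public item, which is the validation the comment above says it believes has already happened.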

rkenyon commented 8 years ago

3/28: Almost fixed. About 80% done.

rkenyon commented 8 years ago

3/29: Resolved. Need to update Omics4TB landing page and have pointer to this workspace.

rkenyon commented 8 years ago

3/31: Not clear now what data is in the workspace that Maulik picked up. It does not seem to match what Serdar from Omics4TB sent. For now, just including the information Serdar sent in the Omics4TB landing page: http://enews.patricbrc.org/omics4tb/. Ron will check with Maulik upon his return from vacation.

mshukla1 commented 8 years ago

We have already picked up a couple of gene expression datasets from Omics4TB and CCFA and published them through the public workspace.