PATRIC3 / patric3_website

Legacy PATRIC Website (JBoss Portal Version)
MIT License

download for private genomes should be prioritized #1470

Open jimdavis1 opened 7 years ago

jimdavis1 commented 7 years ago

I am pasting an email conversation from Aaron Best, our close collaborator and #4 PATRIC user. Basically, he discovered that you can't download private genome files the same way as public ones because that function touches the FTP site instead of Solr. This should be prioritized as a thing to fix.

Okay. I’ll take a look. This seems like a function that should be working on the website. :-)

Aaron Best, Ph.D. Harrison C. and Mary L. Visscher Professor of Genetics Department of Biology

Hope College A. Paul Schaap Science Center, Room 3015 35 East 12th Street, Holland, MI 49423 office 616.395.7376 | best@hope.edu | @aaron_best hope.edu/academic/biology | hope.edu

On Apr 3, 2017, at 4:27 PM, James Davis jjdavis.phd@gmail.com wrote:

you should be able to structure that url command the same way for the data you need. You would need to do a p3-login first though.

On Apr 3, 2017, at 3:26 PM, Aaron Best best@hope.edu wrote:

p3 script?

On Apr 3, 2017, at 4:26 PM, Aaron Best best@hope.edu wrote:

Yes… private genomes. Is there a way to get those files then? curl?

On Apr 3, 2017, at 4:25 PM, James Davis jjdavis.phd@gmail.com wrote:

Hi Aaron, I just asked around and the guys say that this isn’t supported on user genomes yet because it’s currently touching the FTP site. Was this in fact a private genome? Thanks, Jim

On Apr 3, 2017, at 3:10 PM, Aaron Best best@hope.edu wrote:

Hi Jim,

A collaborator and I are having issues downloading genomes from PATRIC. We can get a zip file downloaded, but then it will not unzip (on a mac via the normal utilities, on a PC, or via the command line). Do you know of any open issues with downloading?

The specific problem is this:

From genome list view, select a genome, click the download button on right side green bar, select more options, select format (e.g. fasta, genbank), download

The .zip version becomes a .cpgz file on mac. If you unzip that, it becomes a .zip file again. The .tgz version gives an error message: “The archive ‘PATRIC_Export.tgz’ is empty!”

Can you check into this for us?

Thanks,

Aaron

dmachi commented 7 years ago

Jim, please do let him know that for any private genomes all of those files are already in his workspace inside the job result and he can download them from there until the primary download supports that functionality.

mshukla1 commented 7 years ago

Bob, let's try to solve this using the same mechanism you implemented for BLAST against public/private genomes.

I have a script for generating the download files for public genomes. We can use it to create download files for private genomes and place them in a private folder on /vol/patric3. The download service can combine public+private files and pass them back to the browser.

-Maulik

hyoo commented 7 years ago

Well, I was thinking of generating the files on the fly inside the data API. Formatting shouldn't be difficult. Since we're already generating FASTA on the fly, this is a possible solution. It's just a matter of taking the time to format the output accordingly for multiple formats. @dmachi what do you think?

dmachi commented 7 years ago

Most are easy to serialize that way, but some aren’t. For example, if I understand correctly, the GenBank files have data that would need to be pulled from multiple Solr cores, so it's more than just serializing the results of any one core like FASTA does. This would be the ideal way though.
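
For illustration only, here is a minimal Python sketch of what single-core, on-the-fly serialization looks like for FASTA. It is not the data API's actual code: the record field names (`accession`, `description`, `sequence`) and the function name `to_fasta` are assumptions made for the example. It covers only the easy case described above; a GenBank serializer would additionally have to join data from several cores.

```python
from typing import Dict, Iterable, Iterator


def to_fasta(records: Iterable[Dict[str, str]], width: int = 60) -> Iterator[str]:
    """Yield FASTA text chunk by chunk so it can be streamed to a client.

    Each record is assumed (hypothetically) to be a dict with the keys
    'accession', 'description' and 'sequence', as if it were one document
    returned from a single Solr core.
    """
    for rec in records:
        # Header line: '>' followed by the identifier and a free-text description.
        yield f">{rec['accession']} {rec['description']}\n"
        seq = rec["sequence"]
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(seq), width):
            yield seq[start:start + width] + "\n"
```

Because the function is a generator, a response handler could write each chunk as it is produced rather than building the whole file in memory.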

olsonanl commented 7 years ago

If you commit that script we can add it to the suite of exporters run when a genome is processed and save the data in the workspace. Then the download can redirect to that file.

What precisely is the current mechanism for download?

dmachi commented 7 years ago

When you are using the advanced download tool (in the screenshot), it is downloading multiple genomes simultaneously. Currently you can choose any of the file types below to be included in your download. It only works for public genomes because it takes the files generated for FTP, streams them into a zip, and streams that zip archive back out. All of the other downloaders on the site do streamed serialization from the data in Solr. Since we don't have serializers for these pieces, we are using the generated files as the source.
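
A rough Python sketch of the file-based mechanism described above, under stated assumptions: the directory layout `<base_dir>/<genome_id>/<genome_id>.<ext>` and the name `build_archive` are hypothetical, and for brevity the archive is built in memory rather than truly streamed. The point it shows is that a genome with no pre-generated files contributes nothing to the archive, which is the situation private genomes are in with this approach.

```python
import io
import zipfile
from pathlib import Path
from typing import Iterable, List, Tuple


def build_archive(base_dir: str, genome_ids: Iterable[str],
                  extensions: Iterable[str]) -> Tuple[bytes, List[Path]]:
    """Zip the pre-generated files for the requested genomes and file types.

    Assumes (hypothetically) one directory per genome under base_dir, with
    files named '<genome_id>.<ext>'. Returns the zip bytes plus the list of
    expected files that were not found on disk.
    """
    extensions = list(extensions)
    missing: List[Path] = []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for gid in genome_ids:
            for ext in extensions:
                path = Path(base_dir) / gid / f"{gid}.{ext}"
                if path.is_file():
                    # Keep one folder per genome inside the archive.
                    zf.write(path, arcname=f"{gid}/{path.name}")
                else:
                    # Nothing was generated for this genome/format.
                    missing.append(path)
    return buf.getvalue(), missing
```

Returning the missing paths lets the caller report which genomes could not be included instead of silently sending back an empty archive.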

mshukla1 commented 7 years ago

Harry just mentioned that after we moved data api to Argonne, it is now gathering files from local file system /vol/patric3/ and not public ftp site...the same place where we have public and private blast files.

mshukla1 commented 7 years ago

Advantage of generating download files on the fly is that data is always fresh and we don't need to worry about synchronization when we do data updates.

Disadvantage is that multiple users would be asking for downloads of large numbers of genomes and hammering the production Solr server, which could very well be the case as more and more users start using the CLI for bulk downloads.

-Maulik

dmachi commented 7 years ago

Then we can make some minor changes and then the private genomes will work via this method too assuming all the same stuff gets exported for private genomes in the same directory structure as the public genomes. Of course with this method, when genome metadata editing is available, no changes made by users will be reflected in the downloads.
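
As a sketch of the kind of minor change this implies, assuming private genome files are exported under a parallel directory tree: the `public`/`private` subfolder names, the owner check, and the function name `resolve_genome_dir` are all hypothetical and only illustrate the idea, not the data API's real layout or authorization logic.

```python
from pathlib import Path
from typing import Optional


def resolve_genome_dir(root: str, genome_id: str, is_public: bool,
                       owner: Optional[str],
                       requesting_user: Optional[str]) -> Optional[Path]:
    """Pick the directory holding a genome's pre-generated download files.

    Assumes (hypothetically) that public and private exports share the same
    per-genome structure under '<root>/public' and '<root>/private'.
    Returns None when the requester may not read a private genome.
    """
    if is_public:
        return Path(root) / "public" / genome_id
    # Private genome: only the owner is allowed in this simplified sketch.
    if requesting_user is not None and requesting_user == owner:
        return Path(root) / "private" / genome_id
    return None
```

With a resolver like this in front of the existing zip step, the rest of the download path would not need to know whether a genome is public or private.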

mshukla1 commented 7 years ago

Metadata editing won’t pose a problem for now, because the only genome level data currently used in the download files are genome id and genome name and those are not editable from the website.

However, in future, when we allow users to update annotations for their genomes, we need to make sure that the corresponding download files also get updated.

-Maulik

rkenyon commented 6 years ago

Jim triage comment: High priority.