Open rec3141 opened 9 years ago
Dear Eric, Unfortunately there is no function in place yet (like an API) that supports bulk download of project result files. We know that this would be a very sensible feature for our users, and it is on our long list of to-dos.
Yes, in the meantime you would have to script something together on your own using tools like wget, curl etc.
e.g.
curl -o OSD1_2014-06-21_0m_NPL022_reads.fasta "https://www.ebi.ac.uk/metagenomics/projects/ERP009703/samples/ERS667668/runs/ERR771106/results/sequences/versions/2.0/export?contentType=text&exportValue=processedReads"
Thanks for sharing this Excel spreadsheet with the community.
Best, Maxim Senior Software Developer - EMBL-EBI
Thanks Maxim. Is there a way to request the larger files in compressed format?
It depends on how urgently you need them. We could either upload the compressed files to our EBI FTP server (the quick option), or you would have to wait another 2 to 3 weeks until we have made them available chunked and compressed on our website. Let me know which you prefer. Best, Maxim
Hi Eric, Quick update on that. The InterProScan result files are now available as compressed files. If they are larger than 2 GB, we chunk them before compression, but I think that is never the case for OSD. The URLs for the InterProScan result files have changed. To request the number of chunks, call: https://www.ebi.ac.uk/metagenomics/projects/ERP009703/samples/ERS667660/runs/ERR770988/results/versions/2.0/function/InterProScan/chunks
Then you have to iterate over the number of chunks: https://www.ebi.ac.uk/metagenomics/projects/ERP009703/samples/ERS667660/runs/ERR770988/results/versions/2.0/function/InterProScan/chunks/{1...n}
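The two-step pattern described above (ask for the number of chunks, then fetch each one) could be scripted roughly as follows. This is only a sketch: it assumes the `/chunks` endpoint returns the count as a bare integer in the response body, which the thread does not confirm. The helper names (`chunk_urls`, `fetch_chunk_count`, `download_chunks`) and the output file naming are made up for illustration.

```python
import shutil
import urllib.request

# Base URL copied from the comment above; it ends in ".../chunks".
BASE = ("https://www.ebi.ac.uk/metagenomics/projects/ERP009703/samples/ERS667660"
        "/runs/ERR770988/results/versions/2.0/function/InterProScan/chunks")


def chunk_urls(base_url, n_chunks):
    """Return the URLs for chunks 1..n_chunks (numbering starts at 1)."""
    return [f"{base_url}/{i}" for i in range(1, n_chunks + 1)]


def fetch_chunk_count(base_url):
    """Ask the server how many chunks exist.

    ASSUMPTION: the response body is a bare integer such as "1".
    """
    with urllib.request.urlopen(base_url) as response:
        return int(response.read().decode().strip())


def download_chunks(base_url, prefix):
    """Save every chunk as <prefix>_<i>.gz (hypothetical naming, assuming
    the compressed files are gzip archives)."""
    n_chunks = fetch_chunk_count(base_url)
    for i, url in enumerate(chunk_urls(base_url, n_chunks), start=1):
        with urllib.request.urlopen(url) as response, \
                open(f"{prefix}_{i}.gz", "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling `download_chunks(BASE, "ERR770988_InterProScan")` would then fetch all chunks for that run, provided the assumption about the count response holds.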
We hope to get the FASTA formatted files chunked and compressed as well in the near future.
Best, Maxim
All larger result files are now available as compressed files (gzip): https://www.ebi.ac.uk/metagenomics/projects/ERP009703
Hi Maxim,
Are there new URLs? Do I need to change the attached script?
thank you, -Kelly
Kelly D. Goodwin, Ph.D. National Oceanic and Atmospheric Administration AOML & SWFSC
8901 La Jolla Shores Drive La Jolla, CA 92037 858 546 7142 FAX: 858 546-7003 http://www.aoml.noaa.gov/ocd/people/goodwin/
Hi Kelly,
Which script do you refer to? The one from Eric? Just looked into Eric's Excel sheet. The URLs for the sequence section changed since summer.
As the OSD result files are relatively small, we kept them unchunked. The URLs need changing to: https://www.ebi.ac.uk/metagenomics/projects/ERP009703/samples/ERS667478/runs/ERR771028/results/versions/2.0/sequences/ProcessedReads/chunks/1
The template URL is: https://www.ebi.ac.uk/metagenomics/projects/{project_id}/samples/{sample_id}/runs/{run_id}/results/versions/{version_number}/{domain}/{result_file_type}/chunks/1
Here is a list of supported domains and result file types so far:

Domain | Result file type |
---|---|
sequences | ProcessedReads |
sequences | ReadsWithPredictedCDS |
sequences | ReadsWithMatches |
sequences | ReadsWithoutMatches |
sequences | PredictedCDS |
sequences | PredictedORFWithoutAnnotation |
sequences | PredicatedCDSWithoutAnnotation |
function | InterProScan |
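To illustrate how the template URL and the supported values fit together, here is a small sketch that fills in the template and rejects combinations not in the list. The function name `result_url` and the `SUPPORTED` lookup are hypothetical. The result file type names are copied exactly as they appear in this thread, so any misspelling there is carried over.

```python
# Template copied from the comment above (unchunked files always use chunk 1).
TEMPLATE = ("https://www.ebi.ac.uk/metagenomics/projects/{project_id}"
            "/samples/{sample_id}/runs/{run_id}"
            "/results/versions/{version_number}"
            "/{domain}/{result_file_type}/chunks/1")

# Domains and result file types as listed in the thread (spelling unchanged).
SUPPORTED = {
    "sequences": {
        "ProcessedReads",
        "ReadsWithPredictedCDS",
        "ReadsWithMatches",
        "ReadsWithoutMatches",
        "PredictedCDS",
        "PredictedORFWithoutAnnotation",
        "PredicatedCDSWithoutAnnotation",
    },
    "function": {"InterProScan"},
}


def result_url(project_id, sample_id, run_id, result_file_type,
               domain="sequences", version_number="2.0"):
    """Build the download URL for one unchunked result file."""
    if result_file_type not in SUPPORTED.get(domain, set()):
        raise ValueError(
            f"unsupported combination: {domain}/{result_file_type}")
    return TEMPLATE.format(
        project_id=project_id,
        sample_id=sample_id,
        run_id=run_id,
        version_number=version_number,
        domain=domain,
        result_file_type=result_file_type,
    )
```

For example, `result_url("ERP009703", "ERS667478", "ERR771028", "ProcessedReads")` reproduces the ProcessedReads URL given above.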
I have quickly put a Python script together to support project bulk download for individual unchunked result file types. I believe all of the OSD result files are unchunked. Documentation, including a link to the script, can be found here: https://github.com/ProteinsWebTeam/ebi-metagenomics/wiki/Downloading-results-programmatically
The script won't work for the taxonomy section. That part needs to be integrated. Of course the script needs further improvement.
Any feedback will be appreciated. I am happy to answer more questions if needed. Best, Maxim
Thank you Maxim. Could you please supply us with the input file (the mapping file) to ensure that the script runs without error?
thank you, -kelly
Sure. Please download the input file here: https://github.com/mscheremetjew/osd-analysis/blob/3a1c21c920b5522d108cc6c9c7f0445eff81b464/input_osd_bulk_download.tsv
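To download all result files of one type into a single directory, your Python call needs to look something like this (the example requests ProcessedReads; for the InterProScan result files, change the `-t` value accordingly):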
python mgportal_bulk_download.py -i input_osd_bulk_download.tsv -o ~/blah -v 2.0 -t ProcessedReads
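If several result file types are needed, the call above could be repeated once per type from a small wrapper. The sketch below uses only the flags shown in that command (`-i`, `-o`, `-v`, `-t`). It assumes that `mgportal_bulk_download.py` sits in the current directory and that `-t` accepts the other result file types listed earlier in this thread, neither of which is confirmed here. The helper names and the one-directory-per-type layout are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(result_file_type, output_dir,
                  input_file="input_osd_bulk_download.tsv", version="2.0"):
    """Return the argument list for one call of the bulk download script."""
    return [
        "python", "mgportal_bulk_download.py",
        "-i", input_file,
        "-o", str(output_dir),
        "-v", version,
        "-t", result_file_type,
    ]


def run_for_types(result_file_types, base_dir):
    """Run the script once per type, each into its own subdirectory."""
    for file_type in result_file_types:
        out_dir = Path(base_dir) / file_type
        out_dir.mkdir(parents=True, exist_ok=True)
        # check=True stops at the first failing download.
        subprocess.run(build_command(file_type, out_dir), check=True)
```

Something like `run_for_types(["ProcessedReads", "InterProScan"], "osd_results")` would then place each file type in its own folder.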
Documentation has been updated and the script improved: https://github.com/ProteinsWebTeam/ebi-metagenomics/wiki/Downloading-results-programmatically
Best, Maxim
I couldn't find a way to batch download the processed files, so I made my own Excel spreadsheet to generate the proper URLs and wget commands. If someone has a better way (e.g. an EBI API), please share it. Otherwise, I hope you find this useful. Change the yellow cell to the files you would like to download, and it will automatically populate the blue cells with the URLs and wget calls.
on github: https://github.com/rec3141/OSD/blob/master/osd-ebi-samples.xlsx
cheers, Eric Collins