CCMS-UCSD / MassIVEDocumentation

Documentation for MassIVE workflows and Website
https://CCMS-UCSD.github.io/MassIVEDocumentation/
MIT License

Broken paths to `ccms_peak` files #30

Closed: wfondrie closed this issue 4 months ago

wfondrie commented 4 months ago

I've been trying to download some of the mzML files used to build MassIVE-KB, but there seems to be a consistent problem with files in the ccms_peak directories. For example, here I try to download an mzML file from MSV000088407:

$ wget ftp://massive.ucsd.edu/x01/MSV000088407/ccms_peak/RAW/AD02_BA46_INSOLUBLE_01.mzML
--2024-04-24 10:01:14--  ftp://massive.ucsd.edu/x01/MSV000088407/ccms_peak/RAW/AD02_BA46_INSOLUBLE_01.mzML
           => ‘AD02_BA46_INSOLUBLE_01.mzML’
Resolving massive.ucsd.edu (massive.ucsd.edu)... 132.249.211.16
Connecting to massive.ucsd.edu (massive.ucsd.edu)|132.249.211.16|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /x01/MSV000088407/ccms_peak/RAW ...
No such directory ‘x01/MSV000088407/ccms_peak/RAW’.

If I instead try a file outside the ccms_peak directory, it downloads without issue:

$ wget ftp://massive.ucsd.edu/x01/MSV000088407/ccms_parameters/params.xml
--2024-04-24 10:07:03--  ftp://massive.ucsd.edu/x01/MSV000088407/ccms_parameters/params.xml
           => ‘params.xml.1’
Resolving massive.ucsd.edu (massive.ucsd.edu)... 132.249.211.16
Connecting to massive.ucsd.edu (massive.ucsd.edu)|132.249.211.16|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /x01/MSV000088407/ccms_parameters ... done.
==> SIZE params.xml ... 16688
==> PASV ... done.    ==> RETR params.xml ... done.
Length: 16688 (16K) (unauthoritative)

params.xml.1           100%[===========================>]  16.30K  --.-KB/s    in 0.04s

2024-04-24 10:07:03 (374 KB/s) - ‘params.xml.1’ saved [16688]

Please note that the mzML file I'm trying to download is also listed on the dataset web page, but it has to be downloaded over FTP:

[screenshot of the dataset web page]

Thanks for your help!

jjcarver commented 4 months ago

Hi Will,

Thanks for asking about this. We recently restructured the MassIVE repository and, in doing so, moved the ccms_peak files to a separate storage location. We still try to keep all the files for a dataset together for the purposes of web interface display and back-end computation, but we now do this with symbolic links (which are like pointers or Windows shortcuts, if you're not familiar with them). The problem is that FTP will not follow symbolic links, so unfortunately you can no longer download these files directly from the main dataset directory.
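To make the symbolic-link arrangement concrete, here is a small local mock-up. It assumes, hypothetically, that a dataset's whole ccms_peak directory on x01 is a single link to its physical copy on z01; the thread does not say whether MassIVE links directories or individual files, and the temporary directory tree is purely illustrative. Local tools follow the link transparently, while `readlink -f` reveals the physical path, which is the only one a server that does not follow links can serve.

```sh
#!/bin/sh
# Local mock-up of the layout described above. HYPOTHETICAL: this is not the
# real MassIVE server, just a temporary tree with the same shape.
set -eu

root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT   # clean up the temporary tree on exit
ds="MSV000088407"

# Physical storage for the peak files (stand-in for volume z01).
mkdir -p "$root/z01/$ds/ccms_peak/RAW"
touch "$root/z01/$ds/ccms_peak/RAW/AD02_BA46_INSOLUBLE_01.mzML"

# Dataset directory (stand-in for volume x01) with its regular files...
mkdir -p "$root/x01/$ds/ccms_parameters"
touch "$root/x01/$ds/ccms_parameters/params.xml"

# ...plus a symbolic link that makes ccms_peak appear inside the dataset.
ln -s "$root/z01/$ds/ccms_peak" "$root/x01/$ds/ccms_peak"

# Local tools follow the link, so the mzML file looks like it lives on x01:
ls "$root/x01/$ds/ccms_peak/RAW"

# Resolving the link shows the physical location on z01.
resolved="$(readlink -f "$root/x01/$ds/ccms_peak/RAW/AD02_BA46_INSOLUBLE_01.mzML")"
echo "$resolved"
```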

To download the ccms_peak files, you must go to their separate physical location on the FTP server, which is now volume "z01". For the example dataset file you mentioned (MSV000088407/ccms_peak/RAW/AD02_BA46_INSOLUBLE_01.mzML), the correct FTP URL would be:

ftp://massive.ucsd.edu/z01/MSV000088407/ccms_peak/RAW/AD02_BA46_INSOLUBLE_01.mzML

(i.e. simply replace "x01" with "z01").

And yes, we should definitely document this better now that the FTP access pattern has changed.

wfondrie commented 4 months ago

Thanks @jjcarver,

Are the files for every dataset on volume "z01", or does the drive parallel that of the dataset? For example, if a dataset is now at "x07", would the ccms_peak files be on "z07"?

It's great to have a source for mzML files that have been generated in a homogeneous manner from a large number of datasets!

jjcarver commented 4 months ago

@wfondrie That's a good question. Currently there is no mirrored volume naming structure. Volume z01 stores all ccms_peak files, period, regardless of which volume their associated dataset is stored on.
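Because every ccms_peak file lives on z01 whatever volume its dataset is on, a list of dataset-relative URLs can be fixed mechanically. The sketch below is a sed filter that assumes one URL per line in the form `ftp://massive.ucsd.edu/<volume>/MSV.../ccms_peak/...`; the function name `peak_urls_to_z01` and the file names in the comments are hypothetical. Any other URL, such as the `ccms_parameters/params.xml` one above, passes through unchanged.

```sh
#!/bin/sh
# Sketch (hypothetical helper): rewrite MassIVE FTP URLs for ccms_peak files so
# they point at volume z01, whatever volume (x01, x07, ...) they started on.
# Reads URLs from stdin, one per line; non-ccms_peak URLs are left untouched.
peak_urls_to_z01() {
  sed -E 's#^ftp://massive\.ucsd\.edu/[^/]+/(MSV[0-9]+/ccms_peak/)#ftp://massive.ucsd.edu/z01/\1#'
}

# Example usage (urls.txt and fixed_urls.txt are hypothetical file names):
#   peak_urls_to_z01 < urls.txt > fixed_urls.txt
#   wget -i fixed_urls.txt
```

Matching any volume name with a regular expression, rather than doing a plain x01-to-z01 substitution, keeps the filter correct for datasets stored on other volumes, such as the x07 example raised above.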

wfondrie commented 4 months ago

Perfect. Thanks!