HUPO-PSI / proxi-schemas

ProXI: Schema definitions for the Proteomics eXpression Interface

How should a spectrum status be used? #76

Open edeutsch opened 3 years ago

edeutsch commented 3 years ago

The current Spectrum class defines a required attribute:

      status:
        type: string 
        enum: [READABLE, PEAK UNAVAILABLE]
        description: Status of the Spectrum

Can we define these status entries?

What does READABLE mean? Does this mean that the spectrum exists and can be fetched and provided? I suppose this is fine, although it is a strange word, since the antonym is UNREADABLE. But what would UNREADABLE mean? And that isn't an option.

What does "PEAK UNAVAILABLE" mean exactly? Is the first peak unavailable? Any one peak? Some peaks? All peaks? Or does it mean the spectrum itself is unavailable? How is this different from a 404?

How should this be used? At PeptideAtlas a spectrum is either available and provided, or it is not available, in which case it is simply absent from the returned list or the response is a 404. PeptideAtlas doesn't use "PEAK UNAVAILABLE" since I don't know what it should mean or how it should be used.
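To make the client-side consequence concrete, here is a minimal sketch of how a consumer might have to sort requested spectra into three buckets if both behaviours coexist (a record omitted from the list versus a record returned with "PEAK UNAVAILABLE"). The function name partition_by_status, the bucket names, and the assumption that each record is a dict with "usi" and "status" keys are all hypothetical and only for illustration; they are not taken from the specification.

```python
from typing import Dict, Iterable, List


def partition_by_status(
    requested_usis: Iterable[str], records: List[Dict[str, str]]
) -> Dict[str, List[str]]:
    """Hypothetical helper: group requested USIs by what the server returned."""
    # Assumption: every returned record carries "usi" and "status" keys.
    status_by_usi = {record["usi"]: record["status"] for record in records}
    buckets: Dict[str, List[str]] = {
        "readable": [],
        "peak_unavailable": [],
        "missing": [],
    }
    for usi in requested_usis:
        status = status_by_usi.get(usi)
        if status is None:
            # No record at all: the server dropped it from the returned list.
            buckets["missing"].append(usi)
        elif status == "READABLE":
            buckets["readable"].append(usi)
        elif status == "PEAK UNAVAILABLE":
            buckets["peak_unavailable"].append(usi)
        else:
            # Anything else is outside the enum quoted above.
            raise ValueError(f"unexpected status {status!r} for {usi}")
    return buckets
```

Whether "missing" and "peak_unavailable" should be distinct outcomes at all is exactly the open question here.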

Should it be used if there is no such spectrum at the repository? Should it be used if the spectrum is real and valid and should be available, but due to some technical glitch it cannot be fetched from the data store? So not a 404, but closer to a 500?

We should decide and document this.

jjcarver commented 3 years ago

My interpretation is that any record returned in the query corresponds to at minimum a file that is present on disk. So a status of "PEAK UNAVAILABLE" does not mean the same thing as a 404. A 404 means the MS run (i.e. file) isn't there at all. Any record returned, regardless of status, is by definition not a 404.

In the case of peak list files in open format (e.g. mzML) we can easily read the file to verify that the requested spectrum is indeed present and extract its peaks. This is what I interpret a status of "READABLE" to mean. The file is there AND we can validate/extract the actual spectrum from it.

However, sometimes a spectrum query/USI matches a raw file, which we can technically "read" in a file system sense but which we cannot (at least easily) open up to extract the actual spectrum. This is what "PEAK UNAVAILABLE" means. The MS run you asked for is there, we assume the specific spectrum you asked for may be in that file, but we can't actually give you its peaks.
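The interpretation above can be sketched as a small decision function. This is only an illustration of my reading, not something the specification defines: the names SpectrumStatus and spectrum_status and the two boolean inputs are hypothetical, and a return value of None stands in for "no record is returned, i.e. a 404".

```python
from enum import Enum
from typing import Optional


class SpectrumStatus(str, Enum):
    # The two values are the ones listed in the schema enum quoted in this issue.
    READABLE = "READABLE"
    PEAK_UNAVAILABLE = "PEAK UNAVAILABLE"


def spectrum_status(
    run_file_present: bool, peaks_extractable: bool
) -> Optional[SpectrumStatus]:
    """Hypothetical helper: status for a record, or None when it is a 404."""
    if not run_file_present:
        # The MS run (file) is not there at all: no record, hence a 404.
        return None
    if peaks_extractable:
        # Open format (e.g. mzML): the spectrum was found and its peaks read.
        return SpectrumStatus.READABLE
    # The file exists (e.g. a raw file) but its peaks cannot be extracted.
    return SpectrumStatus.PEAK_UNAVAILABLE
```

Under this reading a glitch in the data store is not covered by either status, so it would still need its own error response.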

I am honestly not sure if this is the best interpretation. But this is how I read the current specification.

ypriverol commented 3 years ago

The PEAK UNAVAILABLE status was introduced by @Nuno because it is needed in MassIVE when the USI is there but they can't read the mzML, because of a RAW file conversion problem or other types of issues.