HUPO-PSI / proxi-schemas

ProXI: Schema definitions for the Proteomics eXpression Interface

Error status for multiple providers #72

Open ypriverol opened 3 years ago

ypriverol commented 3 years ago

As we discussed last week, we will need to have a different definition of errors or status when querying all entry points. The broker will need to retrieve multiple statuses for multiple entry points. We have multiple options here:

[
    {
        "peptideSequence": "LSSPATLNSR",
        "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
    },
    {
        "peptideSequence": "APLVCLPVFVSR",
        "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
    }
]
{
    "errors": []
}

{
    "data": [
        {
            "peptideSequence": "LSSPATLNSR",
            "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
        },
        {
            "peptideSequence": "APLVCLPVFVSR",
            "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
        }
    ],
    "errors": []
}

The second approach defines a global object with two parts: data and errors.

edeutsch commented 3 years ago

After pondering this for a while and playing with various options, here's what I suggest. We can support two modes. The first is a synchronous mode, where the final answer is transmitted as one object when everything is ready. This is the traditional way of doing things. For your example query above, it might look like this:

{
    "query": "/psms?resultType=compact&accession=PXD005942",
    "responses": [
        {
            "source": "PeptideAtlas",
            "code": 200,
            "elapsed_seconds": 5.2322,
            "psms": [
                {
                    "peptideSequence": "LSSPATLNSR",
                    "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
                },
                {
                    "peptideSequence": "APLVCLPVFVSR",
                    "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
                }
            ]
        },
        {
            "source": "PRIDE",
            "code": 200,
            "elapsed_seconds": 1.3243,
            "psms": [
                {
                    "peptideSequence": "LSSPATLNSR",
                    "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
                }
            ]
        },
        {
            "source": "MassIVE",
            "code": 404,
            "elapsed_seconds": 0.2342,
            "message": "PSMs for PXD005942 not available"
        },
        {
            "source": "jPOST",
            "code": 0,
            "message": "Timed out after 30 seconds",
            "elapsed_seconds": 30.0000
        }
    ]
}

This would not arrive until the slowest source has answered or the timeout (30 seconds here) has elapsed, but the returned result is a single JSON object.
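To make the synchronous mode concrete, here is a minimal sketch of how a broker might build that single object. The function name `aggregate`, the `fetchers` mapping, and the `(code, body)` return convention are all assumptions made for illustration, not part of the PROXI schema; the sketch also assumes the fetch callables do not raise.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def aggregate(query, fetchers, timeout=30.0):
    """Query every provider in parallel and return one combined object.

    `fetchers` (hypothetical) maps a source name to a callable that takes
    the query and returns (code, body), where body is a dict such as
    {"psms": [...]} or {"message": "..."}.
    """
    def timed(fetch):
        start = time.monotonic()
        code, body = fetch(query)
        return code, body, round(time.monotonic() - start, 4)

    pool = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = {pool.submit(timed, fetch): source
               for source, fetch in fetchers.items()}
    done, _pending = wait(futures, timeout=timeout)
    # Do not block on providers that are still running (Python 3.9+).
    pool.shutdown(wait=False, cancel_futures=True)

    responses = []
    for future, source in futures.items():
        if future in done:
            code, body, elapsed = future.result()
            responses.append({"source": source, "code": code,
                              "elapsed_seconds": elapsed, **body})
        else:
            # Code 0 marks a timeout, as in the example above.
            responses.append({"source": source, "code": 0,
                              "message": f"Timed out after {timeout:g} seconds",
                              "elapsed_seconds": timeout})
    return {"query": query, "responses": responses}
```

The per-source entries come back in the order the fetchers were registered, regardless of which provider answered first.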

The second, fancy way is to stream objects as they become available, with a special final object to signal that the stream is done. This might be:

{
    "source": "MassIVE",
    "code": 404,
    "elapsed_seconds": 0.2342,
    "message": "PSMs for PXD005942 not available"
}
{
    "source": "PRIDE",
    "code": 200,
    "elapsed_seconds": 1.3243,
    "psms": [
        {
            "peptideSequence": "LSSPATLNSR",
            "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
        }
    ]
}
{
    "source": "PeptideAtlas",
    "code": 200,
    "elapsed_seconds": 5.2322,
    "psms": [
        {
            "peptideSequence": "LSSPATLNSR",
            "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:10:LSSPATLNSR/2"
        },
        {
            "peptideSequence": "APLVCLPVFVSR",
            "usi": "mzspec:PXD005942:030219_ywt_sf-39:scan:120:APLVC[Carbamidomethyl]LPVFVSR/2"
        }
    ]
}
{
    "responses": {
        "PeptideAtlas": {
            "code": 200,
            "elapsed_seconds": 5.2322
        },
        "PRIDE": {
            "code": 200,
            "elapsed_seconds": 1.3243
        },
        "MassIVE": {
            "code": 404,
            "message": "PSMs for PXD005942 not available",
            "elapsed_seconds": 0.2342
        },
        "jPOST": {
            "code": 0,
            "message": "Timed out after 30 seconds",
            "elapsed_seconds": 30.0000
        }
    }
}

The objects are transmitted as they become available. Unlike the way they are depicted above, objects do not contain newlines, but are separated by newlines. If an object contains "source" then it is a new transmitted result. If the object contains "responses" then the client knows that that is the final object and it can stop reading from the stream. This allows the first results to be available in seconds with late results trickling in later. This latter model is what we currently use for this page: http://proteomecentral.proteomexchange.org/PROXI.php to pleasing effect. You can see the raw stream that the client uses here: http://proteomecentral.proteomexchange.org/cgi/PROXI_status
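For illustration, a client for the streamed mode could look roughly like the sketch below. It only relies on the two rules stated above (one JSON object per line; "source" means a new result, "responses" means the final object). The name `read_stream` and the `("result" | "summary", obj)` tuples are assumptions of this sketch, not an existing API.

```python
import json


def read_stream(lines):
    """Yield ("result", obj) for each per-source object, then
    ("summary", obj) for the final object, and stop reading.

    `lines` is any iterable of str or bytes lines, one JSON object each.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank separator lines
        obj = json.loads(line)
        if "responses" in obj:
            # Final object: the stream is done.
            yield "summary", obj
            return
        if "source" in obj:
            # A newly transmitted result from one provider.
            yield "result", obj
```

Because the function accepts any iterable of lines, it can be fed a file object or an HTTP response body that is iterated line by line, so the first results can be shown while slower sources are still pending.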

The client chooses which model they want to use: complete response or streamed objects. Either way, the responses from each source are the same and can be accessed with the same code. The content of "psms" is what we currently return with /psms
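As a rough sketch of what "the same code" could mean in practice: the function below takes any iterable of per-source objects and does not care whether they came from the "responses" list of a complete response or from streamed lines. The name `split_results` and the choice to treat only code 200 as success are assumptions for illustration.

```python
def split_results(source_responses):
    """Separate per-source objects into PSMs and failure messages.

    Works on the "responses" list of a complete response as well as on
    the streamed objects that carry a "source" key.
    """
    psms, failures = {}, {}
    for response in source_responses:
        source = response["source"]
        if response.get("code") == 200:
            psms[source] = response.get("psms", [])
        else:
            failures[source] = response.get("message", "")
    return psms, failures


# Complete mode: one object holding a list of per-source responses.
complete = {
    "query": "/psms?resultType=compact&accession=PXD005942",
    "responses": [
        {"source": "PRIDE", "code": 200, "psms": [{"peptideSequence": "LSSPATLNSR"}]},
        {"source": "MassIVE", "code": 404, "message": "PSMs for PXD005942 not available"},
    ],
}
from_complete = split_results(complete["responses"])

# Streamed mode: the same per-source objects, followed by the final object.
streamed = [
    {"source": "PRIDE", "code": 200, "psms": [{"peptideSequence": "LSSPATLNSR"}]},
    {"source": "MassIVE", "code": 404, "message": "PSMs for PXD005942 not available"},
    {"responses": {"PRIDE": {"code": 200}, "MassIVE": {"code": 404}}},
]
from_stream = split_results(obj for obj in streamed if "source" in obj)
```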

The only snag is that this model is vastly different from the current /psms. I wanted them to be the same.

To make them compatible, we'd need to think about this some more, but I'm out of time for now.