cokelaer / bioservices

Access to Biological Web Services from Python.
http://bioservices.readthedocs.io

Retrieving error from Psicquic request #248

Closed lionel-spinelli closed 1 year ago

lionel-spinelli commented 1 year ago

Hello, I am using the very convenient PSICQUIC wrapper you developed to query interactions from databases. For requests asking for small quantities it works perfectly. However, when I try to query more data (more than 1000 interactions in a single query, or several frequent queries of 100 interactions), I get the message "WARNING [bioservices.PSICQUIC:596]: status is not ok with Server Error". I found no information about what it means. I suppose the PSICQUIC servers are rejecting my request for some reason, but I did not find a way to get more information from the bioservices PSICQUIC object.

Could you tell me how to get more information about this problem, so that I can adapt my code, and whether I can catch this error programmatically (e.g. with try/except) so that my code can respond to it?

Thanks in advance, Bert
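On the try/except question: if, as the warning suggests, bioservices reports a failed request by logging it and returning the HTTP status code as a plain int instead of raising, a small wrapper can turn that return value into an exception and retry with a growing pause. This is a minimal sketch under that assumption; `checked_query` and `PsicquicQueryError` are hypothetical names, not part of bioservices.

```python
import time
from typing import Any, Callable, Optional


class PsicquicQueryError(RuntimeError):
    """Hypothetical exception type carrying the HTTP status code of a failed query."""

    def __init__(self, status_code: Optional[int]):
        super().__init__(f"PSICQUIC request failed with HTTP status {status_code}")
        self.status_code = status_code


def checked_query(query_fn: Callable[..., Any], *args: Any,
                  retries: int = 3, delay: float = 3.0, **kwargs: Any) -> Any:
    """Call query_fn, retrying with a growing pause, and raise if every attempt fails.

    Assumption: a failed request is reported by returning the HTTP status code
    as a plain int (not by raising), so any int result is treated as a failure.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    status = None
    for attempt in range(retries):
        result = query_fn(*args, **kwargs)
        if not isinstance(result, int) or isinstance(result, bool):
            return result  # anything other than an int is taken as real data
        status = result
        if attempt < retries - 1:
            time.sleep(delay * (attempt + 1))  # linear backoff between attempts
    raise PsicquicQueryError(status)
```

With bioservices this might be called as `checked_query(p.query, "intact", "species:9606", output="xml25")` inside a `try/except PsicquicQueryError` block, with `p = PSICQUIC()`. Any output that is legitimately an integer would be mistaken for a failure, so check what your bioservices version returns before relying on this.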

cokelaer commented 1 year ago

@lionel-spinelli thanks for using bioservices. "status is not ok with Server Error" is indeed an error on the PSICQUIC server.

Do you have a working snippet I can use to test this myself? It could be that the final query is too long: there are limitations on the length of the URL, but this is a general issue. I have not updated this service for a long time. There may be a newer interface based on POST requests (rather than GET) that I can check out for you. If you provide an example, it will help me start on this issue and give you more feedback. Best
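To check the URL-length hypothesis, one can rebuild the GET URL and measure it. This sketch assumes, as we understand the PSICQUIC REST interface, that the MIQL query is percent-encoded into the URL path and the options are query parameters; `BASE_URL` is a hypothetical placeholder and the 2000-character `MAX_URL_LENGTH` is an assumed conservative limit, not a documented PSICQUIC value.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint: replace with the REST URL registered for your service.
BASE_URL = "https://example.org/psicquic/webservices/current/search/query/"
# Assumed, deliberately conservative limit; real limits depend on servers and proxies.
MAX_URL_LENGTH = 2000


def build_query_url(miql: str, **params: object) -> str:
    """Percent-encode the MIQL query into the path and append options as parameters."""
    url = BASE_URL + quote(miql, safe="")
    if params:
        url += "?" + urlencode(params)
    return url


def url_too_long(url: str, limit: int = MAX_URL_LENGTH) -> bool:
    """True if the URL exceeds the assumed length limit."""
    return len(url) > limit
```

A short query such as `species:9606` stays far below any such limit, while a long OR-list of identifiers can exceed it; a query that fails the check is a candidate for splitting into smaller queries (or for a POST interface, if the service offers one).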

lionel-spinelli commented 1 year ago

Hello @cokelaer,

Thanks a lot for your response. Here is an extract/modification of my code as an example (you may have to simplify it for testing). Let me know if it is clear enough:


import os
import pickle
import time

from bioservices import PSICQUIC

psicquic_service = PSICQUIC()
psicquicDatabase = "InnateDB"
speciesName = "Homo_Sapiens"
speciesCodeDict = {speciesName: "9606"}
DOWNLOAD_BATCH_SIZE = 100
output_path = "."
# Number of interactions in the queried database, retrieved previously through
# the PSICQUIC service (placeholder value for this example)
database_interactions_count = 1000

# Divide the quantity of interactions into batches
batch_number = (database_interactions_count // DOWNLOAD_BATCH_SIZE) + 1

# Loop over query batches, since retrieving everything in a single query is not allowed
for batch_id in range(batch_number):

    # Get the list of interactions in the current batch
    interaction_list = psicquic_service.query(
        service=psicquicDatabase.lower(),
        query="species:" + speciesCodeDict[speciesName],
        output="xml25",
        version="current",
        firstResult=1 + batch_id * DOWNLOAD_BATCH_SIZE,
        maxResults=min((batch_id + 1) * DOWNLOAD_BATCH_SIZE, database_interactions_count),
    )

    # Save the returned object (pickled, so not a plain XML document)
    print("|-- writing XML file...")
    file_name = speciesName + "_" + psicquicDatabase + "_batch" + str(batch_id) + ".xml"
    with open(os.path.join(output_path, file_name), "wb") as output_file:
        pickle.dump(interaction_list, output_file)

    # Wait before the next request to avoid being blacklisted by the PSICQUIC servers
    print("|-- waiting before new request...")
    time.sleep(3)
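The batch count `(count // size) + 1` produces one extra, empty batch whenever the count is an exact multiple of the batch size (200 interactions in batches of 100 gives 3 batches). A minimal sketch with ceiling division follows. It assumes `firstResult` is a 0-based offset and `maxResults` is a page size (number of records), which is how we read the PSICQUIC REST specification; if the server follows that, the growing `min((batch_id + 1) * size, count)` above requests ever larger pages. `batch_ranges` is a hypothetical helper; shift the offset by one if your service counts from 1.

```python
from typing import Iterator, Tuple


def batch_ranges(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (first_result, max_results) pairs covering `total` records.

    Assumes first_result is a 0-based offset and max_results is a page size,
    not an end index.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = (total + batch_size - 1) // batch_size  # ceiling division
    for i in range(n_batches):
        first = i * batch_size
        yield first, min(batch_size, total - first)  # last page may be shorter
```

Each pair would then be passed as `firstResult=first, maxResults=size` in the loop, under the same assumption about the parameter semantics.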
cokelaer commented 1 year ago

I see that you are using xml25 as output. I tried a simple example on another database and got a 500 error. Using the default output, tab25, worked. This example retrieves 100 results but fails:

p.query("intact", "species:9606", output="xml25",firstResult=1000, maxResults=1100)

while this one works:

p.query("intact", "species:9606", output="tab25", firstResult=1000, maxResults=1100)

Extended to 1000 results, it works as well:

p.query("intact", "species:9606", output="tab25",firstResult=1000, maxResults=1100)

Not sure why, but it looks like an issue on the server side. The tab25 output is easier to parse, but you may miss some information.
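For working with tab25 output: MITAB 2.5 has 15 tab-separated columns, with `|` separating multiple values in a cell and `-` marking an empty cell. The sketch below maps one row to named fields, assuming the rows arrive already split on tabs (otherwise split each line on `"\t"` first). The short column names are our own labels, and the `|` split is naive: it does not handle a `|` inside a quoted value, and extra columns from later MITAB versions are silently ignored.

```python
from typing import Dict, List

# The 15 MITAB 2.5 columns, in order (our own shortened names).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence_values",
]


def parse_mitab25_row(fields: List[str]) -> Dict[str, List[str]]:
    """Map one tab25 row to column names; multi-valued cells are split on '|'."""
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, got {len(fields)}")
    return {
        name: ([] if value == "-" else value.split("|"))  # '-' means empty
        for name, value in zip(MITAB25_COLUMNS, fields)
    }
```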

lionel-spinelli commented 1 year ago

Hello,

Thanks for your test. You are perfectly right: the tab25 output works while xml25 does not. I am even able to query the whole database at once. I would never have thought it could be a format issue... Unfortunately, the XML output is much better to parse because it is highly structured. I will check on the PSICQUIC side whether they have any clue about this output format issue.

Thanks again for your test and for finding the reason for these issues. Lionel
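For when the xml25 output does come back: a minimal sketch of reading PSI-MI XML 2.5 with the standard library, assuming the default namespace `net:sf:psidev:mi` and the compact layout in which an `interactorList` holds the interactors and each `participant` points to one through `interactorRef`. Files that inline an `interactor` inside each participant would need an extra branch. `interactions_from_psimi25` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Assumed PSI-MI XML 2.5 default namespace.
NS = {"mi": "net:sf:psidev:mi"}


def interactions_from_psimi25(xml_text: str) -> List[Tuple[str, ...]]:
    """Return, for each interaction, the short labels of its participants."""
    root = ET.fromstring(xml_text)
    # Map interactor id -> shortLabel from the interactorList
    labels: Dict[str, str] = {}
    for interactor in root.iterfind(".//mi:interactorList/mi:interactor", NS):
        label = interactor.findtext("mi:names/mi:shortLabel", default="", namespaces=NS)
        labels[interactor.get("id")] = label
    result = []
    for interaction in root.iterfind(".//mi:interactionList/mi:interaction", NS):
        refs = [(ref.text or "").strip()
                for ref in interaction.iterfind(".//mi:participant/mi:interactorRef", NS)]
        # Fall back to the raw reference if the interactor id is unknown
        result.append(tuple(labels.get(r, r) for r in refs))
    return result
```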