genouest / biomaj

BioMAJ
http://genouest.github.io/biomaj/
GNU Affero General Public License v3.0

Error thrown when release previously deleted and restarted #82

Open horkko opened 7 years ago

horkko commented 7 years ago

Hi,

I'm seeing strange behavior from BioMAJ: the error below is thrown when I restart a workflow after it has failed once. Here is the context. I create a new bank with its associated configuration file. I start BioMAJ to update this bank and all goes well: BioMAJ exits with status code 0 and STATUS[True]. I then realize that the regexp used for remote.files is wrong and downloads too many files, so I delete the previously built release with --remove --bank foobar --release xxx. The last release is deleted as expected. I update the remote.files regexp in my configuration file and start a new update. It fails because the regexp is wrong and does not match any remote file. I fix the regexp (which is now correct), but BioMAJ still does not complete and throws the following error:

...
2017-04-05 10:45:26,878 INFO  [root][MainThread] [workflow.py:wf_download:1183] Workflow:wf_release:same_as_previous_session
2017-04-05 10:45:26,879 ERROR [root][MainThread] [workflow.py:start:132] Workflow:download:Exception:'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
 File "/biomaj3/lib/python2.7/site-packages/biomaj/workflow.py", line 129, in start
   self.session._session['status'][flow['name']] = getattr(self, 'wf_' + flow['name'])()
 File "/biomaj3/lib/python2.7/site-packages/biomaj/workflow.py", line 1184, in wf_download
   return self.no_need_to_update()
 File "/biomaj3/lib/python2.7/site-packages/biomaj/workflow.py", line 734, in no_need_to_update
   self.session.set('release', last_session['release'])
TypeError: 'NoneType' object has no attribute '__getitem__'
2017-04-05 10:45:26,883 ERROR [root][MainThread] [workflow.py:start:142] Error during task download
2017-04-05 10:45:26,884 INFO  [root][MainThread] [workflow.py:wf_over:250] Workflow:wf_over
2017-04-05 10:45:26,903 INFO  [root][MainThread] [notify.py:notifyBankAction:29] Notify:admin@biomaj.org
2017-04-05 10:45:26,907 INFO  [root][MainThread] [notify.py:notifyBankAction:45] BANK[foobar] - STATUS[True] - UPDATE[False] - REMOVE[False] - RELEASE[2017-03-08]

Can you have a look at why it failed? For info, after removing everything with --remove-all I was able to start and complete my bank update.
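
For readers following the traceback: the failing line subscripts `last_session`, which is `None` at that point. The sketch below reproduces that kind of failure in isolation and shows one possible defensive check. It is not BioMAJ code: `release_from_last_session` is a hypothetical helper, and the session dict is made up for illustration. Note that the log above comes from Python 2.7; under Python 3 the same mistake is reported as "'NoneType' object is not subscriptable".

```python
def release_from_last_session(last_session):
    """Return the release stored in a session dict, or None if there is no session.

    Hypothetical helper for illustration only, not part of BioMAJ.
    """
    # Guard against a missing session before subscripting it.
    if last_session is None:
        return None
    return last_session.get('release')


# Unguarded access, similar to the failing line in the traceback.
last_session = None
try:
    last_session['release']
except TypeError as err:
    error_message = str(err)

# Guarded access: no exception, the caller decides what to do with None.
guarded = release_from_last_session(None)
found = release_from_last_session({'id': 1, 'release': '2017-03-08'})
```

A guard like this only avoids the crash; it does not explain why no previous session is found after the release was removed.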

Emmanuel

Chriou commented 7 years ago

Hi Emmanuel, I am currently working on BioMAJ with osallou. Thanks for the report. Unfortunately I cannot reproduce this error reliably, which makes it hard to fix. Were you able to reproduce it? And what was the status of the previous release of foobar (if there was one)? Was it published? Do you have any further details? Thanks,

Chloé

horkko commented 7 years ago

Hi Chloe,

I've been able to reproduce the error. I ran some queries against Mongo to illustrate what happened.

With remote.files=^alu\.n\..*$     
$ biomaj-cli.py --update --bank dbtest => OK                                                                                                       
Mongo:
---------
"sessions": [                                                                                                                         
    {                                                                                                                                 
      "id": 1494926812.276509,                                                                                                        
      "release": "2009-06-15"                                                                                                         
    }                                                                                                                                 
  ]                                                                                                                                   

I remove the last session (which was OK, but had too many files):

$ biomaj-cli.py --remove --bank dbtest --release 2009-06-15  => OK
Mongo:
---------
"sessions": [                                                                                                                       
    {                                                                                                                                 
      "id": 1494926812.276509,                                                                                                        
      "release": "2009-06-15"                                                                                                         
    },                                                                                                                                
    {                                                                                                                                 
      "id": 1494926894.63619,                                                                                                         
      "release": "2009-06-15"                                                                                                         
    }                                                                                                                                 
  ]                                                                                                                                   

I update the regexp (to a wrong one, I made a mistake):

With remote.files=^alu\.nb\..*$
$ biomaj-cli.py --update --bank dbtest => raise Exception('no file found matching expressions') 
Mongo:
---------                                            
  "sessions": [                                                                                                                       
    {                                                                                                                                 
      "id": 1494926812.276509,                                                                                                        
      "release": "2009-06-15"                                                                                                         
    },                                                                                                                                
    {                                                                                                                                 
      "id": 1494926894.63619,                                                                                                         
      "release": "2009-06-15"                                                                                                         
    },                                                                                                                                
    {                                                                                                                                 
      "id": 1494926954.698821,                                                                                                        
      "release": null                                                                                                                 
    }                                                                                                                                 
  ]
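
To see why the second regexp matches nothing, both patterns can be tried against a list of file names. This is a minimal sketch: the file names are illustrative assumptions, not an actual listing of remote.dir, and `matching_files` is a hypothetical helper rather than the way BioMAJ itself applies remote.files.

```python
import re

# Illustrative file names only; the real content of remote.dir may differ.
remote_listing = ['alu.a.gz', 'alu.n.gz', 'alu.n.gz.md5', 'README']


def matching_files(pattern, names):
    """Return the names that match the given regular expression."""
    regexp = re.compile(pattern)
    return [name for name in names if regexp.match(name)]


# First (and final) pattern from this report.
good = matching_files(r'^alu\.n\..*$', remote_listing)

# Mistyped pattern: requires "alu.nb." at the start, which no name has.
bad = matching_files(r'^alu\.nb\..*$', remote_listing)
```

With these sample names, the first pattern keeps the two `alu.n.` files and the second keeps none, which is consistent with the "no file found matching expressions" exception.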

I correct the regexp, which is now the right one.

With remote.files=^alu\.n\..*$ 
$ biomaj-cli.py --update --bank dbtest =>  self.session.set('release', last_session['release'])                                                
                                                                   TypeError: 'NoneType' object has no attribute '__getitem__'         
Mongo:
---------                                    
  "sessions": [                                                                                                                       
    {                                                                                                                                 
      "id": 1494926812.276509,                                                                                                        
      "release": "2009-06-15"                                                                                                         
    },                                                                                                                                
    {                                                                                                                                 
      "id": 1494926894.63619,                                                                                                         
      "release": "2009-06-15"                                                                                                         
    },                                                                                                                                
    {                                                                                                                                 
      "id": 1494926954.698821,                                                                                                        
      "release": "2009-06-15"                                                                                                         
    }                                                                                                                                 
  ]                                        
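
Comparing the last two session lists shows that the third session went from a null release to "2009-06-15" even though the update crashed. The sketch below makes that comparison explicit, using only the values pasted above. `changed_releases` is a hypothetical helper written for this comment, not a BioMAJ function.

```python
def changed_releases(before, after):
    """List (session id, old release, new release) for sessions whose release changed."""
    before_by_id = {session['id']: session['release'] for session in before}
    return [
        (session['id'], before_by_id[session['id']], session['release'])
        for session in after
        if session['id'] in before_by_id
        and before_by_id[session['id']] != session['release']
    ]


# Session list after the update that failed with the wrong regexp.
after_failed_update = [
    {'id': 1494926812.276509, 'release': '2009-06-15'},
    {'id': 1494926894.63619, 'release': '2009-06-15'},
    {'id': 1494926954.698821, 'release': None},
]

# Session list after the update that raised the TypeError.
after_type_error = [
    {'id': 1494926812.276509, 'release': '2009-06-15'},
    {'id': 1494926894.63619, 'release': '2009-06-15'},
    {'id': 1494926954.698821, 'release': '2009-06-15'},
]

changes = changed_releases(after_failed_update, after_type_error)
```

Here `changes` contains a single entry, for session 1494926954.698821, whose release moved from null to "2009-06-15".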

Here is the remote server info I used to reproduce the bug:

protocol=ftp
server=ftp.ncbi.nlm.nih.gov
remote.dir=/blast/db/FASTA/
remote.files=^alu\.n\..*$

Hope this helps. Let me know if you can reproduce it on your side. Thanks, Emmanuel

Chriou commented 7 years ago

Hi Emmanuel,

Thanks a lot for your answer. I did exactly what you described, with the same parameters, but it worked every time. On my side it really behaves like a random bug. I tried with a bank that had two versions: I deleted the latest version with the option --remove --bank foobar --release XXX, then tried to download it again, and sometimes I get this bug:

> 14:11:35,081 INFO  [root][MainThread] Workflow:wf_download:release:release:2003-11-26
> 2017-05-15 14:11:35,084 INFO  [root][MainThread] ####DEBUG wf download self.session.previous_release: 2003-11-26
> 2017-05-15 14:11:35,085 INFO  [root][MainThread] ####DEBUG  wf download self.session.get(remoterelease) : 2003-11-26
> 2017-05-15 14:11:35,086 INFO  [root][MainThread] ####DEBUG  wf download self.is_previous_release_content_identical() : True
> 2017-05-15 14:11:35,087 INFO  [root][MainThread] ####DEBUG  wf download get status ? : {'over': False, 'depends': True, 'publish': False, 'release': True, 'init': True, 'postprocess': False, 'preprocess': True, 'download': False, 'check': True}
> 2017-05-15 14:11:35,087 INFO  [root][MainThread] Workflow:wf_release:same_as_previous_session
> 2017-05-15 14:11:35,090 ERROR [root][MainThread] ###DEBUG no_need_to_update release : 2003-11-26
> 2017-05-15 14:11:35,091 ERROR [root][MainThread] ###DEBUG no_need_to_update remoterelease : 2003-11-26
> 2017-05-15 14:11:35,092 ERROR [root][MainThread] ###DEBUG no_need_to_update previous_release : None
> 2017-05-15 14:11:35,093 ERROR [root][MainThread] ###DEBUG no_need_to_update last session : None
> 2017-05-15 14:11:35,093 ERROR [root][MainThread] Workflow:download:Exception:'NoneType' object has no attribute '__getitem__'
> Traceback (most recent call last):
>   File "/Biomaj3.1.0/biomaj/biomaj/workflow.py", line 129, in start
>     self.session._session['status'][flow['name']] = getattr(self, 'wf_' + flow['name'])()
>   File "/Biomaj3.1.0/biomaj/biomaj/workflow.py", line 1192, in wf_download
>     return self.no_need_to_update()
>   File "/Biomaj3.1.0/biomaj/biomaj/workflow.py", line 739, in no_need_to_update
>     self.session.set('release', last_session['release'])
> TypeError: 'NoneType' object has no attribute '__getitem__'
> 2017-05-15 14:11:35,107 ERROR [root][MainThread] Error during task download
> 2017-05-15 14:11:35,108 INFO  [root][MainThread] Workflow:wf_over
> 2017-05-15 14:11:35,117 INFO  [root][MainThread] Notify:none
> An error occured:
> 
> Bank update request sent for alu
> Failed to send update request for alu

I really do not understand why I cannot reproduce it again for further testing. I will try again.

Chloe