genouest / biomaj-download

Download microservice for BioMAJ
GNU Affero General Public License v3.0

mis-interpretation of date format #41

Closed (nagoue closed this issue 2 weeks ago)

nagoue commented 2 years ago

Hi @osallou, I encountered this issue while trying to download from this site: https://ftp.uniprot.org/pub/databases/uniprot/current_release/uniparc/ The log is:

Traceback (most recent call last):
  File "xxx/biomaj/venv/lib/python2.7/site-packages/biomaj/workflow.py", line 130, in start
    self.session._session['status'][flow['name']] = getattr(self, 'wf_' + flow['name'])()
  File "xxx/biomaj/venv/lib/python2.7/site-packages/biomaj/workflow.py", line 1204, in wf_download
    release_dict = Utils.get_more_recent_file(downloader.files_to_download)
  File "xxx/biomaj/venv/lib/python2.7/site-packages/biomaj_core/utils.py", line 161, in get_more_recent_file
    rel_date = datetime.date(int(release['year']), int(release['month']), int(release['day']))
ValueError: day is out of range for month
2022-01-28 08:16:12,392 DEBUG [root][MainThread] Traceback (most recent call last):
  File "xxx/biomaj/venv/lib/python2.7/site-packages/biomaj/workflow.py", line 130, in start
    self.session._session['status'][flow['name']] = getattr(self, 'wf_' + flow['name'])()

This means that the year and the day are mixed up. To make it work, I modified this script, biomaj/venv/lib/python2.7/site-packages/biomaj_download/download/http.py, as follows:

                    if self.http_parse.file_date_format:
                        date_object = datetime.datetime.strptime(date, self.http_parse.file_date_format.replace('%%', '%'))
                        rfile['month'] = date_object.month
                        rfile['day'] = date_object.day
                        rfile['year'] = date_object.year
                    else:
                        dirdate = date.split()
                        parts = dirdate[0].split('-')
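                        # Expected form: 19-Jul-2014 13:02 (day first); the first field may instead be the year
                        rfile['month'] = Utils.month_to_num(parts[1])
                        if int(parts[0]) < 32:
                            # First field fits a day of month: day-first order
                            rfile['day'] = int(parts[0])
                            rfile['year'] = int(parts[2])
                        else:
                            # First field is too large to be a day: year-first order
                            rfile['day'] = int(parts[2])
                            rfile['year'] = int(parts[0])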

It's definitely not the best way, but it solved the issue for me this week. I don't really see how this part of the script is supposed to work, as I have the impression that self.http_parse.file_date_format is always set to None. Cheers,
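For readers who want to try the heuristic outside BioMAJ, here is a minimal standalone sketch of the same day/year disambiguation. The function `parse_listing_date` and the local `month_to_num` are hypothetical stand-ins written for this illustration (the real `Utils.month_to_num` lives in biomaj_core and may behave differently), and the sample date strings are made up rather than copied from the UniProt listing.

```python
import datetime

# Hypothetical stand-in for Utils.month_to_num (the real helper is in biomaj_core).
MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']


def month_to_num(name):
    """Map an English month abbreviation such as 'Jul' to its number (7)."""
    return MONTHS.index(name.lower()[:3]) + 1


def parse_listing_date(date):
    """Parse 'DD-Mon-YYYY HH:MM' or 'YYYY-Mon-DD HH:MM' into a datetime.date.

    Assumption: the first field is a day when it is below 32, otherwise a year.
    """
    parts = date.split()[0].split('-')
    first, last = int(parts[0]), int(parts[2])
    month = month_to_num(parts[1])
    if first < 32:
        day, year = first, last
    else:
        day, year = last, first
    return datetime.date(year, month, day)


# Both orderings resolve to the same calendar date.
day_first = parse_listing_date('19-Jul-2014 13:02')
year_first = parse_listing_date('2014-Jul-19 13:02')

# Without the check, a year-first date puts the year in the day slot,
# which is what makes datetime.date raise ValueError in the traceback.
try:
    datetime.date(19, 7, 2014)
    swapped_raises = False
except ValueError:
    swapped_raises = True
```

The `< 32` test is only a heuristic: it cannot tell a two-digit year from a day, so an explicit date format remains the more robust option.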

osallou commented 2 years ago

There are some defaults that try to interpret the date, but they can be overridden in the properties file, for example:

http.group.file.date_format=%%Y-%%m-%%d %%H:%%M

Unfortunately, dates are presented in very different formats.
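As a rough illustration of how such an override is consumed: the http.py snippet quoted earlier in this thread collapses each `%%` to `%` and passes the result to `datetime.datetime.strptime`. The sketch below reproduces only that step. The function name `parse_with_property_format` and the sample date string are assumptions made for the example and are not part of BioMAJ.

```python
import datetime


def parse_with_property_format(date, property_value):
    """Parse a listing date using a date_format value as written in the properties file.

    The doubled '%%' is collapsed to '%' before calling strptime,
    mirroring the replace('%%', '%') call in the quoted http.py code.
    """
    fmt = property_value.replace('%%', '%')
    parsed = datetime.datetime.strptime(date, fmt)
    return parsed.year, parsed.month, parsed.day


# Hypothetical listing date matching the example format above.
year, month, day = parse_with_property_format('2022-01-19 08:16',
                                              '%%Y-%%m-%%d %%H:%%M')
```

With an explicit format, the year, month and day come straight from the parsed object, so no guessing about field order is needed.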

nagoue commented 2 years ago

Thanks for the tip. I'll try that. It's cleaner!
