PoonLab / sierra-local

Retrieve HIVdb algorithm as XML and apply locally to HIV sequences
GNU General Public License v3.0

list index out of range for hivdb versions 9.3 and 9.4 #70

Closed Kanyerezi30 closed 1 year ago

Kanyerezi30 commented 1 year ago

I have downloaded the XML files for HIVdb versions 9.3 and 9.4 from the Stanford website, but when I run the command below

sierralocal -xml HIVDB_9.4.xml RT.fa -o RT-9.4.json

I get the error below

HIVdb version 9.4
Traceback (most recent call last):
  File "/usr/local/bin/sierralocal", line 11, in <module>
    exit_code = main.main()
  File "/home/kanye/.local/lib/python3.6/site-packages/sierralocal/main.py", line 179, in main
    cleanup=args.cleanup, forceupdate=args.forceupdate)
  File "/home/kanye/.local/lib/python3.6/site-packages/sierralocal/main.py", line 108, in sierralocal
    writer = JSONWriter(algorithm)
  File "/home/kanye/.local/lib/python3.6/site-packages/sierralocal/jsonwriter.py", line 22, in __init__
    self.database = self.algorithm.parse_drugs(self.algorithm.root)
  File "/home/kanye/.local/lib/python3.6/site-packages/sierralocal/hivdb.py", line 212, in parse_drugs
    cond_dict = self.parse_condition(condition) # dictionary of parsed drug conditions
  File "/home/kanye/.local/lib/python3.6/site-packages/sierralocal/hivdb.py", line 252, in parse_condition
    self._parse_scores(self.drms, drm, drm, iter)
  File "/home/kanye/.local/lib/python3.6/site-packages/sierralocal/hivdb.py", line 279, in _parse_scores
    drm_lib.append({'group': mut_list, 'value': int(scores[iter])})
IndexError: list index out of range

It's the same for version 9.3.

ArtPoon commented 1 year ago

Just got back from the holiday break; will investigate.

GopiGugan commented 1 year ago

Issue is with the following line: https://github.com/PoonLab/sierra-local/blob/18ae6aaa64fd1e6dae5ff40b7084790ff5cf3a64/sierralocal/hivdb.py#L267

For the string 67EGNHST AND 70R AND 184VI AND 219ENQRW => 10, the regex on that line does not extract the score 10.

This can be fixed by looking for the following pattern:

scores = re.findall(r'([-]?[0-9]+(?=\W)|[-]?[0-9]+(?=$))', drm.strip())
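
A minimal sketch of how the proposed pattern behaves on the string quoted above, run outside sierra-local. The helper name extract_scores and the second input ("41L => -5") are illustrative assumptions and do not come from hivdb.py or from the HIVdb XML.

```python
import re

# Pattern proposed in this thread: an optionally negative integer that is
# followed either by a non-word character or by the end of the string.
# Written as a raw string so that \W is not treated as a string escape.
SCORE_PATTERN = re.compile(r'([-]?[0-9]+(?=\W)|[-]?[0-9]+(?=$))')


def extract_scores(drm):
    """Hypothetical helper: return the score tokens found in one rule string."""
    return SCORE_PATTERN.findall(drm.strip())


# The rule string quoted in this issue. Position numbers such as 67 or 184
# are followed directly by letters, so neither lookahead accepts them.
rule = "67EGNHST AND 70R AND 184VI AND 219ENQRW => 10"
print(extract_scores(rule))  # ['10']

# Made-up input showing that a leading minus sign is kept with the number.
print(extract_scores("41L => -5"))  # ['-5']
```

Note that the pattern accepts any integer followed by a non-word character, so a bare number elsewhere in a rule would also be returned. Whether that matters depends on the rule syntax in a given HIVdb XML file.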

ArtPoon commented 1 year ago

Thanks @GopiGugan - I'll make a dev branch so we can add this change and test.

Kanyerezi30 commented 1 year ago

I have applied the change in my hivdb.py and it works well now.