PoonLab / sierra-local

Retrieve HIVdb algorithm as XML and apply locally to HIV sequences
GNU General Public License v3.0

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType' #92

Closed: aguang closed this issue 1 year ago

aguang commented 1 year ago

I am trying to implement sierralocal for when we have issues with the sierrapy client. However, I'm running into this error on some of my sequences:

(/users/aguang/anaconda/sierralocal-clean) [aguang@node1318 D54]$ sierralocal xx85 -o xx85_results.json
searching path /users/aguang/.local/lib/python3.10/site-packages/sierralocal/data/HIVDB*.xml
searching path /users/aguang/.local/lib/python3.10/site-packages/sierralocal/data/apobec_drms.json
HIVdb version 9.4
Aligning using post-align
Aligned xx85
Traceback (most recent call last):
  File "/users/aguang/anaconda/sierralocal-clean/bin/sierralocal", line 11, in <module>
    exit_code = main.main()
  File "/users/aguang/.local/lib/python3.10/site-packages/sierralocal/main.py", line 196, in main
    count, time_elapsed = sierralocal(args.fasta, args.outfile, xml=args.xml,
  File "/users/aguang/.local/lib/python3.10/site-packages/sierralocal/main.py", line 133, in sierralocal
    sequence_lengths, file_trims, subtypes, na_sequence = scorefile(input_file, algorithm,
  File "/users/aguang/.local/lib/python3.10/site-packages/sierralocal/main.py", line 79, in scorefile
    length_lists.append(last_na - first_na + 1)
TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'

This is one of the sequences:

>S1
ACTCTKTGGCAACGACCCATTGTTACAATAAAGATAGGGGGGCAACTAAAGGARGCTCTATTAGATACAGGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAAATGGAAACCAAGAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTTATAGAAATTTGTGGACATAGAGCTATAGGTACAGTATTAATAGGGCCTACACCTGTCAACATAATTGGAAGAAACCTGTTGACTCAGATTGGTTGCACCTTAAATCTTTGTACAGAAATGGAAAAGGAAGGRAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAGGACAGTACTAAATGGAGAAAATTGGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTTCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAARAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCAGTYCCCTTAGATAAAGACTTCAGGAAGTACACTGCWTTTACYATACCTAGTATAAACAATGAGACACCAGGGACTRGATATCAGTACAATGTGCTTCCACAGGGATGGAAAGGGTCACCAGCAATATTCCAAAGTAGYATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGGATCTGACTTAGAAATAGGGCAGCATAGAACAAAAGTAGAGGAGCTGAGACAACATCTGTTGGGGTGGGGATTTACCACACCAGACAAGAAGCATCAGAAAGAACCCCCATTCCTWTGGATGGGTTAYGAACTCCATCCTGATAAATGGACAGTACRGCCTATAGTGCTGCCA

I have some other examples as well that I can post here if you'd like, thanks.

WilliamZekaiWang commented 1 year ago

Hi, I have replicated this issue with the sequence you provided and fixed it for this case in the dev branch.

When post-align runs and encounters an NA triplet of length 0, it gives None as the NA position. I made it ignore those alignments if the length is 0, but this issue may still occur if the length is 1 or 2 instead of 3. I haven't tested it on those yet.
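For anyone hitting the same traceback, here is a minimal sketch of the kind of guard described above. It is not the actual change on the dev branch: the function name `span_lengths` and the input shape (a list of `(first_na, last_na)` pairs) are hypothetical, chosen only to illustrate skipping alignments whose NA position comes back as None before computing `last_na - first_na + 1`.

```python
from typing import List, Optional, Tuple

Span = Tuple[Optional[int], Optional[int]]


def span_lengths(spans: List[Span]) -> Tuple[List[int], List[int]]:
    """Compute last_na - first_na + 1 for each span, skipping incomplete ones.

    Hypothetical helper, not part of sierralocal. Returns the computed
    lengths and the indices of the spans that were skipped because
    either position was None.
    """
    lengths: List[int] = []
    skipped: List[int] = []
    for index, (first_na, last_na) in enumerate(spans):
        # A None position is what triggers the TypeError in the traceback,
        # so skip the span instead of subtracting.
        if first_na is None or last_na is None:
            skipped.append(index)
            continue
        lengths.append(last_na - first_na + 1)
    return lengths, skipped


# Illustrative values only: two complete spans and two with a missing position.
lengths, skipped = span_lengths([(1, 297), (298, None), (None, None), (10, 12)])
```

Returning the skipped indices alongside the lengths keeps the dropped alignments visible to the caller, which may help when checking whether triplets of length 1 or 2 cause the same problem.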

This issue does not seem to occur when using the old alignment method, nuc-amino.

ArtPoon commented 1 year ago

Let's move ahead with a PR for this update - we'll deal with the other potential edge cases (lengths of 1 or 2) later if they arise. Thanks @WilliamZekaiWang