cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License

Indexing Error for VariantTable, requires values to be monotonically increasing #384

Open npb596 opened 2 years ago

npb596 commented 2 years ago

Hello,

I have been receiving the error below when constructing a VariantTable:

    vcf_first_vt = allel.VariantTable({'CHROM' : vcf_first['variants/CHROM'], 'POS' : vcf_first['variants/POS'], 'REF' : vcf_first['variants/REF'], 'ALT' : vcf_first['variants/ALT'][:,0], 'GT' : vcf_first['calldata/GT'][:,0,0], 'PS' : vcf_first['calldata/PS'][:,0]}, index=('CHROM','POS'))
  File "/home/nbailey/anaconda3/lib/python3.9/site-packages/allel/model/ndarray.py", line 4517, in __init__
    self.set_index(index)
  File "/home/nbailey/anaconda3/lib/python3.9/site-packages/allel/model/ndarray.py", line 4542, in set_index
    index = SortedMultiIndex(self[index[0]], self[index[1]],
  File "/home/nbailey/anaconda3/lib/python3.9/site-packages/allel/model/ndarray.py", line 4036, in __init__
    l1 = SortedIndex(l1, copy=copy)
  File "/home/nbailey/anaconda3/lib/python3.9/site-packages/allel/model/ndarray.py", line 3384, in __init__
    raise ValueError('values must be monotonically increasing')
ValueError: values must be monotonically increasing

For clarity, my Python script contains the vcf_first_vt definition shown above, and that line triggers the error. I can avoid the error as long as the chromosomes are in lexicographic order (e.g. chr1, chr10, chr2 instead of chr1, chr2, chr10) and I remove chromosome names without numbers (e.g. chrX and chrY). This is odd to me, as I assume something like "chr1" should be treated as a string (as per the example here: https://scikit-allel.readthedocs.io/en/stable/model/ndarray.html?highlight=sortedmultiindex#sortedmultiindex). Lexicographic ordering makes sense if the names are compared as strings, but I don't understand why they need to be sorted in any particular order at all. Is there a way of defining a VariantTable that I'm missing that would allow chromosomes to appear in an arbitrary order? If not, would it be possible to make this requirement more explicit?
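
To illustrate the behaviour described above, here is a minimal plain-Python sketch of a monotonicity check on chromosome names. It does not call scikit-allel; the helper names `is_monotonically_increasing` and `encode_by_first_appearance` are hypothetical, and the check only mirrors the idea behind the "values must be monotonically increasing" error rather than reproducing the library's code. It shows that string comparison puts "chr10" before "chr2", and that integer codes assigned in order of first appearance stay monotonic when each chromosome's records are contiguous.

```python
def is_monotonically_increasing(values):
    """Return True if no element is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def encode_by_first_appearance(names):
    """Map each name to an integer code in order of first appearance.

    Hypothetical helper: the codes are only monotonic if all records
    for a given chromosome are contiguous in the input.
    """
    codes = {}
    encoded = []
    for name in names:
        if name not in codes:
            codes[name] = len(codes)
        encoded.append(codes[name])
    return encoded, codes


# Chromosome names in the "natural" order found in many VCF files.
natural_order = ["chr1", "chr1", "chr2", "chr10", "chrX"]

# The same names sorted as strings (lexicographic order).
lexicographic_order = sorted(natural_order)

# Strings compare character by character, so "chr2" > "chr10".
print(is_monotonically_increasing(natural_order))        # False
print(is_monotonically_increasing(lexicographic_order))  # True

# Integer codes preserve the file's own chromosome order.
chrom_codes, code_of = encode_by_first_appearance(natural_order)
print(chrom_codes)                                        # [0, 0, 1, 2, 3]
print(is_monotonically_increasing(chrom_codes))           # True
```

Whether substituting such integer codes for CHROM is appropriate when building a VariantTable index is an assumption to check against the scikit-allel documentation; the `code_of` mapping would be needed to translate codes back to chromosome names.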