Gabaldonlab / jloh

A tool to extract LOH blocks from VCF, BAM and FASTA data
http://jloh.readthedocs.io
GNU General Public License v3.0

Error running the jloh stats command #5

Closed Rahmmy closed 1 year ago

Rahmmy commented 1 year ago

Hello @MatteoSchiavinato,

Following a successful installation of the tool, I was able to do a test run with jloh sim --fasta S_para.chrXII.fa, which completed without errors.

However, the stats command failed. Please see the command and error below. Please advise. Kind regards,

(base) abdurahman@Abdul-Rahmans-MacBook-Pro jloh % ./jloh stats --vcf test_data/out.ff.vcf      
[Fri Oct 13 12:52:22 2023] Reading SNPs
[Fri Oct 13 12:52:22 2023] found 19 het SNPs and 114 homo SNPs
[Fri Oct 13 12:52:22 2023] Reading chrom lengths from VCF header
[Fri Oct 13 12:52:22 2023] Read 1 chromosome names and their lengths
[Fri Oct 13 12:52:22 2023] Calculating heterozygous SNP densities
Traceback (most recent call last):
  File "/Users/abdurahman/jloh/src/stats", line 331, in <module>
    main(args)
  File "/Users/abdurahman/jloh/src/stats", line 263, in main
    Het, Het_quant = calculate_chrom_snp_densities(Hetero_lines, Chrom_lengths, args)
  File "/Users/abdurahman/jloh/src/stats", line 244, in calculate_chrom_snp_densities
    df_quant = df.quantile(q=[0.05, 0.10, 0.15, 0.50, 0.85, 0.90, 0.95])
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/frame.py", line 10927, in quantile
    res = data._mgr.quantile(qs=q, axis=1, interpolation=interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1587, in quantile
    blocks = [
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1588, in <listcomp>
    blk.quantile(axis=axis, qs=qs, interpolation=interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 1461, in quantile
    result = quantile_compat(self.values, np.asarray(qs._values), interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 37, in quantile_compat
    return quantile_with_mask(values, mask, fill_value, qs, interpolation)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 95, in quantile_with_mask
    result = _nanpercentile(
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/pandas/core/array_algos/quantile.py", line 216, in _nanpercentile
    return np.percentile(
  File "<__array_function__ internals>", line 180, in percentile
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4166, in percentile
    return _quantile_unchecked(
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
    r, k = _ureduce(a,
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
    r = func(a, **kwargs)
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4710, in _quantile
    result = _lerp(previous,
  File "/Users/abdurahman/anaconda3/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4527, in _lerp
    diff_b_a = subtract(b, a)
TypeError: unsupported operand type(s) for -: 'str' and 'str'

MatteoSchiavinato commented 1 year ago

This is a strange behavior. Could you try to run it with any other VCF file you own?

Rahmmy commented 1 year ago

Sure. When I try my own VCF file, I get a different error.

[Fri Oct 20 09:57:00 2023] Reading SNPs
Traceback (most recent call last):
  File "/Users/abdurahman/jloh/src/stats", line 331, in <module>
    main(args)
  File "/Users/abdurahman/jloh/src/stats", line 253, in main
    Hetero_lines, Homo_lines = hetero_and_homo_snps(args.vcf)
  File "/Users/abdurahman/jloh/src/stats", line 77, in hetero_and_homo_snps
    Vcf_lines = [ line for line in INPUT if line[0] != "#" ]
  File "/Users/abdurahman/jloh/src/stats", line 77, in <listcomp>
    Vcf_lines = [ line for line in INPUT if line[0] != "#" ]
  File "/Users/abdurahman/anaconda3/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

MatteoSchiavinato commented 1 year ago

Hi, sorry, I was away for a few days. Is your file gzipped by any chance? The input must be a normal uncompressed VCF file.

MatteoSchiavinato commented 1 year ago

I have tested it with three different VCF files from three different callers (freebayes, GATK, bcftools call) and I don't get this error. I am unable to reproduce it, sorry! First of all, check whether the file is compressed; if it is not, maybe attach it here so I can test it with your file!
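
A quick way to check for compression: gzip files start with the two bytes 0x1f 0x8b, and the byte 0x8b at position 1 in the UnicodeDecodeError above matches the second of them. The following is a minimal sketch, not part of jloh; the function names `is_gzipped` and `open_vcf_text` are hypothetical helpers made up for illustration.

```python
import gzip

# The two-byte signature that every gzip stream starts with.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_vcf_text(path):
    """Open a VCF as text, decompressing on the fly if it is gzipped.

    Hypothetical helper for inspecting a file; it does not change what
    jloh itself accepts as input.
    """
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "r")
```

If `is_gzipped("my.vcf")` returns True, decompress the file (for example with `gunzip`) before passing it to `jloh stats`, since the input must be an uncompressed VCF.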

MatteoSchiavinato commented 1 year ago

Hi, I have managed to reproduce the issue and (in my case) I have fixed it. You will have to download the new commit (v1.0.2), which is not yet the version on Docker Hub.

Clone it from GitHub here, or wait a few days for the Docker Hub version :)
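
For readers who hit the first traceback in this thread: the final TypeError (`unsupported operand type(s) for -: 'str' and 'str'`) is what NumPy raises when it is asked to interpolate between string values. One plausible reading is that the table passed to `df.quantile` held a string (object) column, which pandas 2.0 and later no longer drops by default. The sketch below only illustrates that general pattern on made-up data; it is an assumption about the cause, not the actual fix in v1.0.2, and the column name `density` is hypothetical.

```python
import pandas as pd

# Made-up data: numeric values that were read in as strings.
df = pd.DataFrame({"density": ["0.5", "1.5", "2.5"]})

# Convert every column to a numeric dtype; values that cannot be
# parsed become NaN instead of staying as strings.
df_numeric = df.apply(pd.to_numeric, errors="coerce")

# With a numeric dtype, quantile interpolation works as intended.
df_quant = df_numeric.quantile(q=[0.05, 0.50, 0.95])
print(df_quant)
```

Checking `df.dtypes` before calling `quantile` is a cheap way to confirm whether a column is stored as `object` rather than as a number.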