arvkevi / ezancestry

Easy genetic ancestry predictions in Python
https://ezancestry.streamlit.app
MIT License

Script erroring on single VCF input #57

Open dbrami opened 2 years ago

dbrami commented 2 years ago

Hi Kevin,

Here's the output from running script on a .VCF file:

(dnafinger) (base) ➜ VCFs ls
HG01082.hard-filtered.vcf
(dnafinger) (base) ➜ VCFs ezancestry predict ./
no SNPs loaded...
2022-06-15 08:44:45.149 | DEBUG | ezancestry.process:_input_to_dataframe:260 - No snps found in the input_data
2022-06-15 08:44:45.159 | DEBUG | ezancestry.process:process_user_input:194 - 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
2022-06-15 08:44:45.159 | WARNING | ezancestry.process:process_user_input:195 - Skipping .DS_Store because it was not valid
2022-06-15 08:44:45.170 | INFO | ezancestry.process:encode_genotypes:137 - Successfully loaded an encoder from /Users/bramid/.ezancestry/data/models/one_hot_encoder.kidd.bin
Traceback (most recent call last):
  File "/Users/bramid/dnafinger/bin/ezancestry", line 8, in <module>
    sys.exit(app())
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/ezancestry/commands.py", line 288, in predict
    snpsdf = encode_genotypes(
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/ezancestry/process.py", line 140, in encode_genotypes
    X = ohe.transform(df.values)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 471, in transform
    X_int, X_mask = self._transform(X, handle_unknown=self.handle_unknown,
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 113, in _transform
    X_list, n_samples, n_features = self._check_X(
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py", line 44, in _check_X
    X_temp = check_array(X, dtype=None,
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/Users/bramid/dnafinger/lib/python3.9/site-packages/sklearn/utils/validation.py", line 726, in check_array
    raise ValueError("Found array with %d sample(s) (shape=%s) while a"
ValueError: Found array with 0 sample(s) (shape=(0, 55)) while a minimum of 1 is required.

I'm attaching the first 1k lines of this VCF in case the problem is with the VCF rather than the code. The complete VCF can be downloaded like so:

aws s3 cp --no-sign-request s3://1000genomes-dragen-3.7.6/data/individuals/hg19-graph-based/HG01082/HG01082.hard-filtered.vcf.gz .

1k_HG01082.hard-filtered.vcf.gz
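
The traceback above ends in scikit-learn because an array with zero rows (shape `(0, 55)`) reached the encoder. A minimal sketch of an earlier, clearer failure is shown below. `NoSamplesError` and `require_samples` are hypothetical names for illustration and are not part of ezancestry; the idea is only that a check before the encoding step could report the empty input directly.

```python
from typing import Sequence


class NoSamplesError(ValueError):
    """Hypothetical error: no genotype rows survived input parsing."""


def require_samples(
    rows: Sequence[Sequence[str]], source: str
) -> Sequence[Sequence[str]]:
    """Return rows unchanged, or raise a descriptive error if there are none.

    `rows` stands in for the genotype matrix that would be passed to the
    encoder; `source` is the path the user supplied on the command line.
    """
    if len(rows) == 0:
        raise NoSamplesError(
            f"No SNPs could be loaded from {source!r}; "
            "no samples are available to encode."
        )
    return rows
```

Called just before encoding, a check like this would stop with a message that names the input path instead of surfacing a `ValueError` from inside scikit-learn.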

apriha commented 2 years ago

This looks like a problem parsing the VCF and could be transferred to snps.
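
One way to narrow down a parsing problem like this is to check whether the VCF carries rsIDs at all. This is an assumption rather than something confirmed in this thread: if the parser matches markers by rsID and the file's ID column holds only `.`, no SNPs would be loaded. The sketch below uses only the standard library; `count_rsids` and `count_rsids_in_file` are hypothetical helper names, not part of ezancestry or snps.

```python
import gzip
from typing import Iterable, Tuple


def count_rsids(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_records, records_with_rsid) for VCF text lines.

    The ID column is the third tab-separated field; it may hold several
    semicolon-separated identifiers, or "." when no ID is available.
    """
    total = 0
    with_rsid = 0
    for line in lines:
        # Skip blank lines, meta-information (##) and the header (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed record; ignored in this sketch
        total += 1
        if any(ident.startswith("rs") for ident in fields[2].split(";")):
            with_rsid += 1
    return total, with_rsid


def count_rsids_in_file(path: str) -> Tuple[int, int]:
    """Open a plain or gzip-compressed VCF and count its rsID records."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_rsids(handle)
```

For example, `count_rsids_in_file("HG01082.hard-filtered.vcf")` would return a pair of counts; a second value of zero would suggest the file has no rsIDs for a parser to match against.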