GenoTools: Advanced Genotype Data Analysis
A robust suite for processing genotype data, offering genotype calling (.idat to PLINK), comprehensive sample/variant QC, and ancestry estimation. Ideal for computational biology and genetics research.
Apache License 2.0
Failure to read pvar in upfront_check if it came from a vcf with header rows '##' #169
Describe the bug
upfront_check fails to read a .pvar file that was converted from a VCF and still contains '##' header lines.
To Reproduce
Convert a VCF to pgen, then try to run the pipeline on the converted files.
Expected behavior
When reading the .pvar, lines starting with '##' should be skipped.
Screenshots
Traceback (most recent call last):
  File "/gpfs/gsfs12/users/vitaled2/.venv/bin/genotools", line 8, in <module>
    sys.exit(handle_main())
  File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/genotools/main.py", line 116, in handle_main
    args_dict = upfront_check(args_dict['geno_path'], args_dict)
  File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/genotools/utils.py", line 119, in upfront_check
    var = pd.read_csv(f'{geno_path}.pvar', sep = '\s+', low_memory = False)
  File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read( # type: ignore[attr-defined]
  File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 796, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 884, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "pandas/_libs/parsers.pyx", line 2029, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 4
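Possible fix (a sketch, not the project's actual code): the error suggests pandas takes its column count from the first '##' line and then fails on a wider line further down. One way to get the expected behavior is to count the leading '##' lines and pass that count as skiprows, so the '#CHROM' line becomes the header. The helper names count_meta_lines and read_pvar are hypothetical, and the demo file contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def count_meta_lines(pvar_path):
    """Count the leading '##' meta-information lines in a .pvar file."""
    n = 0
    with open(pvar_path) as fh:
        for line in fh:
            if not line.startswith("##"):
                break  # reached the '#CHROM' header line or the first data row
            n += 1
    return n


def read_pvar(pvar_path):
    """Read a .pvar into a DataFrame, skipping any leading '##' lines.

    Hypothetical replacement for the pd.read_csv call in upfront_check.
    """
    import pandas as pd  # imported lazily so count_meta_lines stays stdlib-only

    return pd.read_csv(
        pvar_path,
        sep=r"\s+",
        skiprows=count_meta_lines(pvar_path),
        low_memory=False,
    )


# Demo on a tiny, made-up .pvar with two '##' lines before the header.
demo = (
    "##fileformat=VCFv4.2\n"
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t12345\trs1\tA\tG\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pvar"
    demo_path.write_text(demo)
    n_meta = count_meta_lines(demo_path)
```

Passing comment='#' to read_csv would be shorter, but it would also drop the '#CHROM' header line, which is why this sketch counts only the '##' lines.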