dvitale199 / GenoTools

GenoTools: Advanced Genotype Data Analysis. A robust suite for processing genotype data, offering genotype calling (.idat to PLINK), comprehensive sample/variant QC, and ancestry estimation. Ideal for computational biology and genetics research.
Apache License 2.0

Failure to read pvar in upfront_check if it came from a vcf with header rows '##' #169

Closed dvitale199 closed 5 months ago

dvitale199 commented 5 months ago

Describe the bug: upfront_check fails to read the .pvar file if it came from a VCF and still contains '##' header rows.

To Reproduce: Convert a VCF to pgen, then try to run the pipeline on the result.

Expected behavior: When reading the .pvar, rows beginning with '##' should be skipped.

Screenshots:

    Traceback (most recent call last):
      File "/gpfs/gsfs12/users/vitaled2/.venv/bin/genotools", line 8, in <module>
        sys.exit(handle_main())
      File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/genotools/main.py", line 116, in handle_main
        args_dict = upfront_check(args_dict['geno_path'], args_dict)
      File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/genotools/utils.py", line 119, in upfront_check
        var = pd.read_csv(f'{geno_path}.pvar', sep = '\s+', low_memory = False)
      File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
        return _read(filepath_or_buffer, kwds)
      File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 583, in _read
        return parser.read(nrows)
      File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1704, in read
        ) = self._engine.read( # type: ignore[attr-defined]
      File "/gpfs/gsfs12/users/vitaled2/.venv/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
        data = self._reader.read(nrows)
      File "pandas/_libs/parsers.pyx", line 796, in pandas._libs.parsers.TextReader.read
      File "pandas/_libs/parsers.pyx", line 884, in pandas._libs.parsers.TextReader._read_rows
      File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status
      File "pandas/_libs/parsers.pyx", line 2029, in pandas._libs.parsers.raise_parser_error
    pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 4
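
One possible way to get the expected behavior, as a minimal sketch rather than the fix that was actually merged: count the leading '##' lines and pass that count to `skiprows`, so pandas starts parsing at the '#CHROM' header row. The helper names `count_meta_lines` and `read_pvar` are hypothetical and are not part of GenoTools. The sketch assumes all '##' lines appear before the header row.

```python
import pandas as pd


def count_meta_lines(pvar_path):
    """Count the leading '##' lines at the top of a .pvar file (hypothetical helper)."""
    n_meta = 0
    with open(pvar_path) as handle:
        for line in handle:
            # Stop at the first line that is not a '##' line (normally the '#CHROM' header).
            if not line.startswith("##"):
                break
            n_meta += 1
    return n_meta


def read_pvar(pvar_path):
    """Read a .pvar into a DataFrame, skipping any '##' lines above the header (hypothetical helper)."""
    return pd.read_csv(
        pvar_path,
        sep=r"\s+",
        skiprows=count_meta_lines(pvar_path),  # integer: skip this many lines from the top
        low_memory=False,
    )
```

Passing `comment='#'` to `read_csv` is not a drop-in alternative here, because it would also discard the '#CHROM' header row. With this sketch, a call such as `read_pvar(f'{geno_path}.pvar')` returns a frame whose first column is named '#CHROM'.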