bogdanovvp closed this issue 1 year ago.
Upd: this seems to be an issue in .sqlite generation. Changing the column types within the sqlite to:
variant / vcfinfo__phred "text" -> "real"
variant / vcfinfo__alt_reads "text" -> "integer"
variant / vcfinfo__tot_reads "text" -> "integer"
variant / vcfinfo__af "text" -> "real"
and correcting the "type" values in the respective dictionaries in the "variant_header" table corrects the issue. The respective change should be implemented in the generating code.
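To illustrate why the declared column type matters, here is a minimal sketch of how SQLite compares values in a "text" column. It uses an in-memory database rather than a real OpenCRAVAT result file; the `base__uid` column and the sample values are assumptions made for the example, and the actual OpenCRAVAT filter code may build its queries differently.

```python
import sqlite3

# In-memory stand-in for a job's .sqlite file (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant (base__uid INTEGER, vcfinfo__alt_reads TEXT)"
)
conn.executemany(
    "INSERT INTO variant VALUES (?, ?)",
    [(1, "15"), (2, "4"), (3, "120")],
)

# With TEXT affinity the literal 20 is compared as the string '20',
# so '4' sorts above it while '120' does not.
text_hits = [
    row[0]
    for row in conn.execute(
        "SELECT base__uid FROM variant "
        "WHERE vcfinfo__alt_reads > 20 ORDER BY base__uid"
    )
]

# Casting to INTEGER restores the numeric comparison.
numeric_hits = [
    row[0]
    for row in conn.execute(
        "SELECT base__uid FROM variant "
        "WHERE CAST(vcfinfo__alt_reads AS INTEGER) > 20 ORDER BY base__uid"
    )
]

print(text_hits)     # variant 2 (alt_reads = 4) matches by string order
print(numeric_hits)  # variant 3 (alt_reads = 120) matches numerically
```

Declaring the column as "integer" or "real" has the same effect as the explicit cast, which is consistent with the filter behaving correctly after the type change.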
Upd2: this recent pull request generally solves the issue https://github.com/KarchinLab/open-cravat-modules-karchinlab/pull/11
Hi bogdanovvp. Thanks a lot for the digging here, and the PR.
Unfortunately, the changes won't work for some jobs. For variants found in more than one sample, those columns are `;`-delimited lists, and have to be strings. We are currently planning work on better sample/cohort filtering.
For example, consider a variant in two samples: s1 and s2. The `base__sample_id` column will be `s1;s2`, and `vcfinfo__alt_reads` will be something like `15;28`.
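As a rough sketch of how such values could be handled downstream, the helper below splits the two delimited strings and pairs them up. The function name `split_per_sample` is hypothetical, and the sketch assumes the values appear in the same order as the sample IDs and that no entry is empty, which the thread does not confirm.

```python
def split_per_sample(sample_ids, values, cast=int, sep=";"):
    """Pair each sample ID with its value from a delimited column.

    Assumes both strings list entries in the same sample order.
    """
    ids = sample_ids.split(sep)
    vals = values.split(sep)
    if len(ids) != len(vals):
        raise ValueError(
            f"{len(ids)} sample IDs but {len(vals)} values"
        )
    return {sid: cast(val) for sid, val in zip(ids, vals)}


# Values taken from the example above.
per_sample = split_per_sample("s1;s2", "15;28")
print(per_sample)  # {'s1': 15, 's2': 28}
```

Passing `cast=float` would cover a column such as `vcfinfo__af`.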
If you look into the `sample` table, the column values are better. `base__alt_reads` is integer, `base__tot_reads` is integer, and `base__af` is real. If it's possible for you to query the db directly, you could try that. Or, if you know there's only one sample, the change in your PR works great. But it won't work as a general fix.
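Querying the `sample` table directly might look roughly like the sketch below. It builds an in-memory table so it can run on its own; the `base__uid` and `base__sample_id` columns in that table, the row values, and the thresholds are assumptions for illustration, and with a real job you would connect to the result .sqlite file instead.

```python
import sqlite3

# In-memory stand-in for the result database (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample ("
    "base__uid INTEGER, base__sample_id TEXT, "
    "base__alt_reads INTEGER, base__tot_reads INTEGER, base__af REAL)"
)
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
    [
        (1, "s1", 15, 50, 0.30),
        (1, "s2", 28, 60, 0.47),
        (2, "s1", 4, 40, 0.10),
    ],
)

# One row per variant/sample pair, so numeric filters apply per sample.
query = (
    "SELECT base__uid, base__sample_id FROM sample "
    "WHERE base__alt_reads >= ? AND base__af >= ? "
    "ORDER BY base__uid, base__sample_id"
)
hits = conn.execute(query, (20, 0.3)).fetchall()
print(hits)  # [(1, 's2')]
```

Because each row holds a single sample's values, the comparison is numeric and no `;` splitting is needed.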
We're working on better filtering, and are gathering use-cases. If you're willing to discuss more, I'm interested to know what you're trying to use these columns for.
This is fixed for single-sample vcfs here https://github.com/KarchinLab/open-cravat/issues/149
Uploading VCFs to OpenCRAVAT seems to result in incorrect parsing of numeric values (they are likely parsed as strings), which hinders filtering.
![image](https://user-images.githubusercontent.com/54180173/166236573-34fab899-b766-4229-8f2b-3b312fa09304.png)
The header of the VCF file is attached:
fileformat=VCFv4.2
FILTER=
FILTER=
FILTER=
FILTER=
INFO=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=
FORMAT=