Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

Parsing Nirvana outputs works with toy VCFs but fails with real data; is a format change responsible? #116

Closed jessmewald closed 8 months ago

jessmewald commented 8 months ago

Hi there!

I appreciate the example notebooks you provided for parsing the Nirvana output JSON files. I had great success expanding on those notebooks and parsing the toy VCF you provided.

However, when I run those same scripts on the files my lab generated, I get a few errors that do not occur with the toy files. First, when I parse the header exactly as the example notebook does, I get the following traceback:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5375: ordinal not in range(128)
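
Byte 0xc3 is a common lead byte of a two-byte UTF-8 sequence (accented Latin characters such as "é"), so one plausible cause is that the file is being decoded with an ASCII default rather than UTF-8. A minimal sketch of reading the header line with an explicit encoding follows; the helper name `read_first_line` is hypothetical, and it assumes the header sits on the first line of the gzip-compressed output, as the notebook's header parsing suggests.

```python
import gzip


def read_first_line(path, encoding="utf-8"):
    """Return the first line of a gzip-compressed text file, decoded explicitly.

    Hypothetical helper: assumes the header of interest is on the first line.
    """
    # Passing encoding= avoids falling back to the platform default,
    # which can be ASCII in some environments (e.g. a C/POSIX locale).
    with gzip.open(path, "rt", encoding=encoding) as handle:
        return handle.readline()
```

If the error disappears with this change, the toy file most likely contains only ASCII characters while the real data includes non-ASCII text somewhere in the header.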

The other two examples in that notebook, "Retrieve all relevant genes and their OMIM gene names" and "Retrieve variants under a gnomAD allele frequency threshold", both produce empty data frames when I run them on my data. There are no tracebacks, just empty data frames. Both examples work exactly as expected on the ceph_trio_test.json.gz toy file provided.
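
An empty result with no traceback often means the filter never finds the keys it looks for. One way to check is to count which keys actually occur in each file and compare the toy output against the real one. The sketch below is generic and does not assume any particular Nirvana field names; the function names are hypothetical, and it assumes each record sits on its own line (possibly followed by a comma), which may not hold for every output.

```python
import json
from collections import Counter


def count_keys(obj, counts=None):
    """Recursively count how often each dictionary key appears in parsed JSON."""
    if counts is None:
        counts = Counter()
    if isinstance(obj, dict):
        for key, value in obj.items():
            counts[key] += 1
            count_keys(value, counts)
    elif isinstance(obj, list):
        for item in obj:
            count_keys(item, counts)
    return counts


def count_keys_in_lines(lines):
    """Count keys over text lines that each hold one complete JSON object.

    Assumption: one record per line. Lines that are not standalone JSON
    objects (such as section delimiters) are skipped.
    """
    counts = Counter()
    for line in lines:
        text = line.strip().rstrip(",")  # drop a trailing record separator
        if not text.startswith("{"):
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            continue  # not a complete JSON object on its own line
        count_keys(record, counts)
    return counts
```

Passing an open text-mode file handle as `lines` for each file and comparing the two counters should show whether the keys the notebook examples filter on are present in the real output at all.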

Has the JSON output format changed in a significant way since those examples were written, or should I expect them to work on our Nirvana outputs as well as on the test files?

The version my lab uses is Nirvana 3.21.0-0-gd2a0e953. Thanks for any help you can offer!