First of all, thanks for your amazing work.
I really like the concept behind your tool :)
Now concerning the bug:
I worked with a fresh Colab notebook
MMSplice version:
Installation with pip: mmsplice-2.4.0
Python version:
Python 3.10.13
Operating System:
Colab notebook
Description
I wanted to try out mmsplice, so I downloaded your example data and gave it a go.
But even with your test data, I kept getting the following error:
ValueError: Fasta chrom names do not match with vcf chrom names
After hours of digging through and hacking on the package code, I found the cause:
The input requires not only a VCF file but also its tabix index (i.e. vcf.gz.tbi).
The README never states this.
Please fix that.
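Until the README covers this, a pre-flight check along these lines would have saved me the debugging. It is only a sketch: `find_missing_vcf_inputs` is a hypothetical helper of mine, not part of MMSplice, and it assumes the index sits next to the VCF with `.tbi` appended to the file name.

```python
import tempfile
from pathlib import Path


def find_missing_vcf_inputs(vcf_path):
    """Return a list of problems that would prevent indexed access to the VCF.

    Hypothetical helper, not part of MMSplice. Assumes the index is stored
    next to the VCF as <name>.vcf.gz.tbi.
    """
    vcf = Path(vcf_path)
    problems = []
    if not vcf.name.endswith(".vcf.gz"):
        problems.append(f"{vcf.name}: expected a bgzip-compressed file ending in .vcf.gz")
    if not vcf.exists():
        problems.append(f"{vcf.name}: file not found")
    index = vcf.with_name(vcf.name + ".tbi")
    if not index.exists():
        # The index is usually created with: tabix -p vcf <file>.vcf.gz
        problems.append(f"{index.name}: index file not found")
    return problems


if __name__ == "__main__":
    # Demo with empty placeholder files; only the file names matter here.
    with tempfile.TemporaryDirectory() as tmp:
        vcf = Path(tmp) / "example.vcf.gz"
        vcf.touch()
        print(find_missing_vcf_inputs(vcf))  # reports the missing .tbi
        Path(str(vcf) + ".tbi").touch()
        print(find_missing_vcf_inputs(vcf))  # empty list
```

Calling this on the VCF path before building the dataloader gives a readable message instead of the chrom-name error above.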
Also, while hacking around, I noticed that the parsing of the seqnames seems buggy.
It only parses seqnames when an indexed VCF file is provided.
But it also always includes {'1'} as a seqname for the VCF, no matter what is provided?!
# In your vcf_dataloader.py
def _check_chrom_annotation(self):
    # I've added these two lines
    fasta_chroms = set(self.fasta.fasta.keys())
    vcf_chroms = set(self.vcf.seqnames)
    print("fasta: ", fasta_chroms, flush=True)
    print("vcf_chroms", vcf_chroms, flush=True)
    if not fasta_chroms.intersection(vcf_chroms):
        raise ValueError(
            'Fasta chrom names do not match with vcf chrom names')
--> Output:
fasta: {'17'}
vcf_chroms {'1', '17'}
...
The VCF seqnames should only include 17, since I only provided your example data.
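To cross-check what the VCF actually contains, independent of the dataloader, the CHROM column can be read directly. This is a rough sketch using only the standard library; `vcf_chrom_names` is a hypothetical helper, not an MMSplice function, and the tiny VCF in the demo is made up for illustration.

```python
import gzip
import tempfile
from pathlib import Path


def vcf_chrom_names(vcf_path):
    """Collect the distinct CHROM values from the data lines of a VCF file.

    Hypothetical helper, not part of MMSplice. Reads plain or gzip/bgzip
    compressed files; header lines starting with '#' are skipped.
    """
    path = Path(vcf_path)
    opener = gzip.open if path.suffix == ".gz" else open
    chroms = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            # CHROM is the first tab-separated column of every data line.
            chroms.add(line.split("\t", 1)[0])
    return chroms


if __name__ == "__main__":
    # Demo on a made-up two-variant VCF, not the MMSplice example data.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.vcf.gz"
        with gzip.open(demo, "wt") as fh:
            fh.write("##fileformat=VCFv4.2\n")
            fh.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
            fh.write("17\t100\t.\tA\tG\t.\t.\t.\n")
            fh.write("17\t200\t.\tC\tT\t.\t.\t.\n")
        print(vcf_chrom_names(demo))  # {'17'}
```

If this returns only {'17'} for the example VCF while `self.vcf.seqnames` reports {'1', '17'}, the extra '1' comes from the seqnames handling rather than from the file.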
What I Did
Take a look: https://colab.research.google.com/drive/1hx6PAYT_lKuEYtnHCq0PN2lyvNBNDud1?usp=sharing