Created a dummy branch to add some comments to CountSNPASE.py. In particular, I've indicated:
- How pandas can be used to read in tables more clearly.
- How the latest versions of pysam allow you to parse variants from a read much more efficiently than the code I wrote last year. It's important that you're on htslib 1.3 or later here, because a bug in the original implementation of this code led to errors.
- How pysam allows indexed retrieval of sequence from FASTA files, which makes the whole fasta_to_dict() construction unnecessary.
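
For reference, here is a rough, untested-against-this-repo sketch of the three ideas above. All function names and the SNP table columns (chrom, pos, ref, alt) are hypothetical and not taken from CountSNPASE.py. It assumes a headerless tab-separated SNP table, a coordinate-sorted and indexed BAM whose reads carry MD tags, and a FASTA with a .fai index. pandas and pysam are imported inside the helpers that need them so the pure-Python mismatch logic can be checked on its own.

```python
def read_snp_table(path):
    """Read a headerless, tab-separated SNP table (column names are assumed)."""
    import pandas as pd  # local import: only this helper needs pandas
    return pd.read_csv(path, sep="\t",
                       names=["chrom", "pos", "ref", "alt"],
                       dtype={"chrom": str})


def mismatches_from_pairs(query_sequence, aligned_pairs):
    """Return {ref_pos: (ref_base, read_base)} for substituted positions.

    aligned_pairs is what AlignedSegment.get_aligned_pairs(with_seq=True)
    returns: (query_pos, ref_pos, ref_base) tuples with 0-based positions.
    ref_base is lower-case at substitutions, and entries contain None
    inside insertions, deletions, and soft clips.
    """
    found = {}
    for query_pos, ref_pos, ref_base in aligned_pairs:
        if query_pos is None or ref_pos is None or ref_base is None:
            continue  # indel or clipped base: no read/reference base to compare
        if ref_base.islower():
            found[ref_pos] = (ref_base.upper(), query_sequence[query_pos])
    return found


def read_mismatches(bam_path, chrom, start, end):
    """Yield (read_name, mismatches) for reads overlapping a 0-based region."""
    import pysam  # local import: needs pysam and an indexed BAM
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(chrom, start, end):
            if read.is_unmapped or not read.has_tag("MD"):
                continue  # with_seq=True requires the MD tag
            pairs = read.get_aligned_pairs(with_seq=True)
            yield read.query_name, mismatches_from_pairs(read.query_sequence, pairs)


def reference_base(fasta_path, chrom, pos):
    """Fetch one reference base by 0-based position from an indexed FASTA."""
    import pysam
    with pysam.FastaFile(fasta_path) as fasta:
        return fasta.fetch(chrom, pos, pos + 1)
```

The point of reference_base() is that nothing is loaded into memory up front, which is what fasta_to_dict() currently does. In real use you would open the FastaFile once and reuse it rather than reopening it per lookup.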
Unfortunately I don't have time to formally implement this stuff and test it, but hopefully it will lead to some improvements in code speed, reliability, and maintainability. This isn't the deepest dive back into the code, but I'm happy to check in periodically and discuss tweaks and improvements.
Also, there may be a few syntax errors in there (I think I added too many brackets around the pandas.DataFrame.columns call), so consider it pseudocode for testing :warning: