aineniamh / squirrel

GNU General Public License v3.0

Can't find the file 'NC_063383.aln.cds.fasta'. How can I get it? #31

Closed YangJingqii closed 2 months ago

YangJingqii commented 2 months ago

Dear,

I'm running cleaner_apobec_work.ipynb from your repository. However, I can't find the file "/Users/s1680070/repositories/alignHPXV/squirrel/data/NC_063383.aln.cds.fasta" that is used in the get_dimer_potential_mutation part. So I extracted all CDS myself, ran the function get_dimer_potential_mutation on each of them, and summed the results, but I get 'nonsynonymous': 12605, 'synonymous': 7738, 'nonsense': 811 for clade II, which is quite different from the result shown in your paper. What is wrong with my approach? Also, what does 'NC_063383.aln.cds.fasta' look like, and how can I get it? I'm very much looking forward to your answer. Thank you so much!

aineniamh commented 2 months ago

NC_063383.aln.cds.fasta.zip

Hi @YangJingqii, I've included the file above. It is just the reference sequence (https://www.ncbi.nlm.nih.gov/nuccore/NC_063383) run through squirrel with the --extract-cds flag. The counts may differ slightly now, as we may have masked slightly different sections of the genome since then, but they should be pretty similar to the earlier ones. Hope that helps!
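For anyone comparing their own counts, here is a minimal sketch of the kind of tally discussed above. It is not the notebook's get_dimer_potential_mutation; the function names `classify_dimer_mutations` and `total_counts` are hypothetical. It assumes each CDS is in frame on its coding strand, and that the potential mutations of interest are C to T in a TC dimer and G to A in a GA dimer (the usual APOBEC3 context). It ignores masking and overlapping genes, so its totals may not match the paper.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_dimer_mutations(cds):
    """Count potential TC->TT and GA->AA changes in one in-frame CDS.

    Hypothetical helper: an illustration only, not the notebook's code.
    """
    cds = cds.upper()
    counts = {"nonsynonymous": 0, "synonymous": 0, "nonsense": 0}
    for i in range(len(cds) - 1):
        dimer = cds[i:i + 2]
        if dimer == "TC":
            pos, new_base = i + 1, "T"   # the C of TC becomes T
        elif dimer == "GA":
            pos, new_base = i, "A"       # the G of GA becomes A
        else:
            continue
        start = pos - pos % 3            # first base of the affected codon
        codon = cds[start:start + 3]
        if codon not in CODON_TABLE:     # incomplete codon, gap, or N
            continue
        offset = pos - start
        mutant = codon[:offset] + new_base + codon[offset + 1:]
        old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
        if new_aa == old_aa:
            counts["synonymous"] += 1
        elif new_aa == "*":
            counts["nonsense"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


def total_counts(cds_sequences):
    """Sum the per-CDS counts over an iterable of CDS strings."""
    total = {"nonsynonymous": 0, "synonymous": 0, "nonsense": 0}
    for seq in cds_sequences:
        for key, value in classify_dimer_mutations(seq).items():
            total[key] += value
    return total
```

If a hand-extracted CDS set is out of frame, left on the reverse strand, or includes regions that were masked in the original analysis, a tally like this would differ from the published numbers, which might be worth checking first.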

aineniamh commented 2 months ago

Will close this now, but let me know if you need anything else

YangJingqii commented 2 months ago

That helps a lot! Thank you very much!