drewschield / Comparative-Genomics-Tools

Various Python and Shell scripts for comparative genomics

window_heterozygosity.py example files? #1

Closed DrPintoThe2nd closed 4 months ago

DrPintoThe2nd commented 5 years ago

Hi Drew,

Thanks for making these scripts available! The slidingwindow_gc_content.py has worked flawlessly for me!

I was wondering if you could provide example files for the window_heterozygosity.py script. Daren's script outputs a tab-delimited file that seems analogous to a VCF file. Can you simply convert a called VCF file to a tab-delimited file (e.g. with https://vcf-kit.readthedocs.io/en/latest/vcf2tsv/) and jump into this script? If this is the "" file, then what is in the "" file, and what is the format of that file?

Apologies in advance if I've missed something in either your or Daren's documentation! Thanks again! Best, bjp


drewschield commented 5 years ago

Hey Brendan,

No problem! I’m glad they are useful to you.

My windowed heterozygosity script relies on a ‘window file’ that I generated using the bedtools ‘makewindows’ command, but you can generate one however makes sense for you. The window file just needs to follow the BED layout: chromosome, start position, end position, separated by tabs. You are correct about Daren’s calcHet output: it’s a fairly standard tab-delimited file that has the heterozygosity for each SNP position in the last column.
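If bedtools is not at hand, a window file in that three-column layout can be produced with a few lines of Python. This is only a minimal sketch: the function names `make_windows` and `write_bed` and the chromosome name and length in the example are hypothetical, and it assumes the usual BED convention of 0-based, end-exclusive coordinates with a shorter final window at the end of each chromosome.

```python
def make_windows(chrom_sizes, window_size):
    """Yield (chrom, start, end) tuples covering each chromosome.

    Coordinates follow the BED convention (0-based start, end-exclusive).
    The last window of a chromosome is truncated at the chromosome length.
    """
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, window_size):
            yield chrom, start, min(start + window_size, length)


def write_bed(windows, path):
    """Write windows as tab-separated chromosome, start, end lines."""
    with open(path, "w") as out:
        for chrom, start, end in windows:
            out.write(f"{chrom}\t{start}\t{end}\n")


if __name__ == "__main__":
    # Hypothetical chromosome name and length, for illustration only.
    sizes = {"chr1": 250_000}
    for window in make_windows(sizes, 100_000):
        print(*window, sep="\t")
```

With bedtools itself, the same kind of file comes from `bedtools makewindows -g genome.txt -w 100000`, where `genome.txt` is a placeholder for a two-column file of chromosome names and lengths.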

I’ve attached a zipped folder with example files, including actual calcHet output and the 100kb genomic window BED file I used for my analyses. To save space, I included data for only the first 11 Mb of chromosome 1 of my species. I generated the ‘test.out.txt’ output file using the command:

python window_heterozygosity.py window_100kb.bed calcHet_out.het.txt test.out.txt

Let me know if this is what you were looking for, and if you have any additional questions. Word of warning - both the calcHet and window_heterozygosity.py scripts can take a long time on a whole genome, and will take longer if the genomic windows are at higher resolution. So, these can be good analyses to kick off during lunch or overnight.
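For anyone wondering what the windowing step amounts to, here is a rough sketch of averaging per-site heterozygosity within BED windows. It is not the code in window_heterozygosity.py, and it may define the per-window value differently. The function name `windowed_mean_het` is made up for illustration, and the sketch assumes that the first two columns of the per-site file are chromosome and 1-based position (only the last column being heterozygosity is stated above). Sorting the sites once and using binary search keeps the lookup per window cheap, which matters when the windows are small.

```python
from bisect import bisect_right
from collections import defaultdict


def windowed_mean_het(window_lines, het_lines):
    """Return a list of (chrom, start, end, mean_het or None).

    Assumptions (not taken from the original scripts):
      * per-site lines are tab-delimited: chrom, 1-based position, ..., het
      * window lines are BED: chrom, 0-based start, exclusive end
    """
    sites = defaultdict(list)
    for line in het_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        try:
            pos, het = int(fields[1]), float(fields[-1])
        except ValueError:
            continue  # skip header or malformed lines
        sites[fields[0]].append((pos, het))

    # Sort each chromosome once so windows can be located by binary search.
    positions, values = {}, {}
    for chrom, entries in sites.items():
        entries.sort()
        positions[chrom] = [p for p, _ in entries]
        values[chrom] = [h for _, h in entries]

    results = []
    for line in window_lines:
        if not line.strip():
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        start, end = int(start), int(end)
        pos_list = positions.get(chrom, [])
        # A 1-based site p lies in the BED window when start < p <= end.
        lo = bisect_right(pos_list, start)
        hi = bisect_right(pos_list, end)
        in_window = values.get(chrom, [])[lo:hi]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((chrom, start, end, mean))
    return results
```

Windows with no sites come back as `None` here; whether to report them as missing or as zero depends on how invariant sites are handled upstream.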

Best, Drew

