chrisquince / DESMAN

De novo Extraction of Strains from MetAgeNomes

ExtractCountFreqP.pl broken? #6

Closed alneberg closed 7 years ago

alneberg commented 7 years ago

I can't get this script to work: it outputs the header line with the sample file names, but no content.

Command:

DESMAN/scripts/ExtractCountFreqP.pl Genes/ClusterEC_core_cogs.tsv Counts/ClusterEC 0 > Variants/ClusterEC_core_cogs.freq

Input:

$ ls -l Counts/ClusterEC/*cnt
-rw-rw-r-- 1 alneberg b2010008 90675603 Jul 29 16:43 Counts/ClusterEC/Sample548_su.cnt
-rw-rw-r-- 1 alneberg b2010008 93663245 Jul 29 16:44 Counts/ClusterEC/Sample564_su.cnt
-rw-rw-r-- 1 alneberg b2010008 91231517 Jul 29 16:44 Counts/ClusterEC/Sample609_su.cnt
-rw-rw-r-- 1 alneberg b2010008 94030001 Jul 29 16:44 Counts/ClusterEC/Sample616_su.cnt
-rw-rw-r-- 1 alneberg b2010008 90613434 Jul 29 16:44 Counts/ClusterEC/Sample620_su.cnt
-rw-rw-r-- 1 alneberg b2010008 94111275 Jul 29 16:44 Counts/ClusterEC/Sample624_su.cnt
-rw-rw-r-- 1 alneberg b2010008 89442070 Jul 29 16:43 Counts/ClusterEC/Sample631_su.cnt
-rw-rw-r-- 1 alneberg b2010008 89979891 Jul 29 16:43 Counts/ClusterEC/Sample687_su.cnt
-rw-rw-r-- 1 alneberg b2010008 89922459 Jul 29 16:43 Counts/ClusterEC/Sample710_su.cnt
-rw-rw-r-- 1 alneberg b2010008 91907071 Jul 29 16:44 Counts/ClusterEC/Sample712_su.cnt
-rw-rw-r-- 1 alneberg b2010008 93762170 Jul 29 16:44 Counts/ClusterEC/Sample717_su.cnt
-rw-rw-r-- 1 alneberg b2010008 94456643 Jul 29 16:44 Counts/ClusterEC/Sample733_su.cnt
-rw-rw-r-- 1 alneberg b2010008 89850341 Jul 29 16:43 Counts/ClusterEC/Sample746_su.cnt
-rw-rw-r-- 1 alneberg b2010008 90649802 Jul 29 16:44 Counts/ClusterEC/Sample759_su.cnt
-rw-rw-r-- 1 alneberg b2010008 87901845 Jul 29 16:43 Counts/ClusterEC/Sample767_su.cnt
-rw-rw-r-- 1 alneberg b2010008 90870567 Jul 29 16:44 Counts/ClusterEC/Sample803_su.cnt

Output:

$ cat Variants/ClusterEC_core_cogs.freq
Cog,Position,Sample548_su-A,Sample548_su-C,Sample548_su-G,Sample548_su-T,Sample564_su-A,Sample564_su-C,Sample564_su-G,Sample564_su-T,Sample609_su-A,Sample609_su-C,Sample609_su-G,Sample609_su-T,Sample616_su-A,Sample616_su-C,Sample616_su-G,Sample616_su-T,Sample620_su-A,Sample620_su-C,Sample620_su-G,Sample620_su-T,Sample624_su-A,Sample624_su-C,Sample624_su-G,Sample624_su-T,Sample631_su-A,Sample631_su-C,Sample631_su-G,Sample631_su-T,Sample687_su-A,Sample687_su-C,Sample687_su-G,Sample687_su-T,Sample710_su-A,Sample710_su-C,Sample710_su-G,Sample710_su-T,Sample712_su-A,Sample712_su-C,Sample712_su-G,Sample712_su-T,Sample717_su-A,Sample717_su-C,Sample717_su-G,Sample717_su-T,Sample733_su-A,Sample733_su-C,Sample733_su-G,Sample733_su-T,Sample746_su-A,Sample746_su-C,Sample746_su-G,Sample746_su-T,Sample759_su-A,Sample759_su-C,Sample759_su-G,Sample759_su-T,Sample767_su-A,Sample767_su-C,Sample767_su-G,Sample767_su-T,Sample803_su-A,Sample803_su-C,Sample803_su-G,Sample803_su-T

chrisquince commented 7 years ago

I have added a more robust Python script to the repo for doing this, ExtractCountFreqCogs.py, although some of the upstream steps may need changing. It takes gzipped cnt files (the second argument is the directory containing them), and the cog file format is slightly different. Let me look at the Snakemake workflow and try to adapt it...
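Because the new script reads gzipped cnt files, existing uncompressed counts such as those in Counts/ClusterEC would probably need compressing first. A minimal sketch of that step follows. The function name `gzip_cnt_files` is hypothetical, and the `.cnt.gz` naming is an assumption, not something confirmed by ExtractCountFreqCogs.py.

```python
import gzip
import shutil
from pathlib import Path


def gzip_cnt_files(count_dir):
    """Write a gzipped copy next to every *.cnt file in count_dir.

    Returns the list of paths written. The originals are left in place.
    Assumption: the downstream script accepts files named <sample>.cnt.gz.
    """
    written = []
    for cnt_path in sorted(Path(count_dir).glob("*.cnt")):
        gz_path = cnt_path.with_name(cnt_path.name + ".gz")
        # Stream the file so large count tables are not loaded into memory.
        with open(cnt_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(gz_path)
    return written
```

Calling `gzip_cnt_files("Counts/ClusterEC")` would leave the original cnt files untouched, so the old Perl script can still be run against the same directory for comparison.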

What is the format of ClusterEC_core_cogs.tsv?

This script wants a comma-separated file with the fields:

(contig, start, end, cog, cluster)
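The five-field layout above can be checked before running the script. The sketch below is only illustrative: `check_cog_rows` is a hypothetical helper, and it assumes that start and end are integers and that the file has no header row, neither of which is stated in this thread.

```python
import csv
import io

# Field order as described in this thread.
EXPECTED_FIELDS = ("contig", "start", "end", "cog", "cluster")


def check_cog_rows(handle):
    """Return (line_number, message) pairs for rows that do not fit the layout."""
    problems = []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(EXPECTED_FIELDS):
            problems.append(
                (line_no, f"expected {len(EXPECTED_FIELDS)} fields, got {len(row)}")
            )
            continue
        _contig, start, end, _cog, _cluster = row
        try:
            # Assumption: start and end are integer coordinates.
            int(start)
            int(end)
        except ValueError:
            problems.append((line_no, "start/end are not integers"))
    return problems


if __name__ == "__main__":
    sample = io.StringIO("contig1,10,200,COG0001,ClusterEC\n")
    print(check_cog_rows(sample))
```

A tab-separated file, as the .tsv extension of ClusterEC_core_cogs.tsv suggests, would be reported as having one field per row, which is a quick way to spot the format mismatch.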

alneberg commented 7 years ago

OK, great! I think I can figure out how to adapt the Snakemake workflow to this new script myself as well. I'll close this and let you know if I'm not able to put it together.