chrisquince / DESMAN

De novo Extraction of Strains from MetAgeNomes
Other

Issue with SelectContigsPos.pl and ExtractCountFrep.pl #16

Open gaohanlisa opened 7 years ago

gaohanlisa commented 7 years ago

Hi, I tried to use SelectContigsPos.pl to extract core COGs and ExtractCountFrep.pl to get the variant frequency table. After running ExtractCountFrep.pl, I got an empty variant frequency table. I then checked the position file (ClusterEC_core_cogs.tsv) generated by SelectContigsPos.pl. Its format is contig, start, end. I assumed the correct format should also include the COG number. Why is that information missing?

Thanks,

chrisquince commented 7 years ago

Hi,

Sorry for the very slow response. The format you have for ClusterEC_core_cogs.tsv is actually correct: it just contains the contig start and end positions, which is all that bam-readcount needs. I have added some discussion of that to the README. The problem is with the Perl script used to collate the base frequencies, which I have now replaced with a more robust Python script. Have a look at the revised README, but the command is:
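To illustrate why contig, start and end are enough: the sketch below turns rows of a .cogs file into three-column regions of the kind bam-readcount consumes. It assumes a comma-separated layout of COG, contig, start, end, gene, strand, inferred from the sample rows posted in this thread rather than confirmed from DESMAN. The function names and example rows are hypothetical, and this is not DESMAN's own code.

```python
import csv
import io


def cogs_to_regions(cogs_text):
    """Convert .cogs rows into (contig, start, end) region tuples.

    Assumed (not confirmed) column layout: COG, contig, start, end, gene, strand.
    Rows with a missing start or end are skipped, since no region can be
    built from them.
    """
    regions = []
    for row in csv.reader(io.StringIO(cogs_text)):
        if len(row) < 4:
            continue  # blank line or too few fields to hold coordinates
        contig, start, end = row[1], row[2], row[3]
        if not start or not end:
            continue  # coordinates absent, e.g. "COG0016,k141_525462.5,,,..."
        regions.append((contig, int(start), int(end)))
    return regions


def format_regions(regions):
    """Render regions as tab-separated lines: contig, start, end."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in regions)


if __name__ == "__main__":
    # Hypothetical rows; the coordinates are made up for illustration.
    example = (
        "COG0016,contig_1,100,1200,contig_1_1,+\n"
        "COG0048,contig_2,,,contig_2_3,\n"
    )
    print(format_regions(cogs_to_regions(example)), end="")
```

Only the first example row yields a region; the second, like the rows in the output sample posted below, has no coordinates and is dropped. A file written this way could be given to bam-readcount as its region list (the -l option in the versions I know of; check bam-readcount --help for your installation).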

python $DESMAN/scripts/ExtractCountFreqGenes.py AnnotateEC/ClusterEC_core.cogs Counts --output_file Cluster_esc3_scgs.freq
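To make "collate the base frequencies" concrete: bam-readcount writes one tab-separated line per position, with contig, position, reference base and depth, followed by one field per base of the form base:count:... . The sketch below extracts the A, C, G and T counts from one such line. It is not the logic of ExtractCountFreqGenes.py, whose internals are not shown here; the function name and the example line are hypothetical.

```python
def parse_readcount_line(line):
    """Return (contig, position, {base: count}) for one bam-readcount line.

    Assumes the usual bam-readcount layout: contig, position, reference base,
    depth, then per-base fields such as "A:12:60.00:..." (base, count, stats).
    Only the A, C, G and T counts are kept; "=" and "N" fields are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    contig, position = fields[0], int(fields[1])
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for field in fields[4:]:
        parts = field.split(":")
        if len(parts) >= 2 and parts[0] in counts:
            counts[parts[0]] = int(parts[1])
    return contig, position, counts


if __name__ == "__main__":
    # Hypothetical line, with each base field truncated to base:count.
    example = "contig_1\t150\tA\t20\t=:0\tA:15\tC:0\tG:5\tT:0\tN:0"
    contig, pos, counts = parse_readcount_line(example)
    print(contig, pos, [counts[b] for b in "ACGT"])  # contig_1 150 [15, 0, 5, 0]
```

A frequency table is then one such A, C, G, T row per position of interest; if no regions reach bam-readcount, there is nothing to collate.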

Does this fix your problem?

Thanks, Chris

KevinAMeyer commented 5 years ago

Hello,

I've also been having issues with SelectContigsPos.pl in my workflow, even when using the StrainMetaSim mock dataset. The Cluster_core.cogs file does not include any of the gene locations or strand information. As a result, I don't get any basecount files in the steps that follow.

Command:

while read -r cluster
do
    echo $cluster
    ../SelectContigsPos.pl /usr/share/maganalysis/cogs.txt < Split/${cluster}/${cluster}.cog > Split/${cluster}/${cluster}_core.cogs
done < Concoct/Cluster75.txt

Input .cog file sample:

k141_30204_2,COG1555
k141_34559.0_2,COG0210
k141_34559.0_3,COG1391
k141_34559.0_5,COG4304
k141_34559.0_8,COG5266
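Each input record carries only a gene ID and a COG, with no coordinates or strand. The sketch below splits such records and derives the contig name by dropping the trailing _<index> from the gene ID, which is the relationship visible between the gene and contig columns in the posted samples. The function name is hypothetical and the naming rule is an assumption about this assembly, not something SelectContigsPos.pl is confirmed to do.

```python
def parse_cog_records(text):
    """Split "gene,COG" records into (gene, contig, cog) tuples.

    Assumes (hypothetically) that gene IDs are the contig name plus a
    trailing "_<index>" suffix, e.g. "k141_34559.0_2" on contig
    "k141_34559.0". Records may be separated by spaces or newlines.
    """
    records = []
    for record in text.split():
        gene, cog = record.split(",", 1)
        contig = gene.rsplit("_", 1)[0]  # drop the final "_<index>" only
        records.append((gene, contig, cog))
    return records
```

For example, parse_cog_records("k141_34559.0_2,COG0210") gives [("k141_34559.0_2", "k141_34559.0", "COG0210")]. Nothing in these records could supply the start, end and strand columns on its own.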

Output file sample:

COG0016,k141_525462.5,,,k141_525462.5_5,
COG0048,k141_441551,,,k141_441551_3,
COG0049,k141_441551,,,k141_441551_4,
COG0051,k141_292107.0,,,k141_292107.0_2,
COG0052,k141_39482.10,,,k141_39482.10_14,
COG0060,k141_99476.3,,,k141_99476.3_4,
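A quick way to see how many rows are affected is to list every row whose start, end or strand field is empty. This is a hypothetical diagnostic, not part of DESMAN, and it again assumes the column layout COG, contig, start, end, gene, strand inferred from the sample.

```python
import csv
import io


def rows_missing_coordinates(cogs_text):
    """Return (COG, gene) pairs whose start, end or strand field is empty.

    Assumed (not confirmed) column layout: COG, contig, start, end, gene, strand.
    """
    missing = []
    for row in csv.reader(io.StringIO(cogs_text)):
        if not row:
            continue  # skip blank lines
        row = row + [""] * (6 - len(row))  # pad short rows to six fields
        cog, _contig, start, end, gene, strand = row[:6]
        if not (start and end and strand):
            missing.append((cog, gene))
    return missing
```

Run on the output sample, every row is reported, i.e. none of the core COGs has a position that could be turned into a region.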

My contigs.tsv file looks like this:

k141_4315 1 2705
k141_9378 1 2043
k141_20287 1 5049
k141_30204 1 5940
k141_34559.0 1 10000
k141_34559.1 1 10000
k141_34559.2 1 14020
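Since contigs.tsv appears to list each contig with start 1 and its full length as end, one sanity check is that every contig referenced by the .cog genes is present in it. The sketch below loads the whitespace-separated contig, start, end rows and reports any missing contigs; the function names are hypothetical and this is only a consistency check, not a fix for the empty coordinates.

```python
def load_contig_table(text):
    """Parse "contig start end" rows into {contig: (start, end)}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        table[fields[0]] = (int(fields[1]), int(fields[2]))
    return table


def contigs_not_in_table(contigs, table):
    """Return referenced contig names that have no row in the table, sorted."""
    return sorted(c for c in set(contigs) if c not in table)
```

With the rows above, k141_30204 and k141_34559.0 (both referenced in the input sample) are found, so an empty result would rule out a simple naming mismatch between the two files.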

Any suggestions on what may be happening, or how to include my gene locations in the core.cogs file?

Thanks for your help. Kevin