christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License

iRep_filter.py overwrites coverage table with relative abundance table when run twice #24

Open nr0cinu opened 5 years ago

nr0cinu commented 5 years ago

Hi!

I noticed that when iRep_filter.py is run twice (e.g., first collating genomes, then samples), the coverage information is lost.

Here is a minimal example:

Input: a.txt, b.txt

When I run iRep_filter.py -t a.txt b.txt > ab.txt, the result is ab.txt.

Notice how these two sections are suddenly identical:

## coverage
# genome    sample_a.sam    sample_b.sam
genome1.fna 80.57256734444161   81.80919187285467
genome2.fna 19.427432655558402  18.031204027824025
genome3.fna 0.0 0.15960409932130895
## relative abundance
# genome    sample_a.sam    sample_b.sam
genome1.fna 80.5725673444416    81.80919187285467
genome2.fna 19.4274326555584    18.031204027824025
genome3.fna 0.0 0.15960409932130895

It’s not shown in this example, but I have observed that once the coverage values are replaced with the wrong "relative abundance" values, they seem to be used for coverage filtering, which can result in losing valid iRep values.
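
One way to check whether a merged table is affected is to compare the two sections directly. The following is only a minimal sketch: it assumes that sections start with a "## " header line, that rows are whitespace-separated with the genome name first, and that the coverage and relative abundance sections contain only numeric values. The function names are hypothetical and are not part of iRep. Values are compared with a small tolerance because, as in the output above, the two sections can differ in the last printed digit.

```python
import math


def parse_sections(lines):
    """Group data rows under their '## ' section headers (assumed layout)."""
    sections = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = {}
        elif current is not None and line and not line.startswith("#"):
            # Data row: genome name followed by one value per sample.
            fields = line.split()
            sections[current][fields[0]] = fields[1:]
    return sections


def coverage_matches_relative_abundance(lines, rel_tol=1e-9):
    """Return True if both sections exist and hold (nearly) identical numbers."""
    sections = parse_sections(lines)
    cov = sections.get("coverage")
    rel = sections.get("relative abundance")
    if not cov or not rel or cov.keys() != rel.keys():
        return False
    for genome in cov:
        # Assumption: every value in these two sections parses as a float.
        a = [float(v) for v in cov[genome]]
        b = [float(v) for v in rel[genome]]
        if len(a) != len(b):
            return False
        if not all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-12)
                   for x, y in zip(a, b)):
            return False
    return True
```

Calling coverage_matches_relative_abundance(open("ab.txt")) on a merged file would return True for output like the one shown above. A match is only a hint that the coverage table may have been overwritten, not proof.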

Thanks! Bela

nr0cinu commented 5 years ago

As a workaround, removing the relative abundance section before using iRep_filter.py seems to fix it.

This is the command I use to process the files before merging them:

awk '/## index of replication/{ flag=1 } /## relative abundance/{ flag=0 } /## % windows passing filter/{ flag=1 } flag' a.tsv > a_tmp.tsv
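
For anyone who prefers to do this step in Python, the same flag logic can be sketched as follows. This is an illustration that mirrors the awk command above, not code from iRep; the function name is hypothetical. As in the awk version, nothing is kept before the "## index of replication" header, and the three header strings are matched anywhere in a line.

```python
def strip_relative_abundance(lines):
    """Keep lines while the flag is set, mirroring the awk one-liner."""
    keep = False
    kept = []
    for line in lines:
        # Same order of checks as the awk patterns.
        if "## index of replication" in line:
            keep = True
        if "## relative abundance" in line:
            keep = False
        if "## % windows passing filter" in line:
            keep = True
        if keep:
            kept.append(line)
    return kept
```

For example, reading a.tsv with open("a.tsv") and writing the returned lines to a_tmp.tsv with writelines should give the same file as the awk command.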