christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License
68 stars 9 forks source link

GC_skew for calculating bPTR and a bPTR value less than 1 or is NA #36

Closed Anqi-Dai closed 2 years ago

Anqi-Dai commented 3 years ago

Hello Christopher,

I got some results by running the following steps:

  1. Preprocessing the metagenomic samples, trimming, decontamination, etc.
  2. Use the preprocessed paired end reads to align to a species genome that I'm most interested in (used the reorder flag in the alignment step). I downloaded the genome from NCBI that is suggested to be most similar to the assembled scaffolds in my sample from blasting. (following what you described in the paper)
  3. I then used this species genome and the sam file to do the bPTR calculation with both coverage and gc_skew as method.

Below is the results. "spp" is the target genome I'm interested in and in all S1, S2, S3 samples this spp have been dominating with above 99% relative abundance.

sampleid_ | target_genome | grp | ORI | TER | PTR S1 | spp | coverage | 2,855,607 | 1,369,712 | 1.594636019236461 S2 | spp | coverage | 0 | 0 | n/a S3 | spp | coverage | 0 | 0 | n/a S1 | spp | GC_skew | 3,217,860 | 1,437,980 | 0.9427330774009574 S2 | spp | GC_skew | 3,217,860 | 1,437,980 | 0.6521653399276987 S3 | spp | GC_skew | 3,217,860 | 1,437,980 | 0.8475248538141567

My questions:

  1. How is the GC_skew methods is used in the bPTR calculation? How do I interprete a less than 1 PTR value?
  2. For S1, how to understand the vastly different PTR calculated from both methods?
  3. I think the number in the ORI and TER column represent the read coverage when aligning to this genome? Then why is it 0 for both S2 and S3 in the coverage method results?

Thank you very much for your patience.

Feel free to let me know any questions and suggestions

Angel

Anqi-Dai commented 2 years ago

Hi, after switching pipeline, this issue was resolved.