christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License

Difficulties when using iRep with long read data #41

Open mshamash opened 1 year ago

mshamash commented 1 year ago

Hello,

We would like to try using iRep with some long-read data we have. Our reads were assembled and binned, yielding several bins that are each represented by a single large circular contig. All of the reads were mapped to all of the bins (as a concatenated multi-FASTA file) using minimap2, producing a SAM file, which was then sorted before running iRep.

I then ran iRep as follows for each bin (bin1 as an example here): iRep -f bins/bin1.fa -s align-bins.sorted.sam -o d3-irep-bin1

Unfortunately, it looks like iRep did not work: the average coverage it reports is much lower than expected. bin1 (represented by a single large contig) actually has a mean depth of ~130, not the 0.11 indicated in the iRep output PDF (see attached files).
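One way to double-check the expected depth independently of iRep is to compute it directly from the SAM file. The following is a minimal sketch, not part of iRep: the function name `mean_depth` is hypothetical, it counts only reference-matching bases (CIGAR operations M, = and X), and it skips unmapped and secondary alignments, so the result may differ slightly from what other tools report.

```python
import re
from collections import defaultdict

# Matches one CIGAR operation, e.g. "50M" -> ("50", "M")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
HEADER_TAGS = {"@HD", "@SQ", "@RG", "@PG", "@CO"}


def mean_depth(sam_lines):
    """Return {contig name: mean depth} from an iterable of SAM lines.

    Depth is approximated as (aligned bases) / (contig length), where
    aligned bases are the M, = and X CIGAR operations. Unmapped and
    secondary alignments are ignored (an assumption of this sketch).
    """
    lengths = {}
    aligned = defaultdict(int)
    for line in sam_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] in HEADER_TAGS:
            if cols[0] == "@SQ":
                # Header fields look like "SN:name" and "LN:length"
                fields = dict(f.split(":", 1) for f in cols[1:])
                lengths[fields["SN"]] = int(fields["LN"])
            continue
        if len(cols) < 11:
            continue  # blank or malformed line
        flag, ref, cigar = int(cols[1]), cols[2], cols[5]
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue  # unmapped, secondary, or no alignment recorded
        aligned[ref] += sum(
            int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=X"
        )
    return {ref: aligned[ref] / length for ref, length in lengths.items()}
```

It can be called on an open file handle, for example `mean_depth(open("align-bins.sorted.sam"))`, and the value for the bin1 contig compared with the coverage shown in the iRep PDF.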

I also tried running iRep with bin*.fa to analyze all bins at once, with the same result.

Hoping to get some input on this, as we're excited to use iRep on our long read datasets. Thanks.

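Attachments:

d3-irep-bin1.pdf (https://github.com/christophertbrown/iRep/files/10561755/d3-irep-bin1.pdf)

Screenshot 2023-02-01 at 4 34 27 PM (https://user-images.githubusercontent.com/8537248/216168497-e738f8a5-3351-47b3-b571-0b7359cc02c6.png)
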
christophertbrown commented 1 year ago

Hi Michael,

Thanks for your interest in iRep! Unfortunately, I have not done any testing with long reads and am not sure that the method would perform correctly. My guess is that the filtering parameters are not behaving correctly with the long reads, resulting in the lower coverage.

Best,

Chris
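To illustrate the guess above about filtering, here is a minimal sketch of how a fixed per-read mismatch cap could discard most long reads. It assumes the relevant filter is a limit on mismatches per read, which has not been verified against iRep's code; the function name `fraction_passing` and the threshold argument are hypothetical. It uses the `NM` tag (edit distance to the reference) that minimap2 writes in SAM output as a stand-in for the mismatch count.

```python
HEADER_TAGS = {"@HD", "@SQ", "@RG", "@PG", "@CO"}


def fraction_passing(sam_lines, max_mismatches):
    """Fraction of mapped reads whose NM tag is <= max_mismatches.

    Reads without an NM tag are skipped. This only mimics a
    hypothetical per-read mismatch cap; it is not iRep's own filter.
    """
    total = 0
    passed = 0
    for line in sam_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] in HEADER_TAGS or len(cols) < 11:
            continue  # header, blank, or malformed line
        if int(cols[1]) & 0x4:
            continue  # unmapped read
        # Optional fields start at column 12 and look like "NM:i:3"
        nm = next(
            (int(tag[5:]) for tag in cols[11:] if tag.startswith("NM:i:")),
            None,
        )
        if nm is None:
            continue
        total += 1
        if nm <= max_mismatches:
            passed += 1
    return passed / total if total else 0.0
```

A read that is thousands of bases long accumulates far more absolute mismatches than a short read at the same error rate, so a cap tuned for short reads would pass only a small fraction of long reads. If this fraction is very small for the long-read SAM file at a low threshold, that would be consistent with the low coverage reported.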
