cerebis / sim3C

Read-pair simulation of 3C-based sequencing methodologies (HiC, Meta3C, DNase-HiC)
GNU General Public License v3.0

Bug / problem at end of chromosomes #26

Open ivargr opened 1 year ago

ivargr commented 1 year ago

Hi!

I've spent some time debugging why sim3C seems to give too many read pairs where one read is at the end of a chromosome and the other at the beginning. A typical visualization of the Hi-C matrix looks like this, even on a completely random simulated genome:

[image: Hi-C contact matrix from a simulated genome]

Note the high interaction at the top right and lower left of the chromosome pictured.

After digging through the code, I noticed this:

[image: screenshot of the sim3C code in question]

I think it's problematic that this is neither documented nor fixed. I would put a warning in the Readme so users are aware of it.

cerebis commented 5 months ago

If the chromosome is circular, the interaction you're observing will occur. If, on the other hand, the employed reference is fragmentary, treating the sequences as linear should eliminate the feature.

Here is an example of a real Hi-C contact map of E. coli [1]:

[image: Hi-C contact map of E. coli, from [1]]

[1] Thierry, A., & Cockram, C. (2022). Generating high-resolution Hi-C contact maps of bacteria. Methods in Molecular Biology (Clifton, N.J.), 2301, 183–195. https://doi.org/10.1007/978-1-0716-1390-0_9
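To illustrate why a circular replicon produces contacts between the two ends of its linearised reference, here is a minimal sketch. It is not sim3C's actual code; the function name `separation` and its parameters are assumptions made for illustration only. On a circle, two loci are joined by two paths and the shorter one determines how close they are, so positions near the start and end of the reference are effectively neighbours.

```python
def separation(a: int, b: int, length: int, circular: bool = True) -> int:
    """Genomic distance between two positions on a replicon of `length` bp.

    Hypothetical helper for illustration; not taken from sim3C.
    """
    if not (0 <= a < length and 0 <= b < length):
        raise ValueError("positions must lie within [0, length)")
    direct = abs(a - b)
    if circular:
        # On a circle there are two paths between the loci; take the shorter.
        return min(direct, length - direct)
    # On a linear sequence only the direct path exists.
    return direct


if __name__ == "__main__":
    length = 1_000_000
    first, last = 1_000, 999_000
    print(separation(first, last, length, circular=True))   # 2000
    print(separation(first, last, length, circular=False))  # 998000
```

If contact probability falls off with separation, the circular case treats these two loci as 2 kb apart, which would show up as signal in the top-right and lower-left corners of the contact map. Treated as linear, they are almost 1 Mb apart and the corner signal disappears.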

ivargr commented 5 months ago

Thanks for the reply!

I'm not sure I fully understand. If I do not have circular genomes, will the simulation then be wrong at the chromosome boundaries?

I agree that the simulation now makes sense for circular genomes, but my genomes are not.

cerebis commented 5 months ago

Yes, you'd be better off adding "--linear" if your chosen genomes are not by nature circular or if they are incomplete drafts.

As I write this, I think it would be better if this feature were not an all-or-nothing switch but rather a per-sequence annotation in the definition of the community.

Regards, Matt D.
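As a rough sketch of what such a per-sequence annotation could look like, the snippet below attaches a topology flag to each sequence instead of relying on one global switch. Everything here is hypothetical: the `Replicon` class, its `circular` field, and the example names and lengths are assumptions for illustration, and none of it reflects sim3C's actual community definition format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replicon:
    """Hypothetical record for one sequence in a community definition."""
    name: str
    length: int
    circular: bool  # assumed per-sequence topology annotation


def separation(rep: Replicon, a: int, b: int) -> int:
    """Distance between two positions, respecting the replicon's topology."""
    direct = abs(a - b)
    # Wrap around only when this particular sequence is marked circular.
    return min(direct, rep.length - direct) if rep.circular else direct


# A mixed community: one complete circular chromosome and one draft contig.
community = [
    Replicon("complete_chromosome", 4_600_000, circular=True),
    Replicon("draft_contig", 50_000, circular=False),
]

if __name__ == "__main__":
    for rep in community:
        # Distance between a locus near the start and one near the end.
        print(rep.name, separation(rep, 100, rep.length - 100))
```

With an annotation like this, complete circular genomes and fragmentary drafts could coexist in one simulation, each handled according to its own topology.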
