aidenlab / Juicebox

Visualization and analysis software for Hi-C data -
https://aidenlab.org/juicebox
MIT License

Dump extra lines bug #442

Open nchernia opened 8 years ago

nchernia commented 8 years ago

Extra lines dumped when dumping norm vector

When I run juicebox dump norm KR .. with a 50kb bin size, the correction vector it returns always contains more lines than it should. For example, chr1 has length 249250621, and 249250621/50000 = 4985.01242, so the vector file should have 4986 lines, but it has 4990. The same problem occurs on the other chromosomes. I am not sure which binning method Juicebox uses. Does anyone know the details, and why Juicebox returns several extra lines?

This is actually a bug in MatrixZoomData where HiCFixedGridAxis is called with correctedBinCount * blockColumnCount, which is not actually the binCount. I have no idea why it's constructed this way and not just via ceiling(chromosome.length/bp resolution). May want to ask Jim before making any major changes.
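To make the arithmetic concrete, here is a minimal Python sketch (not Juicebox code) of the ceiling-based bin count described above. The second function is purely hypothetical: it shows one way that sizing an axis as a per-block count times a block column count can overshoot the true bin count. The function names and the block column count of 10 are assumptions made for illustration, not values taken from MatrixZoomData, and the result is not a confirmed explanation of the bug.

```python
def expected_bin_count(chrom_length: int, resolution: int) -> int:
    """Bins needed to cover a chromosome: ceil(chrom_length / resolution)."""
    return -(-chrom_length // resolution)  # integer ceiling division


def padded_bin_count(bin_count: int, block_column_count: int) -> int:
    """Hypothetical padding: round the bin count up to a whole number of blocks."""
    bins_per_block = -(-bin_count // block_column_count)
    return bins_per_block * block_column_count


chr1_length = 249250621  # chr1 length quoted in the report
resolution = 50000       # 50kb bins

bins = expected_bin_count(chr1_length, resolution)
print(bins)  # 4986

# block_column_count=10 is a made-up value, used only to show the overshoot.
print(padded_bin_count(bins, 10))  # 4990
```

Any axis sized as a product of per-block and per-column counts will be at least as large as the ceiling-based count, and strictly larger whenever the bin count is not an exact multiple of the block column count.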

For now we can change dump but we MUST fix the underlying bug before closing.

nchernia commented 7 years ago

@jrobinso If you could take a look at the underlying bug (the fact that there are extra rows/columns stored in each matrix), that would be really helpful.

jrobinso commented 7 years ago

Yes, if someone could ping this issue again in February I will look at it.


sa501428 commented 5 years ago

@jrobinso

nchernia commented 5 years ago

https://groups.google.com/forum/#!topic/3d-genomics/C5GViBKWPjE


nchernia commented 5 years ago

Here's another bug report, same underlying issue.

I believe there is a bug in dump when extracting dense matrices. The matrices I extract are always several rows fewer than expected given the chromosome size and bin size.

The number of missing rows seems to vary between chromosomes and datasets. I have attached an example of the number of rows in two of my datasets, extracted at bin sizes of 10kb and 50kb, compared to the expected number. Some seem to have as many as 20 rows missing.

Juicer_dump_missing_rows.txt

Thank you,

Helen
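For anyone wanting to check their own dumps against the expected size, a small Python sketch along these lines may help. It assumes the dumped file has one row (or one norm vector entry) per non-empty line, which matches the reports in this thread but is not verified against every dump mode. The function names and the file path argument are hypothetical.

```python
def expected_bin_count(chrom_length: int, resolution: int) -> int:
    """Bins needed to cover a chromosome: ceil(chrom_length / resolution)."""
    return -(-chrom_length // resolution)  # integer ceiling division


def row_discrepancy(dump_path: str, chrom_length: int, resolution: int) -> int:
    """Observed minus expected row count.

    Positive means extra rows, negative means missing rows.
    Assumes one row per non-empty line in the dumped file.
    """
    with open(dump_path) as handle:
        observed = sum(1 for line in handle if line.strip())
    return observed - expected_bin_count(chrom_length, resolution)
```

For example, `row_discrepancy("chr1_KR_50kb.txt", 249250621, 50000)` would return 4 for the 4990-line vector described at the top of this issue (the file name is a placeholder).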