TaoYang-dev / hicrep

R package to evaluate the reproducibility of Hi-C data

duplicate 'row.names' are not allowed #56

Open haochenl opened 7 years ago

haochenl commented 7 years ago

Hi,

I prepared whole-genome Hi-C matrices to run hicrep, but got the error in the title. It seems that hicrep uses the bin position as row names when processing the data, and so encounters duplicate row names when the matrix spans multiple chromosomes. Does hicrep support genome-wide Hi-C matrices? If the input matrix needs to follow some other format, please let me know.

Thanks, Haochen
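
To illustrate the suspected cause, here is a minimal Python sketch (not hicrep's code). It assumes each bin is described by a chromosome, a start, and an end, and that the start position alone is used as the row label; both are assumptions based on the description above. The helper names `duplicated_bin_labels` and `unique_bin_labels` are hypothetical.

```python
from collections import Counter


def duplicated_bin_labels(bins):
    """Return the bin-start labels that occur more than once.

    Each bin is a (chrom, start, end) tuple. Using `start` alone as the
    label is an assumption about what triggers the duplicate-name error.
    """
    counts = Counter(start for _chrom, start, _end in bins)
    return sorted(label for label, n in counts.items() if n > 1)


def unique_bin_labels(bins):
    """Build labels that stay unique genome-wide by prefixing the chromosome."""
    return [f"{chrom}:{start}" for chrom, start, _end in bins]


# Two chromosomes binned at 40 kb: start positions repeat on each chromosome.
bins = [
    ("chr1", 0, 40000),
    ("chr1", 40000, 80000),
    ("chr2", 0, 40000),
    ("chr2", 40000, 80000),
]

print(duplicated_bin_labels(bins))  # [0, 40000]
print(unique_bin_labels(bins))
```

With a single chromosome the start positions never repeat, which would explain why the error only shows up for genome-wide input.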

TaoYang-dev commented 7 years ago

Hi Haochen,

Thank you for your email. hicrep was originally designed to evaluate intra-chromosome reproducibility, one chromosome at a time. But you raise a good point. We are currently working on speeding up the pipeline, as well as supporting whole-genome data and multiple file types. Please check my GitHub page again in a week or so. We will release a new version shortly. Thank you.

Tao

-- Tao Yang, PhD Candidate, Bioinformatics and Genomics, Penn State University
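
Until genome-wide input is supported, one possible workaround is to split the genome-wide matrix into per-chromosome blocks and run each one separately. The sketch below is plain Python and not part of hicrep. It assumes an N × (N + 3) layout in which the three leading columns are chromosome, start, and end, followed by N contact counts; the function name `split_by_chromosome` is hypothetical.

```python
def split_by_chromosome(rows):
    """Split a genome-wide N x (N + 3) matrix into intra-chromosomal blocks.

    `rows` is a list of N lists: [chrom, start, end, count_1, ..., count_N].
    Returns a dict mapping each chromosome to its own n x (n + 3) matrix.
    Inter-chromosomal counts are dropped.
    """
    n = len(rows)
    for row in rows:
        if len(row) != n + 3:
            raise ValueError("expected N rows of N + 3 columns")

    # Group row indices by chromosome, preserving the original bin order.
    index_by_chrom = {}
    for i, row in enumerate(rows):
        index_by_chrom.setdefault(row[0], []).append(i)

    # For each chromosome keep only its own rows and matching count columns.
    blocks = {}
    for chrom, idx in index_by_chrom.items():
        blocks[chrom] = [
            rows[i][:3] + [rows[i][3 + j] for j in idx] for i in idx
        ]
    return blocks


# Toy genome-wide matrix: two bins on chr1 and two bins on chr2.
genome = [
    ["chr1", 0, 40000, 10, 4, 1, 0],
    ["chr1", 40000, 80000, 4, 12, 0, 2],
    ["chr2", 0, 40000, 1, 0, 9, 3],
    ["chr2", 40000, 80000, 0, 2, 3, 11],
]

for chrom, block in split_by_chromosome(genome).items():
    print(chrom, block)
```

Within each block the bin positions are unique, so the duplicate row-name problem should not arise when the blocks are processed one at a time.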

TaoYang-dev commented 7 years ago

Hi Haochen,

What is the format of genome-wide Hi-C data?

Tao

haochenl commented 7 years ago

Same as in the online tutorial: the N × (N + 3) matrix. I wrote a script to generate a text file in that format and loaded it into R as a data.frame.

haochenl commented 7 years ago

We internally store genome-wide matrices in HDF5 format (similar to what cooler does), but I don't think hicrep supports that.

Haochen

TaoYang-dev commented 7 years ago

The new version is compatible with the cooler format. It will be released soon.
