gao-lab / GLUE

Graph-linked unified embedding for single-cell multi-omics data integration
MIT License

Invalid BED format error in the tutorial #66

Closed vietdhoang closed 1 year ago

vietdhoang commented 1 year ago

Hi,

I'm really interested in trying out GLUE, and I'm currently running the tutorial with the Chen 2019 data to see if everything is running smoothly. When running the tutorial, I'm getting a ValueError: Invalid BED format! when trying to generate the guidance graph: guidance = scglue.genomics.rna_anchored_guidance_graph(rna, atac).

When I look at my RNA and ATAC data, though, the BED columns look valid to me:

ATAC-seq

peaks                 chrom  chromStart  chromEnd
chr1:3005833-3005982  chr1      3005833   3005982
chr1:3094772-3095489  chr1      3094772   3095489
chr1:3119556-3120739  chr1      3119556   3120739
chr1:3121334-3121696  chr1      3121334   3121696
chr1:3134637-3135032  chr1      3134637   3135032
...                   ...           ...       ...
chrY:1086239-1086779  chrY      1086239   1086779
chrY:1090474-1090713  chrY      1090474   1090713
chrY:1232696-1232955  chrY      1232696   1232955
chrY:1245435-1245988  chrY      1245435   1245988
chrY:1246136-1246326  chrY      1246136   1246326

RNA-seq

genes          chrom  chromStart   chromEnd
0610005C13Rik  chr7     45567793   45575327
0610009B22Rik  chr11    51685385   51688874
0610009E02Rik  chr2     26445695   26459390
0610009L18Rik  chr11   120348677  120351190
0610010F05Rik  chr11    23564960   23633639
...            ...           ...        ...
Vmn1r181       chr7     23974614   23988139
Vmn1r193       chr13    22213671   22223160
Vmn1r68        chr7     10507397   10558485
Vmn1r82        chr7     12300429   12308582
n-R5s50        chr14   110161834  110161945

Would it be possible for someone to point me in the right direction? Am I missing something?

Thanks!

Jeff1995 commented 1 year ago

Hi Viet, thanks for the report! I don't see any problem based on the above information either. Could you provide a complete traceback and a peek at all columns of rna.var and atac.var (e.g., via rna.var.dtypes)?
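For anyone following along, the kind of column check requested above could be sketched as follows. This uses only pandas and a toy stand-in for rna.var built from two rows of the report. The helper name summarize_bed_columns is hypothetical and not part of scglue, and the sketch makes no claim about what actually triggers the error.

```python
import pandas as pd


def summarize_bed_columns(var: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of scglue): report the dtype and
    missing-value count of the BED-like columns present in `var`."""
    cols = [c for c in ("chrom", "chromStart", "chromEnd") if c in var.columns]
    return pd.DataFrame(
        {
            "dtype": var[cols].dtypes.astype(str),
            "n_missing": var[cols].isna().sum(),
        }
    )


# Toy stand-in for rna.var, copied from two rows shown in the report.
rna_var = pd.DataFrame(
    {
        "chrom": ["chr7", "chr11"],
        "chromStart": [45567793, 51685385],
        "chromEnd": [45575327, 51688874],
    },
    index=pd.Index(["0610005C13Rik", "0610009B22Rik"], name="genes"),
)

# Equivalent of the `rna.var.dtypes` peek asked for above.
print(rna_var.dtypes)

# Per-column summary of dtype and missing values.
report = summarize_bed_columns(rna_var)
print(report)
```

On real data, the same helper would be called on rna.var and atac.var directly, and the printed output pasted into the issue alongside the full traceback.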

Jeff1995 commented 1 year ago

Closing this in favor of #69.