caravagnalab / rcongas

rcongas
GNU General Public License v3.0

Each row in `x` should match at most 1 row in `y`. #25

Closed: mmfalco closed this issue 1 year ago

mmfalco commented 1 year ago

After running CONGAS on one of my samples, I was not able to obtain the clusters, I think because of the following warning:

```
> input_rcongas <- init(data = count_matrix , cnv_data = cnv_table, description = s,
+                       reference_genome = "hg19", online = FALSE, correct_bins = T)
ℹ Extracting gene data from reference.
✔ Validated input(s): 904 cells, 12449 genes and 126 CNA segments.
ℹ Processing input counts.
ℹ Assembly Rcongas object.
! Mapping inconsistent for 687 genes out 10610, removing those from the raw data table.
ℹ Retaining 98.6 Mb long-format tibble data with 4274786 points, matrix was 86.7 Mb.
ℹ Object size in memory: 99.9 Mb.

── [ Rcongas ] X ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

→ Data: 904 cells with 9878 genes, aggregated in 31 segments.

! Clusters: not available.
Warning message:
Each row in `x` should match at most 1 row in `y`.
ℹ Row 109 of `x` matches multiple rows.
ℹ If multiple matches are expected, specify `multiple = "all"` in the join call to silence this warning.
```

After digging into the code, it appears that some genes are annotated at multiple loci (even on different chromosomes), which makes this join emit the warning:

```r
gene_locations = gene_locations %>%
  left_join(locations, by = "gene") %>%
  dplyr::select(gene, chr, from, to, segment_id)
```

For example, in my case, according to the `gene_locations` object, the gene "U1" appears in:

```
  gene  chr        from        to segment_id
1 U1    chr1   16860381  16862144 chr1:14743501:22118500
2 U1    chr1   17197440  17200587 chr1:14743501:22118500
3 U1    chr3  125879126 125879288 chr1:14743501:22118500
4 U1    chr7  119646030 119646155 chr1:14743501:22118500
```

So an extra filtering step may be necessary.
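A minimal sketch of such a filtering step, using dplyr as the package code does. The toy table, its gene names and its coordinates are hypothetical, and this is not necessarily how rcongas would implement it: it simply drops every gene annotated at more than one locus, so that a later join by `gene` finds at most one match per row.

```r
library(dplyr)

# Hypothetical toy annotation: "U1" is annotated at three loci, "GENE_A" at one
locations <- data.frame(
  gene = c("U1", "U1", "U1", "GENE_A"),
  chr  = c("chr1", "chr3", "chr7", "chr2"),
  from = c(100, 200, 300, 400),
  to   = c(150, 250, 350, 450),
  stringsAsFactors = FALSE
)

# Keep only genes that map to exactly one locus
unique_locations <- locations %>%
  group_by(gene) %>%
  filter(n() == 1) %>%
  ungroup()

unique_locations$gene  # "GENE_A"
```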

Militeee commented 1 year ago

Uhm, that's fairly weird. Can you provide a minimal dataset to reproduce the issue? I'll dig into it ASAP.

mmfalco commented 1 year ago

Sorry for the false alarm; it was late on a Friday and I clearly needed a break. It is just a warning, and the subclone calling comes in the next steps. However, it would be nice to add an extra step that filters out genes with multiple mappings (such as "U1") to avoid that warning, or perhaps to override the `multiple = NULL` default in the `left_join()` call. @Militeee Thanks for your quick responses, by the way.
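The second option corresponds to what the warning message itself suggests: declaring in the join that multiple matches are expected. A minimal sketch with hypothetical toy tables, assuming dplyr >= 1.1.0 (the release that introduced the `multiple` argument of the join functions):

```r
library(dplyr)

# Hypothetical toy tables: "U1" has two annotated loci, "GENE_A" has one
genes <- data.frame(gene = c("U1", "GENE_A"), stringsAsFactors = FALSE)
locations <- data.frame(
  gene = c("U1", "U1", "GENE_A"),
  chr  = c("chr1", "chr3", "chr2"),
  stringsAsFactors = FALSE
)

# Explicitly keep every match, which silences the "at most 1 row" warning
joined <- genes %>% left_join(locations, by = "gene", multiple = "all")

nrow(joined)  # 3: "U1" appears once per matching locus
```

Note the trade-off: `multiple = "all"` only silences the warning, so a multi-mapped gene such as "U1" still yields one row per matching locus, whereas the filtering approach removes such genes before the join.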