cruk-mi / mesa

mesa package for Methylation Enrichment Sequencing Analysis

`combineQsets` mangles column names #30

Open lbeltrame opened 1 month ago

lbeltrame commented 1 month ago

Case in point:

qseaSet@cnv
GRanges object with 3021 ranges and 8 metadata columns:
         seqnames              ranges strand |  10876287   5494098   5837175   5848539   5910748 sample_29_09_2023

combined <- combineQsets(qseaSet, pon)

combined@cnv # Output truncated
GRanges object with 3021 ranges and 24 metadata columns:
         seqnames              ranges strand | X10876287  X5494098  X5837175  X5848539  X5910748 sample_29_09_2023
            <Rle>           <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric>         <numeric>

The leading columns, whose names are numeric IDs, have been prefixed with `X` by R, so every selection operation that involves these columns fails:

# Remove one column; the problematic columns remain
subsetQset(combined, samplesToDrop = c("sample7"))
Error: subscript contains invalid names
# --> caused by the mismatch between the GRanges column names and the sample table

lbeltrame commented 1 month ago

It's weird though, because it should avoid doing so:

https://github.com/cruk-mi/mesa/blob/212a791cb4da2d2d2c099c4ef53b4b71d5f1ee65/R/combineQsets.R#L191C4-L194C28

Oddly enough, it doesn't:

qseaSet@cnv %>%
    data.frame(check.names = FALSE, check.rows = FALSE) %>%
    head()

seqnames   start     end   width strand X10876287 X5494098 X5837175 X5848539 X5910748 sample_29_09_2023 sample_HL_02_10_2023

lbeltrame commented 1 month ago

Unfortunately, the problem lies in how the GRanges itself is converted to a data frame:

https://github.com/Bioconductor/GenomicRanges/blob/2b7bf7d519a2091652ceef6d6c47e3aaa1030900/R/GenomicRanges-class.R#L264
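
The renaming can be reproduced in base R alone. The following is a minimal sketch, assuming only that the conversion ends up assembling its result with `data.frame()` without forwarding `check.names`; the toy data frame and its values are invented for illustration and are not taken from the qseaSet above.

```r
# Toy metadata columns with the same kind of names as in the report
# (values are made up for illustration).
mc <- data.frame(`10876287` = c(1, 2),
                 sample_29_09_2023 = c(3, 4),
                 check.names = FALSE)
names(mc)
# "10876287" "sample_29_09_2023"

# Wrapping the data frame in another data.frame() call uses the default
# check.names = TRUE, so the names go through make.names().
wrapped <- data.frame(seqnames = c("chr1", "chr1"), mc)
names(wrapped)
# "seqnames" "X10876287" "sample_29_09_2023"

# Passing check.names = FALSE to that outer call keeps the names intact.
kept <- data.frame(seqnames = c("chr1", "chr1"), mc, check.names = FALSE)
names(kept)
# "seqnames" "10876287" "sample_29_09_2023"
```

If this is indeed what happens, a `check.names = FALSE` passed by the caller would only reach the inner conversion of the metadata columns and not the outer `data.frame()` call, which would explain why the `X` prefix still appears.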

SPPearce commented 1 month ago

I'd highly recommend that you don't use sampleNames that can't be converted cleanly into column names; it'll break several things. If there isn't already a check, one should be added to makeQset to stop people from doing that earlier on.
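
As a rough illustration of what such a check could look like, here is a sketch built on base R's `make.names()`. The function name `findUnsafeSampleNames` is hypothetical and not part of mesa; how it would be wired into makeQset is left open.

```r
# Hypothetical helper, not part of mesa: returns the sample names that
# make.names() would alter (for example, names starting with a digit,
# names containing spaces, or duplicated names).
findUnsafeSampleNames <- function(sampleNames) {
  sampleNames[make.names(sampleNames, unique = TRUE) != sampleNames]
}

unsafe <- findUnsafeSampleNames(c("10876287", "sample_29_09_2023", "sample 7"))
unsafe
# "10876287" "sample 7"
```

A check of this kind could either stop with an error or just warn, depending on how strictly such names should be rejected.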

lbeltrame commented 1 month ago

Unfortunately, those names come from the wet-lab side, and changing them will break sample identification later on. I'd rather avoid that if possible.

EDIT: I assume there's no easy way of changing sample names short of recreating the qseaSet?

SPPearce commented 1 month ago

Yes, you can rename samples using either renameSamples or renameQsetNames. At some point renameQsetNames should probably be removed, IMO.

lbeltrame commented 1 month ago

Thanks, for now I renamed the samples.