biosurf / cyCombine

Robust Integration of Single-Cell Cytometry Datasets

Error in prepare_data() #62

Open martuskaR opened 1 week ago

martuskaR commented 1 week ago

Hello,

I am unable to generate my flowSet because of the following error: "Error in cyCombine::transform_asinh(., markers = markers, cofactor = cofactor, : NA". My code is:

# Preparing the expression data
dfci <- prepare_data(data_dir = data_dir,
                     metadata = metadata,
                     filename_col = "Filename",
                     batch_ids = "batch",
                     condition = "condition",
                     sample_ids = "Patient_id",
                     markers = markers,
                     derand = TRUE,
                     cofactor = 5,
                     down_sample = FALSE, .keep = FALSE,
                     clean_colnames = TRUE)

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.4.1

I've tried updating packages, restarting R and my computer, renaming columns, and so on. I have also checked for NAs. Nothing works, and googling this error did not turn up any information. Could somebody kindly help and suggest what could be causing this error? I am working on a Mac with an M1 chip.

Many thanks in advance!

shdam commented 1 week ago

Hi there,

Thank you for using cyCombine!

That is a peculiar error. Try running prepare_data with transform = FALSE and see if anything looks wrong in the output. You can transform later with transform_asinh. You can also try to install the development version from the dev branch, to see if anything changes.

Bear in mind, the output is not a flowSet but a dataframe.
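Because the output is an ordinary data frame, it can be inspected with base R before any transformation is attempted. The following is only a minimal sketch: the small data frame is a made-up stand-in for the prepare_data() output, and its column names and values are assumptions for illustration.

```r
# Toy stand-in for the output of prepare_data(..., transform = FALSE).
# Column names and values are invented for illustration.
df <- data.frame(
  id = 1:3,
  CD20 = c(0.1, NA, 0.0),
  Ir191Di = c(5.2, 4.8, 6.1),
  batch = c("A", "A", "B"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Which columns did we actually get?
colnames(df)

# Class of each column; marker columns should be numeric
col_classes <- vapply(df, function(x) class(x)[1], character(1))
numeric_cols <- names(col_classes)[col_classes %in% c("numeric", "integer")]

# Number of NA values per numeric column
na_counts <- colSums(is.na(df[, numeric_cols, drop = FALSE]))
na_counts
```

Unexpected column names (channel names where antigens were expected, or empty titles) and non-zero NA counts are the kind of thing worth looking for here.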

Hope this is helpful, and let me know if you find anything.

Best regards, Søren

martuskaR commented 4 days ago

Hello,

Thank you so much for such a prompt reply! I ran the function with setting transform = FALSE and here is the data frame I got:

[Image: screenshot of the resulting data frame]

I have a total of 75 columns. Some columns have no titles, some are named by channel, and some by antigen. There are also markers I didn't request, for example Iridium and the EQ6 beads. Also, the B2M channels are barcodes: there are 6 of those in total, but the data frame only shows 2. The markers specified in my prepare_data() call are:

markers
 [1] "CD20"   "CD71"   "CD14"   "CD61"   "CD19"   "CD9"    "CD36"   "CD7"    "CD172a" "CD38"   "CD5"    "CD64"  
[13] "CD11c"  "CD16"   "CD34"   "CD43"   "CD123"  "CD66b"  "CCR2"   "CD40"   "CD27"   "CX3CR1" "CD33"   "CD2"   
[25] "CD274"  "IgD"    "CD141"  "CD1c"   "CD206"  "CD86"   "CD138"  "CD169"  "Ki67"   "CD163"  "CD10"   "CD371" 
[37] "CD164"  "HLA-DR" "MRP-14" "CD24 "  "CLA"    "CD11b"  "B2M_1"  "CD45"  

I am new to CyTOF data analysis in R and to cyCombine. Am I right in guessing that the columns should have antigen names and other parameters, but not channel names?

Kind Regards

shdam commented 4 days ago

Hi there,

It looks like the problem is inconsistent marker names. When using clean_colnames = TRUE, you remove any special characters (e.g., "-" and "_") and channel names. It is optional but improves readability. If you wish, you can set it to FALSE and manually clean the names. You can also use a panel file to define marker names using the panel* arguments; see the vignettes for examples.

Marker names are case-sensitive, so "CD11b" and "CD11B" will not match.
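One way to spot names that differ only in case, special characters, or stray whitespace is to compare them after normalization. This is a sketch in base R; normalize_name() is a hypothetical helper written for this example, not a cyCombine function, and it does not claim to reproduce what clean_colnames does internally.

```r
# Hypothetical helper (not part of cyCombine): reduce a name to a
# comparison key by trimming whitespace, dropping "-", "_" and spaces,
# and ignoring case.
normalize_name <- function(x) {
  x <- trimws(x)
  x <- gsub("[-_ ]", "", x)
  toupper(x)
}

# Made-up example names
markers <- c("CD11b", "HLA-DR", "CD24 ", "CD19")
columns <- c("CD11B", "HLADR", "CD24", "CD45")

# Markers that fail an exact match but match after normalization
exact <- markers %in% columns
near <- !exact & normalize_name(markers) %in% normalize_name(columns)

near_misses <- data.frame(
  marker = markers[near],
  column = columns[match(normalize_name(markers[near]), normalize_name(columns))],
  stringsAsFactors = FALSE
)
near_misses
```

Each row pairs a requested marker with the column it almost matches, which shows which side needs renaming. Markers with no counterpart at all (here "CD19") do not appear in the table.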

Hope this is helpful, and let me know if you have further questions :)

Best regards, Søren

martuskaR commented 4 days ago

Hello Søren,

Thanks for getting back to me and helping out with this, it's much appreciated. I have fixed the naming and turned off clean_colnames. I am trying two different methods, and they both fail at the transformation stage. Is my data frame supposed to contain the columns sample, batch, and condition? Are these what cause the error, or am I doing something wrong?

#Convert flowset to tibble
df <- convert_flowset(
+   flowset = flowset,
+   metadata = metadata,
+   filename_col = "Filename",
+   sample_ids = "Patient_id", # By default the filename is used to get sample ids
+   batch_ids = "batch",
+   condition = "condition",
+   down_sample = TRUE,
+   sample_size = 2000000,
+   seed = 101,
+   panel = panel2, # Can also be the filename. It is solely used to ensure the channel names match what you expect (i.e. what is in the panel_antigen column)
+   panel_channel = "Channel",
+   panel_antigen = "Antigen", clean_colnames = F
+ )

Down sampling to 2e+06 cells
Extracting expression data..
Your flowset is now converted into a dataframe.

# Transform data - This function also de-randomizes the data
uncorrected <- transform_asinh(
+   df = df,
+   markers = markers,
+   cofactor = 5,
+   .keep = TRUE # Lets you keep all columns, in case they are useful to you
+ )

Error in transform_asinh(df = df, markers = markers, cofactor = 5, .keep = TRUE) :
  Not all given markers are in the data.
Check if the markers contain a _ or -: B2M, CCR2, CD10, CD11B, CD11C, CD123, CD138, CD14, CD141, CD16, CD163, CD164, CD169, CD172a, CD19, CD1C, CD2, CD20, CD206, CD24 , CD27, CD274, CD33, CD34, CD36, CD371, CD38, CD40, CD43, CD45, CD5, CD61, CD64, CD66B, CD7, CD71, CD86, CD9, CLA, CD3CR1, HLA-DR, IgD, Ki67, MRP-14
Columns: id, CD11B, CD20, CD71, CD14, CD61, IgD, CD141, CD1C, CD206, CD138, CD169, Ki67, CD10, CD123, CCR2, CD27, CD3CR1, CD33, CD274, CD86, CD9, CD24 , CD7, CD172a, CD38, CD5, CD64, CD16, CD43, CD36, B2M, CD11C, CD34, CD66B, CD40, CD2, CD163, CD45, CD371, CD164, HLA-DR, MRP-14, CLA, sample, batch, condition

dfci <- prepare_data(data_dir = data_dir,
+                      metadata = metadata,
+                      filename_col = "Filename",
+                      batch_ids = "batch",
+                      condition = "condition",
+                      sample_ids = "Patient_id",
+                      markers = markers,
+                      derand = TRUE,
+                      cofactor = 5,
+                      down_sample = FALSE, .keep = FALSE,
+                      clean_colnames = FALSE,
+                      panel = panel2,
+                      panel_channel = "Channel",
+                      panel_antigen = "Antigen",
+                      transform = FALSE)

Reading 231 files to a flowSet..
Extracting expression data..
Your flowset is now converted into a dataframe.
Done!

uncorrected <- transform_asinh(
+   df = dfci,
+   markers = markers,
+   cofactor = 5,
+   .keep = TRUE # Lets you keep all columns, in case they are useful to you
+ )

Error in transform_asinh(df = dfci, markers = markers, cofactor = 5, .keep = TRUE) :
  Not all given markers are in the data.
Check if the markers contain a _ or -: B2M, CCR2, CD10, CD11B, CD11C, CD123, CD138, CD14, CD141, CD16, CD163, CD164, CD169, CD172a, CD19, CD1C, CD2, CD20, CD206, CD24 , CD27, CD274, CD33, CD34, CD36, CD371, CD38, CD40, CD43, CD45, CD5, CD61, CD64, CD66B, CD7, CD71, CD86, CD9, CLA, CD3CR1, HLA-DR, IgD, Ki67, MRP-14
Columns: id, CD11B, CD20, CD71, CD14, CD61, IgD, CD141, CD1C, CD206, CD138, CD169, Ki67, CD10, CD123, CCR2, CD27, CD3CR1, CD33, CD274, CD86, CD9, CD24 , CD7, CD172a, CD38, CD5, CD64, CD16, CD43, CD36, B2M, CD11C, CD34, CD66B, CD40, CD2, CD163, CD45, CD371, CD164, HLA-DR, MRP-14, CLA, sample, batch, condition

Thank you, Kind regards,

shdam commented 4 days ago

Hi,

Happy to help :) Your marker list contains "CD19", but it is missing in the dataframe for some reason.
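A missing marker like this is easier to find with setdiff() than by reading the two lists in the error message. The sketch below uses a shortened, illustrative subset of the names from the error output rather than the full vectors.

```r
# Shortened subset of the names from the error message, for illustration
markers <- c("CD20", "CD14", "CD19", "CD9", "HLA-DR")
columns <- c("id", "CD20", "CD14", "CD9", "HLA-DR",
             "sample", "batch", "condition")

# Requested markers with no matching column
missing_markers <- setdiff(markers, columns)
missing_markers

# Columns that are not markers (the id and metadata columns)
non_marker_cols <- setdiff(columns, markers)
non_marker_cols
```

With the real objects, the equivalent call would be setdiff(markers, colnames(df)).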

Batch, sample, and condition are added to the dataframe with batch_ids =, sample_ids =, and condition = in prepare_data().

Let me know if anything else comes up.

Best regards, Søren

martuskaR commented 4 days ago

You were right, there was a mismatch in channel name. I fixed it, but unfortunately, there is still an issue.

The transformation still fails when I don't specify the panel arguments. When I do specify them, the transformation works, but then detect_batch_effect_express() fails.

Panel$Antigen is the same as markers, so why does the function fail when I don't specify the panel arguments?

 markers <- panel %>%
+   filter(Type != "none") %>%
+   pull(Antigen)
markers

 [1] "B2M" "CD192" "CD10" "CD11B" "CD11C" "CD123" "CD138" "CD14" "CD141" "CD16" "CD163" "CD164" "CD169" "CD172a"
[15] "CD19" "CD1C" "CD2" "CD20" "CD206" "CD24" "CD27" "CD274" "CD33" "CD34" "CD36" "CD371" "CD38" "CD40"
[29] "CD43" "CD45" "CD5" "CD61" "CD64" "CD66B" "CD7" "CD71" "CD86" "CD9" "CLA" "CX3CR1" "HLA-DR" "IgD"
[43] "Ki67" "MRP-14"

# Preparing the expression data
dfci <- prepare_data(data_dir = data_dir,
+                      metadata = metadata,
+                      filename_col = "Filename",
+                      batch_ids = "batch",
+                      condition = "condition",
+                      sample_ids = "Patient_id",
+                      markers = markers,
+                      down_sample = FALSE, .keep = FALSE,
+                      clean_colnames = FALSE,
+                      transform = TRUE)

Reading 231 files to a flowSet..
Extracting expression data..
Your flowset is now converted into a dataframe.
Error in cyCombine::transform_asinh(., markers = markers, cofactor = cofactor, : NA

markers %in% panel$Antigen
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[29] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

# Preparing the expression data
> dfci <- prepare_data(data_dir = data_dir,
+                      metadata = metadata,
+                      filename_col = "Filename",
+                      batch_ids = "batch",
+                      condition = "condition",
+                      sample_ids = "Patient_id",
+                      markers = markers,
+                      down_sample = FALSE, .keep = FALSE,
+                      clean_colnames = FALSE,
+                      panel = panel,
+                      panel_channel = "Channel",
+                      panel_antigen = "Antigen",
+                      transform = TRUE)

Reading 231 files to a flowSet..
Extracting expression data..
Your flowset is now converted into a dataframe.
Transforming data using asinh with a cofactor of 5..
Done!

> setdiff(colnames(dfci), panel$Antigen)
[1] "id" "sample" "batch" "condition"

detect_batch_effect_express(dfci, downsample = 10000,seed = 101,
+                             out_dir = 'batch_effect_check')

Starting the quick(er) detection of batch effects.
Downsampling to 10000 cells.
Making distribution plots for all markers in each batch.
Error in ggridges::geom_density_ridges():
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'HLA' not found
Backtrace:

  1. cyCombine::detect_batch_effect_express(...)
    1. cowplot::plot_grid(plotlist = p, nrow = round(length(all_markers)/6))
    2. cowplot::align_plots(...)
    3. base::lapply(...)
    4. cowplot (local) FUN(X[[i]], ...) ...
    5. ggplot2 (local) f(l = layers[[i]], d = data[[i]])
    6. l$compute_aesthetics(d, plot)
    7. ggplot2 (local) compute_aesthetics(..., self = self)
    8. base::lapply(aesthetics, eval_tidy, data = data, env = env)
    9. rlang (local) FUN(X[[i]], ...)
all_markers <- dfci %>% cyCombine::get_markers()
> all_markers
 [1] "CD11B"  "CD20"   "CD71"   "CD14"   "CD61"   "IgD"    "CD141"  "CD1C"   "CD206"  "CD138"  "CD169"  "Ki67"   "CD10"   "CD123"  "CD192" 
[16] "CD27"   "CX3CR1" "CD33"   "CD274"  "CD86"   "CD19"   "CD9"    "CD24"   "CD7"    "CD172a" "CD38"   "CD5"    "CD64"   "CD16"   "CD43"  
[31] "CD36"   "B2M"    "CD11C"  "CD34"   "CD66B"  "CD40"   "CD2"    "CD163"  "CD45"   "CD371"  "CD164"  "HLA-DR" "MRP-14" "CLA"   

Where am I going wrong now? Is the issue with HLA-DR naming? Sorry to be a pain, I am not able to figure this out.

Kind Regards,

shdam commented 4 days ago

Hi,

When you use the panel file, the column names are defined according to the panel you provide. If you don't, it uses the information embedded in the FCS files. If you look at the column names of the output when preparing without transformation or a panel file, you can see that they don't match the markers you give.
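To illustrate the idea of panel-defined column names, here is a rough base R sketch of renaming columns through a channel-to-antigen lookup. It is not cyCombine's actual implementation, and the panel contents, channel names, and values are all invented for the example.

```r
# Hypothetical panel mapping channel names to antigen names
panel <- data.frame(
  Channel = c("Yb174Di", "Nd142Di"),
  Antigen = c("HLA-DR", "CD19"),
  stringsAsFactors = FALSE
)

# Toy expression data whose columns carry channel-style names
expr <- data.frame(
  Yb174Di = c(1.2, 0.4),
  Nd142Di = c(0.0, 3.1),
  Ir191Di = c(5.0, 6.0)
)

# Rename every column that has an entry in the panel; leave the rest as-is
idx <- match(colnames(expr), panel$Channel)
renamed <- expr
colnames(renamed)[!is.na(idx)] <- panel$Antigen[idx[!is.na(idx)]]
colnames(renamed)
```

Without such a lookup, the names stay as they were read from the files, so a markers vector built from the panel's Antigen column may not match them.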

I will add it to the todo to make detect_batch_effects compatible with hyphens in marker names. For now, you are exactly correct, removing "-" from all marker names will fix the issue :) - I recommend doing that in the panel file and using that. Clearly, I could also work on some more useful error messages :D
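The "object 'HLA' not found" message is what R produces when a name such as HLA-DR is evaluated as code, because it is read as HLA minus DR. Whether the plotting code builds its aesthetics in exactly this way is an assumption; the base R sketch below only reproduces the symptom on toy data and shows the hyphen-stripping workaround applied to a made-up panel.

```r
# Toy data with a hyphenated column name
df <- data.frame(`HLA-DR` = c(1.5, 2.0), CD19 = c(0.1, 0.3),
                 check.names = FALSE)

# Evaluating the name as code parses it as `HLA - DR`,
# so R goes looking for an object called HLA and fails.
msg <- tryCatch(
  eval(parse(text = "HLA-DR"), df),
  error = function(e) conditionMessage(e)
)
msg

# Workaround: strip hyphens from the antigen names in the panel
# (hypothetical panel contents)
panel <- data.frame(Antigen = c("HLA-DR", "MRP-14", "CD19"),
                    stringsAsFactors = FALSE)
panel$Antigen <- gsub("-", "", panel$Antigen, fixed = TRUE)
panel$Antigen
```

If the markers vector is built from the panel, as in the thread, it needs to be rebuilt after this change so that both sides use the same hyphen-free names.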

Let me know if anything else pops up.

Best regards, Søren

martuskaR commented 4 days ago

[like] Marta Rzepkowska reacted to your message:

