ImmuneDynamics / Spectre

A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.
https://immunedynamics.github.io/spectre/
MIT License
56 stars 21 forks source link

How to best use peacoQC() in Spectre workflow? #149

Closed denvercal1234GitHub closed 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

I am following the tutorial (https://immunedynamics.io/spectre/simple-discovery/#6_Annotate_clusters).

I input the channel values of my data into Spectre. Then, I do write.files() as below to export these channel values as FCS without doing any transformation or clustering, because I now want to use these FlowJo-transformed FCS files in peacoQC package before putting back the QC-files to Spectre.

Is it better to use CSV-to-FCS script instead of this write.files()?

Thank you for your help.

write.files(cell.dat,
                    file.prefix = exp.name,
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)

Related: https://github.com/sydneycytometry/CSV-to-FCS/issues/6

ghar1821 commented 1 year ago

I would use write.files instead of CSV-to-FCS. We haven't updated CSV-to-FCS for a while.

I'm not familiar with how PeacoQC works, but if it takes FCS files, then the output of write.files should work. Give it a try and let us know if it doesn't work (what the error message is).

denvercal1234GitHub commented 1 year ago

Thanks @ghar1821 for your response! When you get a chance, would you mind helping me address 4 questions below?

  1. To confirm, if I do write.files() to export the channel values as FCS without doing any transformation or clustering, the transformation done by FlowJo will be preserved?

  2. When I ran write.files() as below on my data.list file (i.e. before merging all the data.tables into 1 file), I got a Warning below. That sounds concerning, so I decided to write.files() instead on my cell.dat object (i.e., after merging) and there was no error. Is it ok to use FCS convered from the cell.dat object?

Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder.Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder.Warning: Item 3 has 98770 rows but longest item has 136260; recycled with remainder.Warning: Item 4 has 99557 rows but longest item has 136260; recycled with remainder.Warning: Item 5 has 86865 rows but longest item has 136260; recycled with remainder.Warning: Item 6 has 100794 rows but longest item has 136260; recycled with remainder.Warning: Item 7 has 80890 rows but longest item has 136260; recycled with remainder.Warning: Item 8 has 80149 rows but longest item has 136260; recycled with remainder.Warning: Item 9 has 20444 rows but longest item has 136260; recycled with remainder.Warning: Item 10 has 92751 rows but longest item has 136260; recycled with remainder.Warning: Item 11 has 78070 rows but longest item has 136260; recycled with remainder.Warning: Item 12 has 90425 rows but longest item has 136260; recycled with remainder.Warning: Item 13 has 84116 rows but longest item has 136260; 
  1. When I then ran read.flowSet() as shown below, I had another warning as below. This does not look right?
##fcs.dir1 is my directory of the FCS files converted from cell.dat object

cell.dat_FlowSet <- read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE) 
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. 
  1. Do you know how I might preserve the name for each FCS file (e.g., by the column "FileName") that is converted?

Thank you for your help!

> write.files(F37_LiveSinglets_channel_data.list,
+                     write.csv = FALSE,
+                     write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")
ghar1821 commented 1 year ago

Hi @denvercal1234GitHub

Answers to your questions:

  1. Yes the transformation will be preserved. The write.files essentially just writes out whatever is stored in the data.table.
  2. The data.list looks to be a list of data.table? The write.files function takes a data.table as the dat argument, not a list. So you should use lapply (or something equivalent) to write out every data.table in data.list variable out as FCS file.
  3. I'm not sure what that error message meant.
  4. I don't think that is possible with FCS files.

Rather than writing out the flowJo's transformed value out as csv file then converting it to fcs file using Spectre, is it not possible to just get flowJo to export those transformed values as fcs files?

denvercal1234GitHub commented 1 year ago

Thanks @ghar1821 and @tomashhurst.

QUESTION 2. data.list is the output in the format of a list of data.tables from Spectre::read.files(file.loc = "/Users/clusteredatom/Documents/.../LiveSinglets_Channel", file.type = ".csv", do.embed.file.names = TRUE)

Performing lapply(F37_data.list, function(x) write.files(x, write.csv = FALSE, write.fcs = TRUE,file.prefix = "channelFCS", divide.by = NULL)) still gave me 1 FCS file called channelFCS.fcs, and in the Console, they showed whole bunch of "NULL" for some data.tables as below. So I don't think it works with lapply() this way.

$export_F37_10D_CD158_PE_Live
NULL

$export_F37_10D_KI_PE_Live
NULL

$export_F37_10D_CD12_PE_Live
NULL
...

And, if I run write.files() with divide.by = "FileNo", then I encountered the "Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder.Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder.Warning: Item 3"

write.files(F37_LiveSinglets_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

If I run write.files() nested in the lapply() instead on my cell.dat object (i.e., after merging), which does not make sense since cell.dat is not a list but just a dt, there was similar outputs shown on Console:

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

lapply(F37_data.list_cell.dat, function(x) Spectre::write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))
$`FSC-A`
NULL

$`SSC-A`
NULL

$TOX
NULL

$CD3
NULL
...

Lastly, if thus run simply write.files using my cell.data object with divide.by = "FileNo", I can have each FCS output file (named as "channelFCS_FileNo_1.fcs" etc.), and there was no warning.

If I set the divide.by=NULL, I will only get 1 FCS file.

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

In conclusion, would you say running write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo") is the best option?

Although now, the file name of the outputted FCS will just be FileNo that I will need to match them back to the FileNo column of the cell.data object - which create more room for human errors - which was why I had QUESTION 4.

I am ok with this, but the issue is when I then used these exported FCS files in flowCore::read.flowSet() to convert them into FlowSet object in order to use the peacoQC package, I had Warning as mentioned in QUESTION 3 of $PnR NA is invalid.

# fcs.dir1 is directory to exported FCS files from Spectre::write.files(F37_data.list_cell.dat)

F37_data.list_cell.dat_FlowSet <- flowCore::read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE) #fcs_data will be a FlowSet object
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.

QUESTION 3. The warning $PnR NA is invalid as mentioned above after running flowCore::read.flowSet() was also produced after running flowCore::read.FCS(), which is being discussed here: https://github.com/RGLab/flowCore/issues/243

denvercal1234GitHub commented 1 year ago

@ghar1821 commenting at https://github.com/RGLab/flowCore/issues/243

ghar1821 commented 1 year ago

Pasting the reply I initially wrote for https://github.com/RGLab/flowCore/issues/243 here:

Performing lapply(F37_data.list, function(x) write.files(x, write.csv = FALSE, write.fcs = TRUE,file.prefix = "channelFCS", divide.by = NULL)) still gave me 1 FCS file called channelFCS.fcs, and in the Console, they showed whole bunch of "NULL" for some data.tables as below. So I don't think it works with lapply() this way.

$export_F37_10D_CD158_PE_Live
NULL

$export_F37_10D_KI_PE_Live
NULL

$export_F37_10D_CD12_PE_Live
NULL
...

The NULL showed in the console is because write.files function does not return anything. Hence the console just shows NULL is returned for every element in the list. You only get 1 FCS file because every data.table is written into the same FCS file (the one with channelFCS prefixed). You should change the file.prefix parameter so that every data.table is given a different file.prefix. So something like the following where I have a list of data.table, each element is named after the filename the data.table was read from:

lapply(seq_len(length(dt)), function(i) {
    x <- dt[[i]]
    fname <- gsub(".csv", "", names(dt)[i])
    write.files(x, file.prefix = fname, write.csv=FALSE, write.fcs=TRUE)
})

And, if I run write.files() with divide.by = "FileNo", then I encountered the "Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder.Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder.Warning: Item 3"

write.files(F37_LiveSinglets_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

write.files should not be able to write out a list. I'm surprised you didn't get an error. Is F37_LiveSinglets_channel_data.list a data.table? The warning may mean some of the markers have missing values for some cells. Looks to me marker in column 1 has 98,073 cells while the one in column 2 has 103,077, and that the marker with the most number of non-empty values has 136,260 cells. Can you check?

If I run write.files() nested in the lapply() instead on my cell.dat object (i.e., after merging), which does not make sense since cell.dat is not a list but just a dt, there was similar outputs shown on Console:

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

lapply(F37_data.list_cell.dat, function(x) Spectre::write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))
$`FSC-A`
NULL

$`SSC-A`
NULL

$TOX
NULL

$CD3
NULL
...

No this doesn't make sense. Don't do it.

Lastly, if thus run simply write.files using my cell.data object with divide.by = "FileNo", I can have each FCS output file (named as "channelFCS_FileNo_1.fcs" etc.), and there was no warning.

If I set the divide.by=NULL, I will only get 1 FCS file.

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

This is the expected behaviour. If divide.by is not NULL, write.files will simply break up your data.table into few chunks, each containing only cells which value for the column you specified in divide.by is the same, e.g. chunk 1 only contains cells from FileNo 1, chunk 2 only contains cells from FileNo 2, etc..

In conclusion, would you say running write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo") is the best option?

Yes.

Although now, the file name of the outputted FCS will just be FileNo that I will need to match them back to the FileNo column of the cell.data object - which create more room for human errors - which was why I had QUESTION 4.

What about using FileName as divide.by?

I am ok with this, but the issue is when I then used these exported FCS files in flowCore::read.flowSet() to convert them into FlowSet object in order to use the peacoQC package, I had Warning as mentioned in QUESTION 3 of $PnR NA is invalid.

# fcs.dir1 is directory to exported FCS files from Spectre::write.files(F37_data.list_cell.dat)

F37_data.list_cell.dat_FlowSet <- flowCore::read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE) #fcs_data will be a FlowSet object
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.

QUESTION 3. The warning $PnR NA is invalid as mentioned above after running flowCore::read.flowSet() was also produced after running flowCore::read.FCS(), which is being discussed here: RGLab/flowCore#243

I think you will still get the error regardless of whether you use 1 or 4 FCS files. I have a feeling this is because the data.table does not store the instrument range - though I'm not 100% as I don't normally use FCS file. Now, whether this matters for downstream analysis really depends on what your subsequent steps are.

I'll see if I can find the time write up a guide on how to use peacoQC to do preprocessing, and Spectre for subsequent downstream analysis.

ghar1821 commented 1 year ago

From @denvercal1234GitHub reply originally posted at https://github.com/RGLab/flowCore/issues/243:

Thank you so much for your time @ghar1821!

I will try `divide.by="FileName" and let you know, and check on QUESTION 3 and let you know.

And, yes please, having a guide on how to incorporate the following packages to our Spectre package would be very useful and increase the packages' usage a lot:

denvercal1234GitHub commented 1 year ago

The issue was identified due to file naming of PE markers across wells. See https://github.com/RGLab/flowCore/issues/243

Solution = change the marker names before exporting the data from cytometer into FlowJo and subsequent R packages.

Thank you again @ghar1821 for taking the time to look into the issue. And, I look forward to reading the tutorials whenever you get a chance to do it.

Keeping it open because unsure if Givanna's tag for documentation would disappear if closed.