I would use `write.files` instead of CSV-to-FCS; we haven't updated CSV-to-FCS for a while.

I'm not familiar with how PeacoQC works, but if it takes FCS files, then the output of `write.files` should work. Give it a try and let us know if it doesn't work (and what the error message is).
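If PeacoQC does take plain FCS files, the round trip would look something like the following. This is a minimal sketch only; I haven't used PeacoQC myself, so treat the `PeacoQC()` call and its arguments as assumptions and check its documentation for the exact signature:

```r
library(flowCore)
library(PeacoQC)

# Read one FCS file exported by write.files back in as a flowFrame.
ff <- read.FCS("channelFCS.fcs", transformation = FALSE,
               truncate_max_range = FALSE)

# Run PeacoQC on the marker channels (assumed entry point and arguments).
channels <- colnames(ff)
qc <- PeacoQC(ff = ff, channels = channels)
```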
Thanks @ghar1821 for your response! When you get a chance, would you mind helping me address the 4 questions below?

1. To confirm: if I run `write.files()` to export the channel values as FCS without doing any transformation or clustering, will the transformation done by FlowJo be preserved?
2. When I ran `write.files()` as below on my `data.list` object (i.e. before merging all the data.tables into one), I got the warnings below. That sounded concerning, so I ran `write.files()` on my `cell.dat` object instead (i.e., after merging) and there was no error. Is it OK to use the FCS files converted from the `cell.dat` object?

```
Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder.
Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder.
Warning: Item 3 has 98770 rows but longest item has 136260; recycled with remainder.
Warning: Item 4 has 99557 rows but longest item has 136260; recycled with remainder.
Warning: Item 5 has 86865 rows but longest item has 136260; recycled with remainder.
Warning: Item 6 has 100794 rows but longest item has 136260; recycled with remainder.
Warning: Item 7 has 80890 rows but longest item has 136260; recycled with remainder.
Warning: Item 8 has 80149 rows but longest item has 136260; recycled with remainder.
Warning: Item 9 has 20444 rows but longest item has 136260; recycled with remainder.
Warning: Item 10 has 92751 rows but longest item has 136260; recycled with remainder.
Warning: Item 11 has 78070 rows but longest item has 136260; recycled with remainder.
Warning: Item 12 has 90425 rows but longest item has 136260; recycled with remainder.
Warning: Item 13 has 84116 rows but longest item has 136260;
```
3. When I then read the converted FCS files using `read.flowSet()` as shown below, I got another warning, shown below. This does not look right?

```r
## fcs.dir1 is my directory of the FCS files converted from the cell.dat object
cell.dat_FlowSet <- read.flowSet(path = fcs.dir1, pattern = "*.fcs", transformation = FALSE, truncate_max_range = FALSE)
```

```
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308.
```
Thank you for your help!
```r
> write.files(F37_LiveSinglets_channel_data.list,
+             write.csv = FALSE,
+             write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")
```
Hi @denvercal1234GitHub
Answers to your questions:
`write.files` essentially just writes out whatever is stored in the `data.table`.

`data.list` looks to be a list of `data.table`s? The `write.files` function takes a `data.table` as the `dat` argument, not a list, so you should use `lapply` (or something equivalent) to write out every `data.table` in the `data.list` variable as an FCS file.

Rather than writing out FlowJo's transformed values as CSV files and then converting them to FCS files using Spectre, is it not possible to just get FlowJo to export those transformed values as FCS files?
Thanks @ghar1821 and @tomashhurst.
QUESTION 2.

`data.list` is the output, in the format of a list of data.tables, from `Spectre::read.files(file.loc = "/Users/clusteredatom/Documents/.../LiveSinglets_Channel", file.type = ".csv", do.embed.file.names = TRUE)`.

Performing

```r
lapply(F37_data.list, function(x) write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))
```

still gave me 1 FCS file called `channelFCS.fcs`, and the console showed a whole bunch of "NULL" entries for the data.tables, as below. So I don't think it works with `lapply()` this way.
```
$export_F37_10D_CD158_PE_Live
NULL
$export_F37_10D_KI_PE_Live
NULL
$export_F37_10D_CD12_PE_Live
NULL
...
```
And, if I run `write.files()` with `divide.by = "FileNo"`, I encounter the same warnings as quoted in QUESTION 2 ("Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder. Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder. Warning: Item 3 ..."):

```r
write.files(F37_LiveSinglets_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")
```
If I instead run `write.files()` nested in `lapply()` on my `cell.dat` object (i.e., after merging), which does not make sense since `cell.dat` is not a list but just a data.table, similar output is shown in the console:

```r
F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)
lapply(F37_data.list_cell.dat, function(x) Spectre::write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))
```
```
$`FSC-A`
NULL
$`SSC-A`
NULL
$TOX
NULL
$CD3
NULL
...
```
Lastly, if I simply run `write.files()` on my `cell.dat` object with `divide.by = "FileNo"`, I get one FCS output file per input file (named "channelFCS_FileNo_1.fcs" etc.), and there is no warning. If I set `divide.by = NULL`, I only get 1 FCS file.

```r
F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)
write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")
```
In conclusion, would you say running `write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")` is the best option?

Although now the file names of the output FCS files will just contain the `FileNo`, and I will need to match them back to the `FileNo` column of the `cell.dat` object, which creates more room for human error. This was why I had QUESTION 4.
I am OK with this, but the issue is that when I then used these exported FCS files in `flowCore::read.flowSet()` to convert them into a `flowSet` object, in order to use the PeacoQC package, I got the `$PnR NA is invalid` warning mentioned in QUESTION 3.

```r
# fcs.dir1 is the directory of FCS files exported by Spectre::write.files(F37_data.list_cell.dat)
F37_data.list_cell.dat_FlowSet <- flowCore::read.flowSet(path = fcs.dir1, pattern = "*.fcs", transformation = FALSE, truncate_max_range = FALSE)  # fcs_data will be a flowSet object
```

```
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.
```
QUESTION 3.

The `$PnR NA is invalid` warning mentioned above after running `flowCore::read.flowSet()` was also produced after running `flowCore::read.FCS()`, and is being discussed here: https://github.com/RGLab/flowCore/issues/243

@ghar1821, commenting at https://github.com/RGLab/flowCore/issues/243:

Pasting the reply I initially wrote for https://github.com/RGLab/flowCore/issues/243 here:
> Performing `lapply(F37_data.list, function(x) write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))` still gave me 1 FCS file called `channelFCS.fcs`, and in the console, they showed a whole bunch of "NULL" for some data.tables as below. So I don't think it works with `lapply()` this way.
>
> $export_F37_10D_CD158_PE_Live NULL $export_F37_10D_KI_PE_Live NULL $export_F37_10D_CD12_PE_Live NULL ...
The NULLs shown in the console appear because the `write.files` function does not return anything, so the console just shows that NULL was returned for every element in the list.

You only get 1 FCS file because every data.table is written into the same FCS file (the one prefixed with channelFCS). You should change the `file.prefix` parameter so that every data.table is given a different `file.prefix`. Something like the following, where I have a list of data.tables with each element named after the file the data.table was read from:
```r
lapply(seq_along(dt), function(i) {
  x <- dt[[i]]
  # use each element's name, minus the .csv extension, as the file prefix
  fname <- gsub(".csv", "", names(dt)[i])
  write.files(x, file.prefix = fname, write.csv = FALSE, write.fcs = TRUE)
})
```
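This should give you one FCS file per element of the list, each named after the original CSV it was read from.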
> And, if I run `write.files()` with `divide.by = "FileNo"`, then I encountered the "Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder. Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder. Warning: Item 3 ..."
>
> `write.files(F37_LiveSinglets_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")`
`write.files` should not be able to write out a list; I'm surprised you didn't get an error. Is `F37_LiveSinglets_channel_data.list` a data.table? The warning may mean some of the markers have missing values for some cells. It looks to me as if the marker in column 1 has 98,073 cells while the one in column 2 has 103,077, and the marker with the most non-empty values has 136,260 cells. Can you check?
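One quick way to check; a minimal sketch, assuming the object is in fact a single data.table:

```r
# Count non-NA values per column; uneven counts would explain the
# "recycled with remainder" warnings.
sapply(F37_LiveSinglets_channel_data.list, function(col) sum(!is.na(col)))

# If it is actually a list of data.tables, check each element's row count instead.
sapply(F37_LiveSinglets_channel_data.list, nrow)
```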
> If I run `write.files()` nested in `lapply()` on my cell.dat object instead (i.e., after merging), which does not make sense since cell.dat is not a list but just a data.table, there was similar output shown in the console:
>
> `F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)`
> `lapply(F37_data.list_cell.dat, function(x) Spectre::write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))`
>
> $`FSC-A` NULL $`SSC-A` NULL $TOX NULL $CD3 NULL ...
No, this doesn't make sense. Don't do it.
> Lastly, if I simply run `write.files` on my cell.dat object with `divide.by = "FileNo"`, I can get one FCS output file per input (named "channelFCS_FileNo_1.fcs" etc.), and there was no warning. If I set `divide.by = NULL`, I will only get 1 FCS file.
>
> `F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)`
> `write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")`
This is the expected behaviour. If `divide.by` is not NULL, `write.files` will simply break your data.table up into a few chunks, each containing only the cells whose value in the column you specified in `divide.by` is the same; e.g. chunk 1 only contains cells from FileNo 1, chunk 2 only contains cells from FileNo 2, etc.
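Conceptually, it behaves as if the table were split by that column first; a sketch of the behaviour (not Spectre's actual internals):

```r
library(data.table)

# One chunk per distinct FileNo value; each chunk then becomes one output file.
chunks <- split(cell.dat, by = "FileNo")   # data.table's split() method
names(chunks)                              # the distinct FileNo values
```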
> In conclusion, would you say running `write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")` is the best option?
Yes.
> Although now, the file name of the outputted FCS will just be the `FileNo`, and I will need to match them back to the `FileNo` column of the cell.dat object, which creates more room for human error; this was why I had QUESTION 4.
What about using `FileName` as `divide.by`?
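That is, assuming your merged table still has the `FileName` column embedded by `do.embed.file.names = TRUE`:

```r
write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE,
            file.prefix = "channelFCS", divide.by = "FileName")
```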
> I am OK with this, but the issue is that when I then used these exported FCS files in `flowCore::read.flowSet()` to convert them into a `flowSet` object in order to use the PeacoQC package, I had the warning mentioned in QUESTION 3: `$PnR NA is invalid`.
>
> `# fcs.dir1 is the directory of FCS files exported by Spectre::write.files(F37_data.list_cell.dat)`
> `F37_data.list_cell.dat_FlowSet <- flowCore::read.flowSet(path = fcs.dir1, pattern = "*.fcs", transformation = FALSE, truncate_max_range = FALSE)`
>
> Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.
> QUESTION 3. The warning `$PnR NA is invalid` mentioned above after running `flowCore::read.flowSet()` was also produced after running `flowCore::read.FCS()`, which is being discussed here: RGLab/flowCore#243
I think you will still get the warning regardless of whether you use 1 or 4 FCS files. I have a feeling this is because the data.table does not store the instrument range, though I'm not 100% sure as I don't normally use FCS files. Whether this matters for downstream analysis really depends on what your subsequent steps are.
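If you want to confirm, you can inspect the range keywords of one exported file directly; a minimal sketch using flowCore's `keyword()` accessor:

```r
library(flowCore)

ff <- read.FCS("channelFCS_FileNo_1.fcs", transformation = FALSE,
               truncate_max_range = FALSE)
kw <- keyword(ff)

# Show all $PnR range keywords ($P1R, $P2R, ...); NA values would mean no
# instrument range was written into the file.
kw[grep("^\\$P[0-9]+R$", names(kw))]
```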
I'll see if I can find the time to write up a guide on how to use PeacoQC for preprocessing and Spectre for the subsequent downstream analysis.
From @denvercal1234GitHub's reply, originally posted at https://github.com/RGLab/flowCore/issues/243:

Thank you so much for your time @ghar1821!

I will try `divide.by = "FileName"` and let you know, and will check on QUESTION 3 and let you know.
And yes please, a guide on how to incorporate the following packages with the `Spectre` package would be very useful and would increase the packages' usage a lot:

- the `infinityFlow` package (https://github.com/ebecht/infinityFlow ; ebecht), upstream of Spectre;
- `PeacoQC` (https://github.com/saeyslab/PeacoQC ; AnneliesEmmaneel);
- `flowVS`, to decide optimal cofactors per channel. At the moment, here is how I do it, incorporating the steps from "Modified workflow to use flowVS-transformed FCS in Spectre clustering workflow?" (HdBraanker/Spectral_Flow_Workflow#3, by HdBraanker); see the sketch after this list;
- your tutorial (Interoperability: How to convert Spectre data.table to object compatible with CATALYST package? ImmuneDynamics/Spectre#92) on how to convert a data.table to an SCE object would also be useful for people wanting to use some features of the `CATALYST` package (https://github.com/HelenaLC/CATALYST ; HelenaLC).
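For the flowVS step, a rough sketch of the kind of thing I mean, assuming `fs` is a flowSet of the input files and `markers` the channels to transform (`estParamFlowVS()` is flowVS's cofactor-estimation function; the `_asinh` column naming is just illustrative):

```r
library(flowVS)

# Estimate one asinh cofactor per channel from the flowSet.
cofactors <- estParamFlowVS(fs, channels = markers)

# Apply the per-channel cofactors to the Spectre data.table manually,
# writing each transformed marker into a new "_asinh" column.
for (i in seq_along(markers)) {
  m <- markers[i]
  cell.dat[[paste0(m, "_asinh")]] <- asinh(cell.dat[[m]] / cofactors[i])
}
```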
The issue was identified as being due to the file naming of the PE markers across wells; see https://github.com/RGLab/flowCore/issues/243.

Solution: change the marker names before exporting the data from the cytometer into FlowJo and the subsequent R packages.
Thank you again @ghar1821 for taking the time to look into the issue. And I look forward to reading the tutorials whenever you get a chance to write them.

Keeping this open because I am unsure whether Givanna's documentation tag would disappear if it were closed.
Hi there,
I am following the tutorial (https://immunedynamics.io/spectre/simple-discovery/#6_Annotate_clusters).
I input the channel values of my data into Spectre. Then I run `write.files()` as below to export these channel values as FCS without doing any transformation or clustering, because I now want to use these FlowJo-transformed FCS files in the PeacoQC package before putting the QC'd files back into Spectre.

Is it better to use the CSV-to-FCS script instead of this `write.files()`?

Thank you for your help.
Related: https://github.com/sydneycytometry/CSV-to-FCS/issues/6