CUHIMSR / CytofBatchAdjust

GNU General Public License v3.0

Reference sample for batch correction #1

Open AlexBouz opened 4 years ago

AlexBouz commented 4 years ago

Dear @CUHIMSR ,

Thank you for sharing this important tool for batch effect correction. I have a question regarding the reference sample named "anchor". I downloaded the FCS files from the vaccination study uploaded to FlowRepository and tried to run the R script using the naming from the examples:

[1] 2019-10-28 14:52:28 BatchAdjust.R
[1] basedir:C:/Users/Alexandre.Bouzekri/Desktop/Clinical files mass cytometry/Batch correction
[1] outdir:C:/Users/Alexandre.Bouzekri/Desktop/Clinical files mass cytometry/Batch correction/debarcoded
[1] channelsFile:ChannelsToAdjustexample.txt
[1] batchKeyword:Barcode
[1] anchorKeyword:anchor stim
[1] transformation:FALSE
[1] method:95p
[1] Reading channels file: ChannelsToAdjust_example.txt
[1] Adjusting 39 channels.
[1] Batch 1 anchor not found.
[1] Check file naming requirements?
Error in BatchAdjust(basedir = "C:/Users/Alexandre.Bouzekri/Desktop/Clinical files mass cytometry/Batch correction", :
  The reference anchor must exist and have the batch number 1.
In addition: Warning message:
In system(ls_cmd, intern = TRUE, ignore.stdout = FALSE, ignore.stderr = TRUE, :
  running command 'ls -1 C:/Users/Alexandre.Bouzekri/Desktop/Clinical\ files\ mass\ cytometry/Batch\ correction/anchor\ stim.fcs' had status 2
Called from: BatchAdjust(basedir = "C:/Users/Alexandre.Bouzekri/Desktop/Clinical files mass cytometry/Batch correction", outdir = "C:/Users/Alexandre.Bouzekri/Desktop/Clinical files mass cytometry/Batch correction/debarcoded", channelsFile = "ChannelsToAdjustexample.txt", batchKeyword = "Barcode", anchorKeyword = "anchor stim", method = "95p", transformation = FALSE, addExt = NULL, plotDiagnostics = TRUE)

I got this error about the reference sample. Does it come from not following the file labeling requirements?

Thanks for the clarification.

RonSchuyler commented 4 years ago

What operating system are you using?

AlexBouz commented 4 years ago

Dear @RonSchuyler, thank you for replying. I'm running the script you wrote on Windows 10 Pro, with R version 3.6.1 and RStudio version 1.2.5001. If I'm correct, the FCS files from the flow repository were already debarcoded and correspond to the anchor samples from the study. I'm not sure whether this is related to the error I get, though.

RonSchuyler commented 4 years ago

The code currently runs only on Linux or macOS. This is due to path naming. It would be straightforward to extend it for Windows, but I don't have access to a system I could use for testing. I suggest running on Linux or macOS, or digging into the code to handle parsing of directory paths for Windows.
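The warning in the log above shows the script shelling out to `ls`, which is not available in a standard Windows shell. As a minimal sketch of the kind of change suggested here, the listing can be done with base R instead. The helper name `find_fcs_files` and the file names are hypothetical and are not part of BatchAdjust.R.

```r
# Hypothetical helper, not part of BatchAdjust.R: list FCS files whose names
# contain a keyword, without shelling out to `ls`.
find_fcs_files <- function(basedir, keyword) {
  # list.files() is implemented in R itself, so it behaves the same on
  # Windows, Linux and macOS, including for paths that contain spaces.
  all_fcs <- list.files(basedir, pattern = "\\.fcs$", full.names = TRUE,
                        ignore.case = TRUE)
  # fixed = TRUE treats the keyword as literal text, not a regular expression.
  all_fcs[grepl(keyword, basename(all_fcs), fixed = TRUE)]
}

# Usage with a throwaway directory whose name contains a space.
demo_dir <- file.path(tempdir(), "Batch correction")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("Barcode_1_anchor stim.fcs",
                                  "Barcode_1_sample.fcs")))
found <- find_fcs_files(demo_dir, "anchor stim")
print(basename(found))
```

Building paths with `file.path()` and matching names with `grepl()` avoids both the shell call and the manual escaping of spaces seen in the warning message.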

AlexBouz commented 4 years ago

Dear @RonSchuyler, yes, I missed the line saying that the code is compatible with Linux or Mac only. I tried to run one example on a remote machine. The path naming looked fine, however the batch 1 reference error still came up when running the script. I named 4 files from the flow repository following the first example you described. [image: CytofBatchAdjust on a Mac]

AlexBouz commented 4 years ago

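After renaming the example files as Barcode set 1, there was some progress: [image: File list Screenshot] https://user-images.githubusercontent.com/37419030/67794205-aa200c00-fa52-11e9-98ae-0d3bfa34d95d.png [image: Mac screenshot] https://user-images.githubusercontent.com/37419030/67794220-b0ae8380-fa52-11e9-86e0-abb277cc868b.png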

RonSchuyler commented 4 years ago

Alex, The line "batchestToAdjust: 1" indicates that you only have batch 1, and no other batches to adjust relative to #1, so there is nothing for the software to do. All batches must be named according to your pattern. Hope that helps.
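To check which batches a set of file names would yield before running the adjustment, something like the following sketch may help. It assumes the batch number directly follows the batch keyword in each file name. The helper `batch_numbers`, the keyword `Barcode_` and the file names are hypothetical illustrations, not the script's own code.

```r
# Hypothetical helper: pull the batch number that follows batchKeyword in
# each file name. Files without the keyword, or without digits after it, give NA.
batch_numbers <- function(filenames, batchKeyword) {
  vapply(filenames, function(f) {
    pos <- as.integer(regexpr(batchKeyword, f, fixed = TRUE))
    if (pos < 0) return(NA_integer_)
    rest <- substring(f, pos + nchar(batchKeyword))
    digits <- sub("^([0-9]*).*$", "\\1", rest)   # leading digits only
    if (nzchar(digits)) as.integer(digits) else NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

# Made-up file names for illustration.
files <- c("Barcode_1_anchor stim.fcs", "Barcode_1_sampleA.fcs",
           "Barcode_2_anchor stim.fcs", "Barcode_2_sampleA.fcs",
           "notes.fcs")
batches <- batch_numbers(files, "Barcode_")

# Batch 1 is the reference, so only the other batches need adjusting.
to_adjust <- setdiff(sort(unique(batches[!is.na(batches)])), 1L)
print(to_adjust)
```

If `to_adjust` comes back empty, only batch 1 was recognized, which matches the situation described in the reply above.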

AlexBouz commented 4 years ago

Hi @RonSchuyler, I changed the file naming to list several anchor samples belonging to different batches, and they now load sequentially. However, the TRUE/FALSE argument error still occurs for the median threshold. [image: Multiple batches]

SamGG commented 4 years ago

Hi @AlexBouz, we recently ran cytofBatchAdjust in one shot on 27 batches of 20 samples each (7 GB of disk, IIRC). The process stops during the permutation plot, but all FCS files were normalized in less than 10 minutes. We did the plots separately afterwards. Could you show the file list and the BatchAdjust command? Best

AlexBouz commented 4 years ago

Hi @SamGG, @RonSchuyler, I ran BatchAdjust.R with the changes you made on two batches of two files each, as a small sample. The code generated the normalized files and a folder of distribution plots. I attached the log file for information. I will give it a shot with bigger datasets. This time the code was run on Windows 10. LOG_BatchAdjust.2020.04.05.103107.txt

Thanks

Chelysheva commented 4 years ago

Hi @AlexBouz I have seen your issue from 31 Oct 2019. Obviously, you've managed to overcome it. How did you solve your TRUE/FALSE argument error for the Median Threshold? I am attaching my screenshot, where I try to run a small test subset of 2 batches with several samples in each. Many thanks in advance!

@CUHIMSR and @RonSchuyler Maybe you could help me with that issue? (I am using Linux, R version 3.6.2) Would be grateful.

[image: Screenshot from 2020-04-09 14-41-51]

Best

SamGG commented 4 years ago

Hi, I think your Bi209Di channel is always zero. I would start with a few channels with clear signal (CD4, CD3, CD8, CD19, CD20...) to get confident in the use of cytofBatchAdjust. Best.

ghost commented 3 years ago

Hi @RonSchuyler @AlexBouz, how can I solve the TRUE/FALSE argument error? It says:

Error in if (FractionZerosPooled[[acol]] > maxFrac0ForMedianThreshold) { : missing value where TRUE/FALSE needed

I have attached a screenshot of the error.

[image: Screenshot from 2021-07-27 17-37-18]

SamGG commented 3 years ago

Hi, you didn't mention me, but here are some checks. Did you set the nz_only argument to TRUE? Did you carefully select channels with signal? Sometimes channels are opened on the CyTOF but no label matches them. In that case, there are too many zeros. Did this error occur at the first channel? You should uncomment line 407, the one just before the error, so that you can report some useful diagnostic values. I think Ron will answer you. Best.

SamGG commented 3 years ago

nz_only should be set to FALSE, which is the default value. There is no reason to remove zeros from the quantile computation. "Browse[1]>" means that when an error occurs, the code browser opens instead of the script stopping. It lets you browse the environment, i.e. the variables, which could help in understanding the problem. Locate line 407 using the original: https://github.com/CUHIMSR/CytofBatchAdjust/blob/f66c532141a12244059fc1db3b3e94616d88ed7e/BatchAdjust.R#L406-L408

ghost commented 3 years ago

[image: Screenshot from 2021-07-27 20-35-53]

Hello, kindly ignore my previous email. I uncommented line 407. I set the nz_only argument to TRUE. It shows the same error. I also checked the channel names. This is what I got. I do not understand it.

Thank you

On Tue, Jul 27, 2021 at 8:27 PM Rashmi Rao @.***> wrote:

Hello, thank you so much for your response. I set the nz_only argument to TRUE. It shows the same error. I also checked the channel names. How do I find out whether it is at the first channel? What is "Browse[1]>"? I did not understand what you meant by uncommenting line 407.

Thank you once again

SamGG commented 3 years ago

The "0 of 0" tells us that there are no zero values among all the values of channel 319 (acol), but also that this channel has 0 values in total (the last zero). So I think the FCS file may not have been read correctly. I recommend setting nz_only = FALSE and transformation = FALSE. If this still fails, try to find someone fluent in R.
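One way this error can arise, consistent with the "0 of 0" diagnosis above, is a channel with no values at all: the fraction of zeros is then 0/0, which R evaluates to NaN, and `if ()` cannot branch on it. The sketch below reproduces that mechanism in isolation. The function names and the threshold value are hypothetical and only stand in for the script's `FractionZerosPooled` and `maxFrac0ForMedianThreshold`.

```r
# Fraction of zero values in a channel; an empty channel gives 0/0 = NaN.
fraction_zeros <- function(values) sum(values == 0) / length(values)

# Hypothetical threshold, standing in for maxFrac0ForMedianThreshold.
max_frac0 <- 0.5

# Guarded comparison: report an unusable channel instead of failing in if().
too_many_zeros <- function(values, threshold = max_frac0) {
  frac <- fraction_zeros(values)
  if (is.na(frac)) {
    stop("channel has no events; check that the channel name matches the FCS file")
  }
  frac > threshold
}

empty_channel <- numeric(0)
print(fraction_zeros(empty_channel))          # NaN

# The unguarded comparison fails in if(); capture the message instead of stopping.
unguarded <- tryCatch(
  if (fraction_zeros(empty_channel) > max_frac0) "skip" else "keep",
  error = function(e) conditionMessage(e)
)
print(unguarded)
```

Checking `is.na()` before the comparison turns the cryptic failure into a message that points at the likely cause, such as a channel name that does not match the file.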

ghost commented 3 years ago

@SamGG Thank you for the suggestion. I converted an FCS file to CSV and noticed that the channel names did not match. It's working now. However, I have 2 queries. (1) In the variance graph, TNF and CD4 show an increase in variance post correction. Should I leave the channels for TNF and CD4 out of the channel list? (2) I do not understand what the graph of total variance against frequency is indicating. I have used 80p as the method.
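A quick way to catch this kind of mismatch before running the adjustment is to compare the requested channel names against those present in a file. This is a minimal sketch with made-up channel names, and `missing_channels` is a hypothetical helper, not part of the script.

```r
# Hypothetical helper: report requested channels that are absent from a file.
# trimws() guards against stray spaces in the channels file.
missing_channels <- function(requested, available) {
  setdiff(trimws(requested), available)
}

# In practice `available` could come from the FCS file, for example
# colnames(flowCore::read.FCS(path)), and `requested` from readLines() on the
# channels file. The vectors below are made-up stand-ins.
available <- c("Time", "Er170Di", "Nd145Di", "Bi209Di")
requested <- c("Er170Di", "Nd145Di ", "Bi209Dd")
print(missing_channels(requested, available))
```

An empty result means every requested channel exists in the file. Anything listed is a name to correct in the channels file.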

Thank you

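[image: PrePostVariance] https://user-images.githubusercontent.com/60535031/128041728-64d3c1f3-1ad8-489e-837d-0cd87ffc6246.png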

RonSchuyler commented 3 years ago

When variance increases as with your CD4 channel, I suggest inspecting the distribution plots pre- and post-adjustment to be sure that:

  1. the data from that channel look as expected before adjustment. Channels can fail for various reasons.
  2. the post-adjustment plot is not highly skewed or otherwise distorted.

For most data, we saw better results using 95p than with 80th percentile. I would try this.

However, your change in variance plot looks very sparse. Does your dataset contain a very small number of observations? Please see the paper for a better explanation of the variance plot than I can reproduce here.

SamGG commented 3 years ago

Just to second Ron's default choice. We also defined 0.95 as the default percentile.

ghost commented 3 years ago

@SamGG The data from CD4 is shifted to the right. But using the 95th percentile increases the variance post adjustment for more markers in my data. Yes, there are fewer observations, as it is a flow cytometry dataset. It contains approximately a fifth of the data used in the paper.

I understand that the third plot shows the significance of the decrease in total variance, obtained from a permutation test. The paper states "the labels pre- and post-adjustment were swapped for each replicate". I am not clear what "labels" means here.
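As a generic illustration only, and not necessarily the exact procedure used in the paper, a paired permutation test treats "pre" and "post" as labels attached to the two values of each pair. Swapping the labels at random for each pair shows how large a decrease could appear by chance. All numbers below are made up.

```r
set.seed(1)

# Made-up paired variance values before and after adjustment.
pre  <- c(2.1, 1.8, 2.5, 1.9, 2.2, 2.0)
post <- c(1.2, 1.1, 1.6, 1.3, 1.0, 1.4)

# Observed decrease in total variance.
observed <- sum(pre) - sum(post)

n_perm <- 1000
permuted <- replicate(n_perm, {
  # For each pair, decide at random whether its two labels are swapped.
  swap <- runif(length(pre)) < 0.5
  perm_pre  <- ifelse(swap, post, pre)
  perm_post <- ifelse(swap, pre, post)
  sum(perm_pre) - sum(perm_post)
})

# Share of label assignments giving a decrease at least as large as observed.
p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)
print(p_value)
```

A small p-value means that random relabeling rarely produces a decrease as large as the observed one.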

Also, a completely different issue: I get the message "Some data values of '405 660_20-A' channel exceed its $PnR value 8289 and will be truncated!" Where do I set truncate_max_range = FALSE?