cortes-ciriano-lab / SComatic

A tool for detecting somatic variants in single cell data

No base count matrix(.tsv) files in Step 2 output but bam files of those cell types are there in Step1 Output. #31

Closed lipikakalson closed 9 months ago

lipikakalson commented 9 months ago

I am trying to detect somatic mutations from my single-cell RNA-seq data for lung cancer samples. Step 1 command used:

image

Output files:

image

Step 2:

image

Output:

image

I am not sure why there are no output files for the other cell types. Is there a specific reason that 10 of the 12 files are missing? All the files were processed, though (I am attaching a screenshot of part of the Step 2 output).

image

Note: I also tried the example data and saw the same issue:

image

There, only one file is missing.

RinconFer commented 9 months ago

Hi, I am running into the same problem. Step 1 runs without problems and produces the BAM files, but Step 2 only creates the temporary folders. (I removed the rm -rf $temp line from the code to see what was going on in those folders, so they are kept, but they are empty.)

Nothing is ever written to the temp folders, and the output file is not produced. No error is thrown anywhere in the process either.

image

lipikakalson commented 9 months ago

Yes, I tried it with another sample as well. No file is produced in the Step 2 output now, even though it runs without any error. Also, all my temp files are empty.

Francesc-Muyas commented 9 months ago

Dear users, could you check a couple of things?

  1. Could you show the size of the BAM files, for instance using ls -lh *bam?
  2. Could you paste here the output (*report.txt) of SplitBamCellTypes.py?
  3. Could you run BaseCellCounter.py with the parameter --nprocs 1 and paste any error it prints?
  4. Finally, could you confirm that the reference genome version used in the SComatic computation is the same as the one used for the alignment (CellRanger)?

Thanks, Fran

lipikakalson commented 9 months ago
  1. image
  2. Sample report:

    image
  3. I ran it with --nprocs 1 and there was no error. It ran successfully, and its output is also attached in the issue. image (I just printed the cell type additionally here.)

  4. Yes, the reference genome version is the same as the one used by CellRanger for alignment.

Francesc-Muyas commented 9 months ago

Hi, it seems that the first step (SplitBamCellTypes.py) filtered out a huge number of reads (66M out of 70M) because their CBs were not found in the metadata file. As a consequence, the Step 2 files are empty because there are almost no PASS reads to analyse (see the Pass_reads column in the report.txt file). The CBs provided in the metadata file rarely match the ones observed in the BAM file.

Cheers, Fran

RinconFer commented 9 months ago

Hi, thank you for the help. I figured out the error. As you said, the problem was reads being filtered out. In my case, I had left the nM filter on and my BAM file does not have that tag. I am now running Step 2 and it looks like it is working. I'll update with the results.
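For anyone wanting to check up front whether their alignments carry the tag a filter relies on, here is a minimal Python sketch. It is not part of SComatic; the function names has_tag and fraction_with_tag are made up for illustration. It only assumes the standard SAM text layout, where optional fields start at column 12 and look like TAG:TYPE:VALUE.

```python
def has_tag(sam_line, tag):
    """Return True if a SAM text record carries the optional tag (e.g. 'nM').

    Hypothetical helper, not part of SComatic. Tag names are case-sensitive,
    so 'nM' and 'NM' are treated as different tags.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    return any(field.split(":", 1)[0] == tag for field in fields[11:])


def fraction_with_tag(sam_lines, tag):
    """Fraction of alignment records (header lines skipped) that carry `tag`."""
    total = 0
    tagged = 0
    for line in sam_lines:
        # Skip blank lines and '@' header lines.
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        if has_tag(line, tag):
            tagged += 1
    return tagged / total if total else 0.0
```

One possible use is to pipe a few thousand records from samtools view into a small script that passes sys.stdin to fraction_with_tag(sys.stdin, "nM"). A result of 0.0 would suggest that the BAM file does not contain the tag at all.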

lipikakalson commented 9 months ago

I see the issue now, thank you for your reply and help. Is there any way I can see which CBs are found and which are mismatched or not found?

Best, Lipika

Francesc-Muyas commented 9 months ago

Hi,

You could extract the CB tags from the BAM file using this command:

    samtools view scRNAseq.bam | grep -o 'CB:Z:[ACTGN]*-1' | sort | uniq > unique.cb.txt

and then compare the output file with your metadata.

You might need to adjust the command slightly depending on what your CBs look like.

Thanks, Fran
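The comparison step suggested above could be sketched in Python as follows. This is an illustrative sketch and not part of SComatic. It assumes the metadata file is tab-separated with the barcodes in a column named Index (adjust the column argument if your file differs), and that unique.cb.txt contains one CB:Z:... entry per line, as produced by the command above. The function names are hypothetical.

```python
import csv
import io


def read_bam_barcodes(handle):
    """Read barcodes from lines such as 'CB:Z:ACGT-1' (one per line)."""
    barcodes = set()
    for line in handle:
        value = line.strip()
        if not value:
            continue
        # Drop the SAM tag prefix so values are comparable with the metadata.
        if value.startswith("CB:Z:"):
            value = value[len("CB:Z:"):]
        barcodes.add(value)
    return barcodes


def read_metadata_barcodes(handle, column="Index"):
    """Read barcodes from a tab-separated metadata table.

    The column name 'Index' is an assumption; pass another name if needed.
    """
    barcodes = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        value = row[column]  # raises KeyError if the column is absent
        if value:
            barcodes.add(value.strip())
    return barcodes


def compare_barcodes(bam_cbs, meta_cbs):
    """Split barcodes into shared, BAM-only and metadata-only sets."""
    return {
        "shared": bam_cbs & meta_cbs,
        "only_in_bam": bam_cbs - meta_cbs,
        "only_in_metadata": meta_cbs - bam_cbs,
    }


# Small in-memory demonstration with made-up barcodes.
demo_bam = io.StringIO("CB:Z:AAAC-1\nCB:Z:TTTG-1\n")
demo_meta = io.StringIO("Index\tCell_type\nAAAC-1\tEndothelial\nGGGA\tT_cell\n")
result = compare_barcodes(read_bam_barcodes(demo_bam),
                          read_metadata_barcodes(demo_meta))
for name, values in result.items():
    print(name, len(values))
```

With real data, the two in-memory handles would be replaced by open("unique.cb.txt") and the opened metadata file. A large only_in_metadata set whose entries differ from the BAM barcodes only by a suffix such as -1 or a sample prefix would point to a formatting mismatch rather than genuinely missing cells.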

lipikakalson commented 9 months ago

Hi Fran, I tried another sample. In this case almost 85% of the reads passed, but there is still no file in Step 2. I am attaching the screenshot here. Also, I could not see the unique CBs of these BAM files.

image

Francesc-Muyas commented 9 months ago

Could you paste the command you used to run, for instance, the endothelial cells?

lipikakalson commented 9 months ago

Hi Fran, I used the same command for the endothelial cells. I managed to get past this error by using a metadata file built by annotating the raw feature matrix without any QC or filtering, instead of annotating after filtering, so that no reads are removed.

image

I am guessing that 4 cell types did not have enough passed reads, which is why they are missing in Step 2. If I run Step 3 now, there is no file in the output even though it completes without any error.

image

Francesc-Muyas commented 9 months ago

Could you paste the command you used to run Step 3? Why does it say that there are 15 tsv files in this folder? There should be only the ones you want to merge for this sample/individual.

RinconFer commented 9 months ago

Have you checked for hidden files in your folder (Ctrl + H)? It happened to me: I forgot to define the sample variable in the terminal, so the script generated a file whose name started with a period, which makes the file hidden on Linux.
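On the command line, ls -a lists such dot-prefixed files. The same check can be sketched in Python, which may be handy inside a pipeline script. This is only an illustration: find_hidden_files is a hypothetical helper, and the .tsv suffix is an assumption based on the tsv outputs discussed in this thread.

```python
from pathlib import Path


def find_hidden_files(directory, suffix=".tsv"):
    """Return sorted names of dot-prefixed regular files ending with `suffix`.

    Hypothetical helper for illustration; pass suffix="" to list every
    hidden file regardless of extension.
    """
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file()
        and path.name.startswith(".")
        and path.name.endswith(suffix)
    )
```

If an output name is built from an unset shell variable followed by a period and the rest of the file name, the result starts with a period, and a call such as find_hidden_files("path/to/step3_output") would reveal it. The directory name here is a placeholder.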

lipikakalson commented 9 months ago

Yes, thanks, it worked now. The output file was hidden.

Regarding the 15 tsv files: I assumed they were the per-cell-type base count matrices (tsv files) that would be merged in Step 3.

I successfully ran SComatic without any errors. Thank you so much for all the help!

One more question: how can I visualise these results graphically? (Sorry, I am very new to scRNA analysis :( ) Thanks in advance!

Francesc-Muyas commented 9 months ago

We usually perform the downstream analysis in R or Python, depending on your priorities. However, this is not included in SComatic.

Cheers, Fran

lipikakalson commented 9 months ago

Thank you so much for your help :)