fw262 / TAR-scRNA-seq

scRNA-seq analysis beyond gene annotations using transcriptionally active regions (TARs) generated from sequence alignment data
GNU General Public License v3.0

Error creating DigitalExpression and previous bam files are returned empty #9

Open Prakrithi-P opened 3 years ago

Prakrithi-P commented 3 years ago

Hi,

While running the Snakemake pipeline, the script stops at:

INFO 2021-09-12 23:40:09 BarcodeListRetrieval Looking for the top 5000 cell barcodes
INFO 2021-09-12 23:40:09 BarcodeListRetrieval Selected 0 core barcodes
ERROR 2021-09-12 23:40:09 DigitalExpression Running digital expression without somehow selecting a set of barcodes to process no longer supported.

All the BAM files generated in the previous steps are also empty. I don't understand where I'm going wrong. Could I get some help, please?

fw262 commented 3 years ago

Hi Prakrithi,

Is the file "TAR_reads.bed.gz.withDir.refFlat.refFlat" empty as well? Have you tried running the pipeline on the test dataset provided? It would also be helpful to share the snakemake log output.

Thanks, Michael

Prakrithi-P commented 3 years ago

Hi Michael, thanks for the reply. TAR_reads.bed.gz.withDir.refFlat.refFlat is not empty. Please find the log file attached. I am testing the pipeline on a PBMC dataset from 10X Chromium.

2021-09-23T180832.027414.snakemake.log

Thanks, Prakrithi

Prakrithi-P commented 3 years ago

Also, please find attached the log file for the provided test dataset: chicken_test_dataset_out.log

fw262 commented 3 years ago

Hi Prakrithi,

It looks like you're getting errors with the "MergeBamAlignment" from Picard tools and "DigitalExpression" from Dropseq.

Can you re-run the snakemake commands with &> snakemakeLog.txt appended to the commands and send me the "snakemakeLog.txt" files to better diagnose? (i.e. run the command snakemake -R --until getMats -j [# cores] &> snakemakeLog.txt)

Best, Michael

Prakrithi-P commented 3 years ago

Hello Michael, please find attached the snakemake log file you asked for. This one is for the PBMC dataset. For the chicken dataset, I noticed that the MergeBamAlignment error is caused by mismatched contigs between the BAM and the reference FASTA. I am trying to fix that.

Thanks, Prakrithi

snakemakeLog.txt

fw262 commented 3 years ago

Can you make sure you have Drop-seq Computational Tools v2.3.0 installed? I know that earlier versions of the Drop-seq tools (i.e. < v2) look for a different CELL_BARCODE_TAG (XC) and MOLECULAR_BARCODE_TAG (XM).

Can you take a look at the "/nfs_node3/prakrithi/test/TARs/results_human_pbmc/pbmc4k_S1_L002/pbmc4k_S1_L002_gene_exon_tagged.bam" file and see what the tags are? Are XC and XM tagged for all the reads? That should give us a better idea about the error.
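For anyone following along, here is a minimal sketch of one way to answer the "are XC and XM tagged for all the reads?" question. It is not part of the pipeline. It assumes the BAM has been converted to SAM text (for example with samtools view) and simply counts how many records carry each tag. The function name `tag_coverage` and the sample records are made up for illustration.

```python
from collections import Counter


def tag_coverage(sam_lines, tags=("XC", "XM")):
    """Count alignment records and how many of them carry each optional tag.

    sam_lines: any iterable of SAM text lines (header lines are skipped).
    Returns (total_records, {tag: records_with_tag}).
    """
    total = 0
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        total += 1
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        present = {field.split(":", 1)[0] for field in fields[11:]}
        for tag in tags:
            if tag in present:
                counts[tag] += 1
    return total, {tag: counts[tag] for tag in tags}


# Two hypothetical records: the second one is missing the XM tag.
base = ["0", "chr1", "100", "255", "10M", "*", "0", "0", "ACGTACGTAC", "FFFFFFFFFF"]
demo = [
    "@HD\tVN:1.6",
    "\t".join(["read1"] + base + ["XC:Z:AAACCTGAGAAACCAT", "XM:Z:GGTACCTTAA"]),
    "\t".join(["read2"] + base + ["XC:Z:AAACCTGAGAAACCAT"]),
]
total, per_tag = tag_coverage(demo)
print(total, per_tag)
```

To run it on real data, the same function can be given `sys.stdin` while piping in the output of samtools view on the gene_exon_tagged BAM. If the per-tag counts are lower than the record count, some reads are missing that tag.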

Thanks, Michael

Prakrithi-P commented 3 years ago

I am using DropSeq v2.4 and yes, all the reads are tagged with XC and XM.

Also, can I get some help with aggregating the counts generated by featureCounts (for a SMART-Seq2 and an RNA-Seq dataset that I have been trying)? How can I do that? I am just getting started with scRNA-seq and I am not sure how to do it.

Thanks, Prakrithi

Prakrithi-P commented 3 years ago

And I am not getting any of the above issues with 10X V3 chemistry. Just with v2 datasets.

fw262 commented 3 years ago

Ah, I believe that is the root of your problem. For the v2 datasets, please go into the config.yaml file and change the last UMI_range value to 26 instead of the default 28 to account for the difference in UMI length.
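As a quick sanity check on those numbers: 10x v2 and v3 both use a 16 bp cell barcode at the start of read 1, followed by a 10 bp UMI (v2) or a 12 bp UMI (v3). The sketch below just does that arithmetic with 1-based inclusive positions. The helper name `read1_ranges` and the `CHEMISTRY` table are illustrative and do not come from the pipeline.

```python
def read1_ranges(barcode_len, umi_len):
    """Return 1-based inclusive (first, last) positions of barcode and UMI on read 1."""
    barcode = (1, barcode_len)
    umi = (barcode_len + 1, barcode_len + umi_len)
    return barcode, umi


# (cell barcode length, UMI length) per chemistry.
CHEMISTRY = {"10x_v2": (16, 10), "10x_v3": (16, 12)}

for name, (bc_len, umi_len) in CHEMISTRY.items():
    barcode, umi = read1_ranges(bc_len, umi_len)
    print(name, "barcode", barcode, "UMI", umi)
# 10x_v2 barcode (1, 16) UMI (17, 26)
# 10x_v3 barcode (1, 16) UMI (17, 28)
```

So the UMI range ends at 26 for v2 data and at 28 for v3 data, which matches the default mentioned above.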

Aggregating the counts from SMART-Seq2 should be straightforward. After generating counts from featureCounts, you can simply concatenate the results into an expression matrix for each sample. Then, you can just run single-cell analysis on that aggregated expression matrix. Let me know if you're encountering specific issues with aggregating.
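A minimal sketch of that concatenation, for reference. It assumes featureCounts was run once per cell, so each output table has the usual "#" comment line, a header row, and a single count column at the end, and that the counts are integers. The function names and the file paths in the usage comment are hypothetical.

```python
import csv


def read_featurecounts(path):
    """Return {gene_id: count} from one featureCounts table with a single BAM column."""
    counts = {}
    with open(path, newline="") as handle:
        # Drop the leading "# Program:featureCounts ..." comment line.
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        next(rows, None)  # skip the header row (Geneid, Chr, Start, End, Strand, Length, <bam>)
        for row in rows:
            counts[row[0]] = int(row[-1])  # the last column holds the count
    return counts


def build_matrix(paths_by_cell):
    """Merge per-cell tables into (genes, cells, matrix), with genes as rows."""
    cells = list(paths_by_cell)
    per_cell = [read_featurecounts(paths_by_cell[cell]) for cell in cells]
    genes = sorted(set().union(*per_cell))
    # Genes missing from a cell's table are filled with 0.
    matrix = [[table.get(gene, 0) for table in per_cell] for gene in genes]
    return genes, cells, matrix


def write_matrix(genes, cells, matrix, out_path):
    """Write the genes-by-cells matrix as a tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene"] + cells)
        for gene, row in zip(genes, matrix):
            writer.writerow([gene] + row)


# Example usage with hypothetical paths:
# genes, cells, matrix = build_matrix({"cellA": "cellA.counts.txt", "cellB": "cellB.counts.txt"})
# write_matrix(genes, cells, matrix, "expression_matrix.tsv")
```

The resulting tab-separated matrix can then be loaded into whichever single-cell analysis package you use. If featureCounts was instead run on all BAM files at once, its output already contains one column per sample and no merging is needed.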

Best, Michael

Prakrithi-P commented 3 years ago

Thanks Michael, I'll try aggregating. Changing the UMI range to 17-26 was the first thing I did. The errors I posted persist after that change.

fw262 commented 3 years ago

A few more things to try and diagnose:
- Can you share the first 10 lines of /nfs_node3/prakrithi/test/TARs/results_human_pbmc/pbmc4k_S1_L002/pbmc4k_S1_L002_gene_exon_tagged.bam using samtools view? There is likely something wrong with the gene tagging in the bam file.
- Try changing the expectedCells value to something like 1000 or 500 in config.yaml.
- Can you point me to the 10X pbmc v2 dataset you are trying to run? If it's a public dataset, I can try running it from my end to diagnose.

Michael

Prakrithi-P commented 3 years ago

Please find attached a screenshot of /nfs_node3/prakrithi/test/TARs/results_human_pbmc/pbmc4k_S1_L002/pbmc4k_S1_L002_gene_exon_tagged.bam.

Link to the dataset: https://www.10xgenomics.com/resources/datasets/4-k-pbm-cs-from-a-healthy-donor-2-standard-2-1-0

fw262 commented 3 years ago

Great, I'll take a look to see if I can replicate your error with that dataset.

Also, have you tried running using the "from_cellranger" pipeline?

Michael