dbrg77 / scATAC_snakemake

Snakemake pipeline for plate scATAC-seq processing

inquiry about Loading scATAC-seq matrices into R #2

Open jiangzh-coder opened 2 years ago

jiangzh-coder commented 2 years ago

Hi

I want to analyze some scATAC-seq data. After unzipping, I got 10 folders (one patient per folder). Each folder has two subfolders, which hold the scRNA-seq data and the scATAC-seq data separately. These two subfolders contain files generated by Cell Ranger (I attached two pictures). How can I load the scATAC-seq matrices, as well as the scRNA-seq data, into R? Could you kindly provide some code?

image

jiangzh-coder commented 2 years ago

jzhou@jiang:/mnt/d/##files/#atac/HD2$ tail atac_peaks.bed
GL000218.1  83275  84106
KI270726.1  27131  28058
KI270726.1  41490  42368
KI270711.1  7979   8731
KI270711.1  8887   9380
KI270713.1  15800  16459
KI270713.1  17290  18022
KI270713.1  21445  22340
KI270713.1  32734  33486
KI270713.1  36862  37780

jzhou@jiang:/mnt/d/##files/#atac/HD2$ head atac_peak_annotation.tsv
chrom  start   end     gene         distance  peak_type
chr1   9778    10667   MIR1302-2HG  -18887    distal
chr1   180732  181004  AL627309.5   -6871     distal
chr1   181116  181809  AL627309.5   -7255     distal
chr1   183935  184770  AL627309.5   -10074    distal
chr1   191103  192028  AL627309.5   -17242    distal
chr1   267627  268482  AP006222.2   773       distal
chr1   629498  630379  AC114498.1   41870     distal
chr1   633577  634503  AC114498.1   45949     distal
chr1   778280  779196  LINC01409    0         promoter

jzhou@jiang:/mnt/d/##files/#atac/HD2$ tail -5 atac_fragments.tsv
KI270713.1  39073  39449  TAAGTAGCACAGGATG-1  1
KI270713.1  39075  39208  GCTTAACAGTTCCCGT-1  2
KI270713.1  39697  39805  TGTTATGAGGGCTTTG-1  2
KI270713.1  40366  40677  GACCTAAGTTCCGGCT-1  2
KI270713.1  40639  40708  GGATGGCCACACCAAC-1  2

dbrg77 commented 2 years ago

Hi @jiangzh-coder

For RNA, I suppose you are using the 10X Genomics 3' kit, and filtered_feature_bc_matrix was generated by cellranger. You can use Seurat to load and analyze the data.

For ATAC, what method are you using? It seems you are missing the matrix.mtx file. Since you have the fragment file, maybe you can generate the matrix by yourself. I think Signac can do this.
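As a rough illustration of what "generating the matrix yourself" involves, the sketch below counts, for each cell barcode, how many fragments overlap each peak, using only awk. It assumes the three-column layout of atac_peaks.bed and the five-column layout of atac_fragments.tsv shown above, with tab-separated fields. The name count_fragments_in_peaks is a hypothetical helper, and the nested loop is only practical for small test files. For real data in R, Signac's FeatureMatrix function is the more usual route.

```shell
#!/usr/bin/env bash
# Sketch only: count fragments overlapping each peak, per cell barcode.
# Assumed inputs (tab-separated, as in the listings in this thread):
#   peaks BED:      chrom  start  end
#   fragments TSV:  chrom  start  end  barcode  read_count
# count_fragments_in_peaks is a hypothetical helper, not part of any tool.
count_fragments_in_peaks() {
    local peaks="$1" fragments="$2"
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                      # skip comment/header lines
         NR == FNR {                        # first file: store every peak
             n++; pc[n] = $1; ps[n] = $2 + 0; pe[n] = $3 + 0
             next
         }
         {                                  # second file: one fragment per line
             for (i = 1; i <= n; i++)
                 if ($1 == pc[i] && ($2 + 0) < pe[i] && ($3 + 0) > ps[i])
                     count[pc[i] ":" ps[i] "-" pe[i] OFS $4]++
         }
         END { for (k in count) print k, count[k] }' "$peaks" "$fragments" |
        LC_ALL=C sort
}

# Usage (paths are placeholders):
#   count_fragments_in_peaks atac_peaks.bed atac_fragments.tsv > peak_barcode_counts.tsv
```

Each output line is a peak, a barcode, and a count, which is essentially the triplet form of a sparse peak-by-cell matrix.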

jiangzh-coder commented 2 years ago

Hello,

I processed the upstream data for this dataset (GSE199994). I downloaded the SRA files from EBI, converted them to FASTQ files, renamed them to the standard naming scheme, and ran cellranger-atac. I got the following error:

2.7% (< 10%) of read pairs have a valid 10x barcode. This could be a result of poor sequencing quality, a sample mixup, or running the wrong pipeline, for example, running cellranger-atac on Multiome ATAC + GEX data, or vice versa.

The full set of commands is as follows:

ascp -QT -l 300m -P33001 -i ~/miniconda3/envs/my10x/etc/asperaweb_id_dsa.openssh @.***:/vol1/srr/SRR186/006/SRR18613306 .

mv SRR18613306 SRR18613306.sra
parallel-fastq-dump -t 12 -O ./ --split-files --gzip -s SRR18613306.sra

mv *.fastq.gz /home/ubuntu/GSE199994/scATAC/2.raw_fastq

mv SRR18613295_1.fastq.gz SRR18613295-P5_S9_L001_I1_001.fastq.gz
mv SRR18613295_2.fastq.gz SRR18613295-P5_S9_L001_R1_001.fastq.gz
mv SRR18613295_3.fastq.gz SRR18613295-P5_S9_L001_R2_001.fastq.gz
mv SRR18613295_4.fastq.gz SRR18613295-P5_S9_L001_R3_001.fastq.gz

cellranger-atac count --id=SRR18613295-P5 \
    --reference=/home/ubuntu/biosoftware/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --fastqs=/home/ubuntu/GSE199994/scATAC/2.raw_fastq \
    --sample=SRR18613295-P5 \
    --localcores=24 \
    --localmem=96
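The renaming step above can also be written as a loop, which may help avoid typos when there are many runs. This is a minimal sketch that assumes the same file-to-read mapping used in the commands above (_1 to I1, _2 to R1, _3 to R2, _4 to R3). Whether that mapping is right for a given SRA run is not checked here. rename_sra_fastqs is a hypothetical helper name, and the S1 and L001 parts are placeholder values.

```shell
#!/usr/bin/env bash
# Sketch only: rename <run>_1..4.fastq.gz to the bcl2fastq-style names
# used above. The _1->I1, _2->R1, _3->R2, _4->R3 mapping is taken from the
# commands in this thread and is an assumption about the run layout.
# rename_sra_fastqs is a hypothetical helper name.
rename_sra_fastqs() {
    local run="$1" sample="$2" dir="${3:-.}"
    local i=1 tag src dst
    for tag in I1 R1 R2 R3; do
        src="$dir/${run}_${i}.fastq.gz"
        dst="$dir/${sample}_S1_L001_${tag}_001.fastq.gz"
        if [ -e "$src" ]; then
            mv -- "$src" "$dst"
        else
            echo "missing: $src" >&2     # report, but keep going
        fi
        i=$((i + 1))
    done
}

# Usage (directory is a placeholder):
#   rename_sra_fastqs SRR18613295 SRR18613295-P5 /path/to/raw_fastq
```

Any file reported as missing is worth looking into before running the pipeline, since it means the run produced fewer reads per spot than expected.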

Could you help me?

Best wishes, Jiang

dbrg77 commented 2 years ago

Hi @jiangzh-coder

If you read the method section of GSE199994 from this publication, you will realise that they were not using the 10x scATAC kit. Instead, they used the 10x Multiome Kit that profiles ATAC and RNA from the same single cell.

You should use cellranger-arc for this type of data.
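For reference, cellranger-arc count takes a libraries CSV rather than the --fastqs/--sample pair used with cellranger-atac. The sketch below writes such a file with the columns fastqs, sample and library_type. All directories and sample names are placeholders to be replaced with your own, and write_arc_libraries is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: write a libraries CSV for cellranger-arc count.
# write_arc_libraries is a hypothetical helper; every path and sample
# name passed to it is a placeholder.
write_arc_libraries() {
    local out="$1" gex_dir="$2" gex_sample="$3" atac_dir="$4" atac_sample="$5"
    {
        echo "fastqs,sample,library_type"
        echo "${gex_dir},${gex_sample},Gene Expression"
        echo "${atac_dir},${atac_sample},Chromatin Accessibility"
    } > "$out"
}

# Usage (not executed here; requires cellranger-arc to be installed):
#   write_arc_libraries libraries.csv \
#       /path/to/gex_fastqs  my_gex_sample \
#       /path/to/atac_fastqs my_atac_sample
#   cellranger-arc count --id=my_run \
#       --reference=/home/ubuntu/biosoftware/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
#       --libraries=libraries.csv \
#       --localcores=24 \
#       --localmem=96
```

The reference path in the usage comment is the cellranger-arc reference already used in the commands earlier in this thread.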

I hope this helps.

jiangzh-coder commented 2 years ago

Hi

Thanks a lot!

However, when I tried another dataset, the metadata is missing.

I got the following errors in the upstream analysis.

(my10x) @.***:/home/ubuntu/GSE158398/scATAC/sra# cellranger-atac count --id=SRR12695067 \
    --reference=/ home/ ubuntu/ biosoftware/ refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --fastqs=/ home/ ubuntu/ GSE158398/ scATAC/ sra \

error: Found argument 'home/' which wasn't expected, or isn't valid in this context

If you tried to supply home/ as a PATTERN use -- home/

USAGE: cellranger-atac count [FLAGS] [OPTIONS] --id --reference --fastqs ...

For more information try --help
(my10x) @.***:/home/ubuntu/GSE158398/scATAC/sra# --sample=SRR12695067 \
    --localcores=24 \
    --localmem=96

--sample=SRR12695067: command not found
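The first error comes from the spaces inside the paths: the shell splits on whitespace, so `--reference=/ home/ ubuntu/ ...` reaches cellranger-atac as several separate arguments, and `home/` is the first one it cannot interpret. The `--sample=...: command not found` message suggests the line continuation was also broken, for example by whitespace after a trailing backslash or by a blank line. The sketch below shows the splitting with a small hypothetical helper, count_args, which just reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch only: count_args is a hypothetical helper that prints how many
# separate arguments the shell passed to it.
count_args() { echo "$#"; }

# Spaces inside the path split one option into several shell words.
count_args --fastqs=/ home/ ubuntu/ GSE158398/ scATAC/ sra   # prints 6
count_args --fastqs=/home/ubuntu/GSE158398/scATAC/sra        # prints 1

# Keeping each path in a quoted variable avoids accidental splitting.
ref=/home/ubuntu/biosoftware/refdata-cellranger-arc-GRCh38-2020-A-2.0.0
fastqs=/home/ubuntu/GSE158398/scATAC/sra
count_args --reference="$ref" --fastqs="$fastqs"             # prints 2
```

When continuing a command over several lines, the backslash has to be the very last character on each line.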

Best wishes, Jiang

jiangzh-coder commented 2 years ago

Hi

The command is now read correctly, but the pipeline fails for this dataset.

cellranger-atac count --id=sample345 \
    --reference=/home/ubuntu/biosoftware/refdata-cellranger-arc-GRCh38-2020-A-2.0.0 \
    --fastqs=/home/ubuntu/GSE158398/scATAC/sra \
    --sample=SRR12695067 \
    --localcores=24 \
    --localmem=96

[error] Pipestance failed. Error log at: sample345/SC_ATAC_COUNTER_CS/SC_ATAC_COUNTER/_BASIC_SC_ATAC_COUNTER/_ATAC_MATRIX_COMPUTER/MAKE_ATAC_SHARDS/fork0/chnk0-u809b7395e4/_errors

Log message: Unable to read barcode sequence for read ID SRR12695067.1 1 length=50: there was no I2 read FASTQ and we were unable to read a 16-base barcode from the FASTQ header. Make sure that the flow cell was demultiplexed correctly.

Waiting 6 seconds for UI to do final refresh. Pipestance failed. Use --noexit option to keep UI running after failure.
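The log message says that no 16-base barcode could be read, and it mentions a read of length 50. One possible way to investigate (a sketch, not a confirmed diagnosis) is to print the length of the first read in every FASTQ file for the sample and see whether any file could hold the cell barcode at all. The helper names first_read_length and report_read_lengths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: report the length of the first sequence in each gzipped
# FASTQ file, to see which file (if any) could hold a 16-base barcode.
# first_read_length and report_read_lengths are hypothetical helper names.
first_read_length() {
    # Line 2 of a FASTQ record is the sequence.
    gzip -dc -- "$1" 2>/dev/null | awk 'NR == 2 { print length($0); exit }'
}

report_read_lengths() {
    local f
    for f in "$@"; do
        printf '%s\t%s\n' "$(basename "$f")" "$(first_read_length "$f")"
    done
}

# Usage (path is taken from the command above):
#   report_read_lengths /home/ubuntu/GSE158398/scATAC/sra/*.fastq.gz
```

If no file contains reads of at least 16 bases in the barcode position, the run as deposited in SRA may not include the barcode read, and renaming files will not fix that.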

best

jiangzh-coder commented 2 years ago

Can I analyze only the ATAC data from my single cell multiome experiment? – 10X Genomics https://kb.10xgenomics.com/hc/en-us/articles/360061165691-Can-I-analyze-only-the-ATAC-data-from-my-single-cell-multiome-experiment-
