AbateLab / SiC-seq

Scripts used in the 2017 SiC-seq paper

locating barcode and the sequence #1

Open jessieren opened 7 years ago

jessieren commented 7 years ago

Hi Ben,

This is Jessie Ren, a PhD student from USC. I was very excited when I read the SiC-seq paper. Congratulations!

I am interested in analyzing the single-cell samples in your paper, and I am wondering where the 15 bp barcode is located in each read. Based on my understanding of the paper (Figure 2e), the barcode should be the sequence flanked by constant sequence1="AAGCCAGCCCCGACACT" and constant sequence2="GCAGCTGGCGTAATAGCGAGTACAATCTGCTCTGATGCCGCATAG". If that is the case, I should first look for the two constant sequences in each read. The barcode would be the sequence between them, and the bacterial sequence would follow constant sequence2. Is my understanding correct?
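The lookup described above could be sketched in Python as follows. This is only an illustration of the question being asked, not the authors' method: it assumes both constant sequences occur verbatim (no mismatches) in the same read, and the function name `extract_barcode` is hypothetical.

```python
CONST1 = "AAGCCAGCCCCGACACT"
CONST2 = "GCAGCTGGCGTAATAGCGAGTACAATCTGCTCTGATGCCGCATAG"


def extract_barcode(read, const1=CONST1, const2=CONST2, barcode_len=15):
    """Return (barcode, downstream_sequence) if the read has the assumed
    layout const1 + barcode + const2 + downstream, otherwise None."""
    start = read.find(const1)
    if start == -1:
        return None  # first constant sequence not found
    bc_start = start + len(const1)
    bc_end = read.find(const2, bc_start)
    # Require the second constant sequence exactly barcode_len bases later.
    if bc_end == -1 or bc_end - bc_start != barcode_len:
        return None
    return read[bc_start:bc_end], read[bc_end + len(const2):]
```

Exact matching is the simplest choice here; tolerating sequencing errors in the constant regions would need an approximate matcher instead of `str.find`.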

In addition, have the adaptors and primers been trimmed from the reads?

Thank you very much for your help.

Best wishes, Jessie

bdemaree commented 7 years ago

Hi Jessie,

For sequencing, we use a custom i7 index primer that anneals directly upstream of the 15 bp barcode. So, each 15 bp index read is the cell barcode for the corresponding R1 and R2 reads (the bacterial genomic DNA). You can access the raw FASTQ files (R1, R2, I1) at the NCBI SRA, run ID SRR5202186. The files deposited in the SRA have not had the Nextera adapter sequences (or other primers) trimmed.
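Because the barcode arrives as a separate index read, grouping genomic reads by cell comes down to walking the read file and the index file in parallel. A minimal sketch, assuming unwrapped four-line FASTQ records and that both files list clusters in the same order; the helper names `read_fastq` and `group_by_barcode` are hypothetical and not part of this repository:

```python
from collections import defaultdict
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, sequence) from a FASTQ stream with 4 lines per record."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (or a truncated final record)
        header, seq = record[0].strip(), record[1].strip()
        yield header[1:].split()[0], seq


def group_by_barcode(reads_handle, index_handle):
    """Map each barcode (index read) to the list of genomic reads carrying it."""
    groups = defaultdict(list)
    pairs = zip(read_fastq(reads_handle), read_fastq(index_handle))
    for (_, seq), (_, barcode) in pairs:
        groups[barcode].append(seq)
    return groups
```

With real data the handles would come from `open(...)` on the R1 and I1 files. This sketch treats barcodes as exact strings and does no error correction or filtering.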

Thanks,

Ben

jessieren commented 7 years ago

Hi Ben,

Thank you very much for the explanation. I really appreciate it.

I tried to download the R1, R2 and I1 files from NCBI using the command "fastq-dump --split-3 -A SRR5202186", but I got only one file, SRR5202186.fastq. The same happened with the other artificial sample, SRR5202187. Did I miss anything specific to downloading barcoded samples?

For the real metagenomic sample SRR5208456, using the same command, I got two files, SRR5208456_1.fastq and SRR5208456_2.fastq. The first one contains ~150 bp reads and the second one contains 15 bp sequences. Both files have the same number of sequences (~20 million). I think that looks correct :) Am I right?

Thank you.

Jessie

bdemaree commented 7 years ago

Hi Jessie,

Apologies, there seems to be an issue with the SRA upload, and I cannot download the index reads for the 10-cell control experiment either. I will contact NCBI to correct this issue, but in the meantime, here is a link to a Google Drive folder containing the raw R1 and I1 files for this experiment:

https://drive.google.com/drive/folders/0B6b7shhcDLp-SzIxRkZTSUc4RkE?usp=sharing

The sequencing run was single-end, so there is no R2 file. You are correct that for the real metagenomic sample the 150 bp read is the genomic DNA, and the 15 bp read is the single-cell barcode (the same holds true in the 10-cell synthetic community). In all cases, the cluster IDs in the first line of each FASTQ entry should match up between the read and index files.
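One way to confirm that the read and index files line up is to compare the ID token of every record pair. A minimal sketch, assuming four-line FASTQ records with no blank lines and that the ID is the header text between `@` and the first whitespace; the function names `fastq_ids` and `first_mismatch` are hypothetical:

```python
from itertools import islice, zip_longest


def fastq_ids(handle):
    """Yield the ID token of each record (every 4th line, starting at line 0)."""
    for header in islice(handle, 0, None, 4):
        yield header[1:].split()[0]


def first_mismatch(reads_handle, index_handle):
    """Return the 0-based record number where the IDs differ or one file
    ends before the other; return None if the files agree throughout."""
    pairs = zip_longest(fastq_ids(reads_handle), fastq_ids(index_handle))
    for n, (read_id, index_id) in enumerate(pairs):
        if read_id != index_id:
            return n
    return None
```

`zip_longest` is used rather than `zip` so that a file with fewer records is reported as a mismatch instead of being silently truncated.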

Thanks, Ben

jessieren commented 7 years ago

Hi Ben,

Thank you very much for your efforts. The link works well, and I have downloaded the files. :)

Best wishes, Jessie

harojm commented 6 years ago

Hi Ben, Congratulations on your excellent work. We tried to download the index file (I1) from NCBI, but we could not find it in the database. We are very interested in the Alteromonas genomes. Could you please provide us with the index file for these genomes?

Thanks in advance, Kind regards, Jose M Haro-Moreno