JunyueC / sci-CAR_analysis

Scripts for processing sci-RNA-seq/sci-ATAC-seq/sci-CAR reads
Split barcodes in sci-ATAC #2

Open marvinquiet opened 5 years ago

marvinquiet commented 5 years ago

It's really a nice job profiling these two kinds of data together!

However, I ran into a problem converting the SRA file to FASTQ for the sci-CAR ATAC-seq data. I ran fastq-dump on the .sra file with the command "fastq-dump --split-files --origfmt --defline-seq '$sg:$sn'". The resulting FASTQ records look like this:

TCCGGAGACCAGATACGG:1
NGTAGGAAGTTTTTTCATAGGAGGTGTATGAGTTGGTCGTAGCGGAATCGG
+1
#AAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

The first line of each FASTQ record contains only an 18 bp barcode. However, according to your scripts, it should look something like P5.P7+N5.N7. Could you please help me with this problem? Thank you so much!

Wenjing
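For anyone wanting to check their own download, here is a minimal sketch of how the read name could be inspected. It assumes the barcode sits before the first ":" in the header and that multiple barcodes, if present, are separated by "." or "+" as in the P5.P7+N5.N7 pattern mentioned above. The helper name `barcode_fields` is made up for illustration and is not part of the repository's scripts.

```python
import re

def barcode_fields(header):
    """Split a FASTQ read name into barcode fields.

    Assumption (not from the repository): the barcode is everything before
    the first ':' and fields are separated by '.' or '+'.
    Returns [] when the read name does not look like a barcode at all.
    """
    name = header.strip().lstrip("@").split(":", 1)[0]
    fields = re.split(r"[.+]", name)
    if all(re.fullmatch(r"[ACGTN]+", field) for field in fields):
        return fields
    return []

# Read name taken from the record quoted above.
fields = barcode_fields("TCCGGAGACCAGATACGG:1")
print(len(fields), [len(field) for field in fields])
```

With the header quoted above this reports a single 18 bp field, which is the symptom described: one barcode string rather than several separated fields.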

marvinquiet commented 5 years ago

Hi, Mr. Cao,

Is there a possibility that the latter part of the barcodes is missing? Could you please help check it?

Many thanks, Wenjing

JunyueC commented 5 years ago

Hi Wenjing,

Thanks for the note - we also noticed that and have sent an email to GEO to correct that. Will keep you updated when it is fixed.

Jun

JunyueC commented 5 years ago

SRA has reprocessed the runs. Here is the message they sent:

These Runs have been reloaded to include the barcodes in the read name, the user will need to update their locally cached files to see the changes.

Let me know if you have any questions.

Jun

marvinquiet commented 5 years ago

Hi Jun,

Yes, it works! Thank you for letting me know, I really appreciate it!

May I also ask another question, about barcode fixing? From your script, it looks like only "P7.P5" takes part in the barcode fixing (correct me if I am wrong). However, the supplementary material says that the Tn5 barcodes are fixed as well, so I am a little confused.

Also, I am not sure whether putting a dot between P7 and P5 affects the edit distance when insertions and deletions are considered.

Looking forward to hearing from you!

Best, Wenjing
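To make the question about the dot concrete, here is a small sketch that compares a plain Levenshtein distance (substitutions, insertions and deletions) on two barcodes joined with and without a "." separator. It is a generic textbook implementation, not the barcode-correction code from this repository, and the 2 bp "barcodes" are made-up toy values chosen only to make the effect visible.

```python
def levenshtein(a, b):
    """Edit distance counting substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]

# Hypothetical toy barcodes (not real P7/P5 sequences).
observed = ("AC", "GT")
reference = ("CG", "TA")

joined_plain = levenshtein("".join(observed), "".join(reference))
joined_dot = levenshtein(".".join(observed), ".".join(reference))
per_field = sum(levenshtein(o, r) for o, r in zip(observed, reference))

print(joined_plain, joined_dot, per_field)  # prints "2 4 4"
```

In this toy case the plain concatenation reaches distance 2 by shifting one base across the barcode boundary, while the version with the dot gives 4, the same as scoring each barcode separately. So a separator can change an indel-aware edit distance. Under a substitution-only (Hamming) comparison, a separator shared by both strings contributes nothing.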

yqshao17 commented 5 years ago

Hi, Junyue,

I ran into a problem similar to Wenjing's. There is no barcode information in the FASTQ files downloaded from SRR7521175, SRR7521176 and SRR7521177 (mouse kidney sci-CAR ATAC-seq, GEO accession: GSM3271045).

The command I ran is "fastq-dump --split-files -F SRR7521175" and the result is:

@SRR7521175.1 length=75
GATCGNCTGTAAATGGTTTCTAATCTCTACACAGAGATGGGAGAGCACTGAGTGTTAAAATGGAGAATTTCTATA
+SRR7521175.1 length=75
6AAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEE6EEEEEEEAEEAEEEAEEE

Could you please check it?

Thanks, Yanqiu

JunyueC commented 5 years ago

Hi Yanqiu,

Sorry about this! We contacted GEO, but it seems they did not correct all of the files. You can also download the original ATAC-seq FASTQ files from our server here: https://shendure-web.gs.washington.edu/content/members/cao1025/public/nobackup/sciCAR_ATAC_fastq/

Jun

yqshao17 commented 5 years ago

Thank you so much!!!