Closed: ls2017 closed this issue 8 years ago.
My question: Can I make up some dummy variables equivalent for ">m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058" to make Falcon work properly?
Yes. This is a restriction in DAZZ_DB/fasta2DB. The header must match >movie/well/blah, plus comments if any. All reads from the same movie should be together in the file. The well field is an integer, and blah is ignored.
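The header rule above is simple enough to check mechanically before running fasta2DB. Below is a minimal Python sketch, assuming the rule is exactly as stated in this reply (>movie/well/blah, an optional whitespace-separated comment, and all reads of one movie in a single contiguous block). It is not fasta2DB's own parser, and check_dazz_headers is a hypothetical helper name.

```python
import re

# Assumed header shape, taken from the rule above: >movie/well/blah,
# optionally followed by whitespace and a comment. "well" must be an
# integer; "blah" is accepted as any non-space text (it is ignored).
HEADER_RE = re.compile(r"^>([^/\s]+)/(\d+)/(\S*)(?:\s.*)?$")


def check_dazz_headers(lines):
    """Return a list of problems found in FASTA header lines.

    Hypothetical helper; checks only:
      * every header matches >movie/well/blah
      * all reads from one movie appear as one contiguous block
    """
    problems = []
    finished_movies = set()  # movies whose block has already ended
    current_movie = None
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        m = HEADER_RE.match(line.rstrip("\r\n"))
        if m is None:
            problems.append(
                f"line {lineno}: header is not >movie/well/blah: {line.strip()}")
            continue
        movie = m.group(1)
        if movie != current_movie:
            if movie in finished_movies:
                problems.append(f"line {lineno}: movie {movie} is not contiguous")
            if current_movie is not None:
                finished_movies.add(current_movie)
            current_movie = movie
    return problems


# Usage: the E. coli example header passes, a plain SRA-style header does not.
good = [">m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058",
        "ACGT"]
print(check_dazz_headers(good))  # prints []
print(check_dazz_headers([">SRR1168519.1 length=302", "ACGT"]))
```

Passing the lines of a FASTA file (for example an open file object) reports every malformed header rather than stopping at the first one, which makes it easier to see whether a whole file needs reformatting.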
@jingqinwu you will need to download the bax.h5 files and use pls2fasta to convert them to properly formatted FASTA. SRA's FASTA output does not (yet) encode the information needed for assembly.
Depending on how the files were uploaded, they may or may not contain the data needed to format them correctly. Some useful reading: http://microbe.net/2015/01/20/submit-data-to-ncbis-short-read-archive/ http://seqanswers.com/forums/showthread.php?t=56466. If the data isn't in the SRA, I would suggest contacting the authors of the study.
@pb-jchin @rhallPB Many thanks for your suggestions.
See related issue here: https://github.com/pb-jlandolin/PacbioToSRA/issues/2
If they were uploaded by PacBio, they should have links to the original bax.h5 files. You can click on the SRR ID, open the "Download" tab, and download the original bax.h5 files instead of the .sra files.
I downloaded a dataset from SRA and converted it to FASTQ; it looks something like this:
@SRR1168519.1 length=302
ATTTTTGTCTGTCCGATTCTGATAGCAGGCGCATATCAGATGAATCTGATGAGTCAACACTGGTTGGTTCGTTGCTCAGTAGTTATGTTCGTGTGGAGCGTCGTATTGGTATCGAGTCTGATTGTCAGTCATCGATGGTCATTAGTCACGTCCTTCCAGTAGTTCGTATCAACATGCTTCACTATTCTTGTTGTTGTAGATGTTATTCGTATTAGTGTGAGTGTCAGTAGTTACGCGTACAGTATCGGGATTTCGTAGCAGCGCGCGGCGTTGCGGAGTCAAGATTCATGGCTGGACTACGG
+SRR1168519.1 length=302
!"!!!"#$"##!!!"!!"!"#""""#$#"!"!""!!!!!""%"""!"!"#""!#"!!!"!#"!#!!!"!!!"""!!!!"""#!!"#"!"!""!"!!!!""#!!!""!!!"!#!"###"#""!"!!!##!#!#!"!"""!"$$!!"#"$""#"!!"!!#"!!#!!!"!"""!!""%#"$#"$"#"!!!"!!!!!"!!!"!"!"!$#%&%%$"""""""!#"!"!!""##"$!!!!!!!$$!!!!!#!!"!!!!%!"$"!!"""!!!!!!!"!!!!!!$$#"!"!!!"!$$#"!$!!!""!"""
After converting the format with Falcon-formatter, it still does NOT work in Falcon.
It also looks like the FASTA files require strict formatting that encodes the movie, run start time, SMRT barcode, etc., and should look like this (copied from the E. coli example):
My question: Can I make up some dummy variables equivalent for ">m140913_050931_42139_c100713652400000001823152404301535_s1_p0/9/1607_26058" to make Falcon work properly?
Or is there another way to dump the *.sra file I downloaded so that it works properly in Falcon?
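Taking the reply near the top of this thread at its word (fasta2DB only needs >movie/well/blah headers, with reads of one movie kept together), dummy values can be generated from the SRA FASTQ. This is a minimal Python sketch under those assumptions, not a PacBio or Falcon tool: fastq_to_dazz_fasta and DUMMY_MOVIE are hypothetical names, the movie string only mimics the shape of a real movie name, and "0_len" is a placeholder for the ignored blah field.

```python
# Hypothetical placeholder movie name. It only mimics the shape of a real
# PacBio movie name; it is NOT real run metadata.
DUMMY_MOVIE = "m000000_000000_00000_cDUMMY_s1_p0"


def fastq_to_dazz_fasta(fastq_lines, movie=DUMMY_MOVIE):
    """Yield FASTA lines with made-up >movie/well/0_len headers.

    Assumes unwrapped 4-line FASTQ records (header, sequence, '+', quality).
    Every read gets its own integer "well" (1, 2, 3, ...) and all reads share
    one movie, so that movie's reads are trivially contiguous.
    """
    lines = [line.rstrip("\r\n") for line in fastq_lines]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4-line FASTQ records")
    for well, i in enumerate(range(0, len(lines), 4), start=1):
        header, seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        # Keep the SRA read name as a trailing comment for traceability.
        yield f">{movie}/{well}/0_{len(seq)} {header[1:]}"
        yield seq
```

To convert a whole file, one could pass an open reads.fastq file object to fastq_to_dazz_fasta and write each yielded line followed by a newline. One caveat, in line with the point that SRA's output lacks information needed for assembly: if a run contains several subreads from the same ZMW, this sketch gives each subread its own well number, so that grouping is lost. Downloading the original bax.h5 files and using pls2fasta avoids the problem.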