desmodus1984 opened this issue 2 years ago
Hi @desmodus1984,
It seems the sequence identifiers (headers) do not start with '>', which is probably required for SeqAn to parse and index the FASTA file properly. Can you try adding '>' to the beginning of each sequence identifier and rerunning Apollo? I would also check the encoding of your text file and look for unexpected hidden characters in your line endings (for example, Windows-style carriage returns), which may be interfering with how your FASTA file is parsed.
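To test the '>' point on the whole file rather than only on the first lines that head shows, a minimal sketch follows. It assumes a one-line FASTA, meaning each record is exactly one header line followed by one sequence line (the layout produced by the bioawk conversion described in this thread). The function name check_oneline_fasta is hypothetical and is not part of Apollo or SeqAn; it prints the offending line numbers and returns non-zero if the layout is off.

```shell
# Hypothetical helper (not part of Apollo or SeqAn). Assumes a one-line FASTA:
# each record is exactly one header line followed by one sequence line.
# Prints the offending line numbers and returns non-zero if the layout is off.
check_oneline_fasta() {
    LC_ALL=C awk '
        NR % 2 == 1 && substr($0, 1, 1) != ">" {
            printf "line %d: expected a header starting with >\n", NR; bad++
        }
        NR % 2 == 0 && (length($0) == 0 || substr($0, 1, 1) == ">") {
            printf "line %d: expected a non-empty sequence line\n", NR; bad++
        }
        END {
            if (NR % 2 == 1) { printf "line %d: odd number of lines, last record has no sequence\n", NR; bad++ }
            exit (bad > 0)
        }
    ' "$1"
}
```

For example, check_oneline_fasta /fs/scratch/PHS0338/BGI-reads/reads1.fasta && echo OK prints OK only if every record follows the two-line layout. Note that this check does not look for carriage returns; a CRLF file can still pass it.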
You could use seqtk seq to convert your FASTA file into the format Apollo requires. This should hopefully resolve any formatting and line-ending issues you are experiencing.
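For example, seqtk seq reads1.fasta > reads1.seqtk.fasta (the output name is only illustrative) should write every sequence on a single line, since seqtk's default line length is effectively unlimited. If seqtk is not available, the awk function below is a rough stand-in, not a reimplementation of seqtk: the name normalize_fasta is hypothetical, and it assumes plain-text FASTA whose headers start with '>'. It removes a trailing carriage return from each line, drops blank lines, and joins wrapped sequence lines, so each record becomes one header line and one sequence line with Unix (LF) endings.

```shell
# Hypothetical helper (not seqtk, not part of Apollo). Rewrites a FASTA file so
# every record is one header line plus one unwrapped sequence line, with LF
# endings. Assumes headers start with ">"; blank lines are dropped.
normalize_fasta() {
    LC_ALL=C awk '
        { sub(/\r$/, "") }                 # drop a Windows-style CR at line end
        /^$/ { next }                      # skip blank lines
        /^>/ {
            if (seq_started) print ""      # finish the previous sequence line
            print; seq_started = 0; next
        }
        { printf "%s", $0; seq_started = 1 }   # append wrapped sequence lines
        END { if (seq_started) print "" }
    ' "$1"
}
```

Usage: normalize_fasta reads1.fasta > reads1.clean.fasta (hypothetical output name), then point Apollo's -r option at the cleaned file.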
Best,
Can Firtina
Hi Firtina,
I checked my files again and they seem to be fine: sequence identifiers (headers) start with '>'.
head reads1.fasta
>V300066187L4C001R0010000000/1
AATGTAAATACATTTTTGTATCCTACTGTTTATTGTACTCTTATTACAGGCATTTTCCACTTTGTTCTGCAGTCTGTATTTTAAAAAATGCTATATTATC
>V300066187L4C001R0010000014/1
TGAGAAAGGTTGTTTCCCCAGGTAGGAATTTTCCCCTGAAGTTAGGGAGGGGATAAAGCCCCTTAACTAAGTGCCAGGTGGGTAGTTAATCACTTTAACT
>V300066187L4C001R0010000017/1
CCTAGCCCCACACCAGACCCCCAGCCCAGAGTCCAGAGCTGGGAAAATAAGTTACTGTAACTTCTGGCTATAAAAACCAGCGGGAACTGTGGCTGACTGA
>V300066187L4C001R0010000029/1
AGGGAGCTTCAGGACAACATGAAACGAAGTAACATACGCATAATAGGGCTGCAAGAAGGACAAGAAGAACAGCAAGGATTAGAAAATCTATTTGAAGAAA
>V300066187L4C001R0010000038/1
CACAGTATTTAACATGAGAATTTTTCACGTGTCAGGATAGAAAAGTTTAAATCAGCTCAAGGTTGATGACGATATAGAGAAACAAGCACTATTCTTTTTA
head reads2.fasta
>V300066187L4C001R0010000000/2
GTGTCAGATGTGTTATATAGCTTGATTTTAACCATTTAACCAATACATACATGAAGATATATACCCCAAATATATGCCATTTGTGTCAAGTATACCTGAA
>V300066187L4C001R0010000014/2
ATCTGTATTTATACCAATTGATTTTAATCCTGTCAATTTCTATCGCAAAGGTTAGGGCGTTTCTTATCTCCATTCCAGGGAGTAAAGATTATGTAGCTTA
>V300066187L4C001R0010000017/2
AAAGCTGCGCCCAAAACTCCCACCCGGCTAGACAGTTCAGTTCCTCTCCATATGTCACTGGATTTCCCCAAAGCCACTACCTGGTGCTGGAGCTCACCGG
>V300066187L4C001R0010000029/2
GTTTCTGTTGAGAAATCGTTTGATAATCTGATGGGGGATCCTTTGTAGGTAACTCTCTGTTTCTCTCTTGCTGCCTTTAAGATTCTCTCTTTGTCTTGAA
>V300066187L4C001R0010000038/2
TCTCACACTGATATTTTTTTCTCTCTCTCCCCTTCTCTCTCTCTCTAAAATCAATAAACATACCTTTGGGTGAGGATAAACAGAATAGTGCTTGTTTCTC
The encoding also looks fine to me. I converted my FASTQ files to FASTA using bioawk, and the file types are:
reads1.fasta: text/plain; charset=us-ascii
reads_1.fq: text/plain; charset=us-ascii
reads2.fasta: text/plain; charset=us-ascii
reads_2.fq: text/plain; charset=us-ascii
Is there any way to check for those hidden characters? I have no idea how to do that, and I would not expect bioawk to add any.
Best regards;
Juan Pablo Aguilar Cabezas
Ecology and Evolutionary Biology Ph.D. Candidate
Department of Biological Sciences
Ohio University, Athens OH
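One way to answer the question about hidden characters is sketched below: scan each line for carriage returns (the extra byte in Windows-style CRLF line endings) and for any byte outside printable ASCII, such as tabs or non-ASCII characters. The function name find_hidden_chars is hypothetical; it runs awk in the C locale so bytes are compared one by one, prints one message per offending line plus a summary, and returns non-zero if anything was found. For a quick visual check, GNU cat -A on a few lines also shows a CR as ^M just before the $ that marks each line end.

```shell
# Hypothetical helper (not part of Apollo or SeqAn). Reports lines containing
# carriage returns or any byte outside printable ASCII (0x20-0x7E), then a
# summary line; returns non-zero if anything suspicious was found.
find_hidden_chars() {
    LC_ALL=C awk '
        {
            line = $0
            # gsub returns how many CRs it removed from the copy of the line
            if (gsub(/\r/, "", line) > 0) {
                printf "line %d: carriage return (Windows CRLF ending?)\n", NR; cr++
            }
            # with CRs removed, anything outside space..tilde is suspicious
            if (line ~ /[^ -~]/) {
                printf "line %d: tab, control or non-ASCII byte\n", NR; other++
            }
        }
        END {
            printf "%d line(s) with CR, %d line(s) with other hidden bytes\n", cr + 0, other + 0
            exit (cr + other > 0)
        }
    ' "$1"
}
```

For example, find_hidden_chars /fs/scratch/PHS0338/BGI-reads/reads1.fasta | tail -n 1 prints only the summary line.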
From: Can Firtina @.>
Sent: Monday, January 3, 2022 10:11 AM
To: CMU-SAFARI/Apollo @.>
Cc: Aguilar Cabezas, Juan Pablo @.>; Mention @.>
Subject: Re: [CMU-SAFARI/Apollo] CONSISTENT ERROR -FastaIndex: Record has inconsistent line lengths or line endings (Issue #8)
Hi. I built an assembly and I am trying to polish it with Apollo. I installed it as instructed and followed all the steps. I converted the FASTQ files into one-line FASTA files:
head reads2.fasta
I then converted the SAM file to BAM, and sorted and indexed it:
/users/PHS0338/jpac1984/appz/bwa-mem2-2.2.1_x64-linux/bwa-mem2 mem -t 48 Hapo -R '@RG\tID:PA113-1\tSM:bar\tPL:DNBSEQ' \
    /fs/scratch/PHS0338/BGI-reads/reads_1.fq > PA113-1.sam
/fs/scratch/PHS0338/appz/samtools-1.14/samtools view -hb -@ 48 PA113-1.sam > PA113-1.bam
/fs/scratch/PHS0338/appz/samtools-1.14/samtools view -h -@ 48 -F4 PA113-1.bam | /fs/scratch/PHS0338/appz/samtools-1.14/samtools sort -@ 48 -m 3G -O bam -o PA113-1.sorted.bam
/fs/scratch/PHS0338/appz/samtools-1.14/samtools index -@ 12 PA113-1.sorted.bam
However, I keep getting the same SeqAn error, and I do not know how to fix it.
The log:
Assembly: /users/PHS0338/jpac1984/data/myse-hapog.fasta
Pair of a set of reads and their alignments:
/fs/scratch/PHS0338/BGI-reads/reads1.fasta, /fs/scratch/PHS0338/appz/sam-bams/PA113-1.sorted.bam
/fs/scratch/PHS0338/BGI-reads/reads2.fasta, /fs/scratch/PHS0338/appz/sam-bams/PA113-2.sorted.bam
Output file: myse-polished.fasta
Maximum consecutive insertions: 3
Maximum consecutive deletions: 10
Transition probability to match states: 0.85
Transition probability to insertion states: 0.1
Overall deletion transition probabilities from a state: 0.05
Deletion transition factor: 2.5
Emission probability of a matching character: 0.97
Emission probability of a substitution (i.e., mismatch) character: 0.01
Emission probability of an inserted character: 0.333333
Filter size: 100
Viterbi filter size: 5
Viterbi batch size: 5000
Read chunking size (0 for original length): 1000
Max thread: 48
terminate called after throwing an instance of 'seqan::ParseError'
  what(): FastaIndex: Record has inconsistent line lengths or line endings
/var/spool/slurmd/job8594593/slurm_script: line 10: 45058 Aborted (core dumped) bin/apollo -a /users/PHS0338/jpac1984/data/myse-hapog.fasta -r /fs/scratch/PHS0338/BGI-reads/reads1.fasta -r /fs/scratch/PHS0338/BGI-reads/reads2.fasta -m /fs/scratch/PHS0338/appz/sam-bams/PA113-1.sorted.bam -m /fs/scratch/PHS0338/appz/sam-bams/PA113-2.sorted.bam -t 48 -o myse-polished.fasta
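The FastaIndex error appears to be raised while SeqAn indexes one of the FASTA inputs, and the log does not say which file triggered it, so it may be worth checking every FASTA passed to Apollo, including the assembly given with -a, not only the reads. The sketch below is a hedged approximation: it ASSUMES SeqAn's FastaIndex follows the same per-record rule as samtools faidx indexing (all sequence lines except the last have the same length, with consistent line endings). The function name check_fai_consistency is hypothetical, and the "last line not longer than the others" test is a conservative extra rather than a confirmed SeqAn rule.

```shell
# Hypothetical helper (not part of Apollo or SeqAn). ASSUMES FastaIndex uses
# the samtools-faidx-style rule: within a record, every sequence line except
# the last has the same length as the first; the last line is not longer
# (conservative extra check); header and sequence lines do not mix CRLF/LF.
# Prints "<record>: <problem>" once per bad record; non-zero exit if any.
check_fai_consistency() {
    LC_ALL=C awk '
        function report(msg) {
            if (!flagged) { printf "%s: %s\n", name, msg; bad++; flagged = 1 }
        }
        function finish_record() {
            if (nlines > 1 && last_len > first_len)
                report("last sequence line is longer than the others")
        }
        # Strip a trailing CR so lengths are compared without it; remember it.
        { has_cr = sub(/\r$/, "") }
        /^>/ {
            finish_record()
            name = substr($0, 2); flagged = 0
            nlines = 0; short_seen = 0; cr_state = has_cr
            next
        }
        {
            if (has_cr != cr_state) report("mixed CRLF and LF line endings")
            nlines++
            len = length($0)
            if (nlines == 1) first_len = len
            else if (short_seen) report("a line differs in length but is not the last line")
            if (len != first_len) short_seen = 1
            last_len = len
        }
        END { finish_record(); exit (bad > 0) }
    ' "$1"
}
```

For example, check_fai_consistency /users/PHS0338/jpac1984/data/myse-hapog.fasta prints one line per offending record and returns non-zero if any are found; the same call can be repeated for reads1.fasta and reads2.fasta.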
Any idea why it keeps failing? I have all the required input files, and it still fails.