Pkaps25 closed this issue 3 years ago
@antonisdim I have resolved this issue. No reads were aligning to taxa other than Dark Matter, which caused empty ts/tv files to be generated. I am working with fasta files; does Haystac use quality scores for anything other than the bowtie alignment? I am wondering whether the code could be modified minimally to support fasta.
Hello Peter,
I hope you are doing great, and apologies for the delayed response!
Indeed, we only currently support fastq files. I do not think it would be hard to integrate some support for fasta files in a future version of haystac. Of course I'll keep you updated.
Thank you for your patience!
Best, Antony
Hi Antony,
Thank you for the response. Does Haystac use the quality scores for abundance calculation or Dirichlet read assignment? Based on the paper and code I am leaning towards no, but I would like to confirm with you.
Thanks again!
Hello Peter,
No, after the first filtering alignment with bowtie2, base quality scores are not considered. So the individual metagenomic alignments (with bowtie2) and the Dirichlet read assignment do not use the base quality info; instead they focus on the edit distance of the reads.
Hope this helps, and please let me know if you have any other questions!
Best, Antony
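To illustrate the point about edit distance, here is a minimal sketch of an edit distance between a read and a reference segment. It is a generic Levenshtein implementation written for illustration only, not code taken from Haystac or bowtie2, and the function name `edit_distance` is hypothetical. The result depends only on the two sequences, so quality values play no part in it.

```python
def edit_distance(read: str, ref: str) -> int:
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn `read` into `ref`."""
    # prev[j] holds the distance between the read prefix seen so far
    # and the first j characters of the reference.
    prev = list(range(len(ref) + 1))
    for i, a in enumerate(read, start=1):
        curr = [i]  # distance from read[:i] to the empty reference
        for j, b in enumerate(ref, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a from the read
                            curr[j - 1] + 1,      # insert b into the read
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # One mismatch between read and reference.
    print(edit_distance("ACGT", "AGGT"))
```

In practice an aligner would normally report this value itself rather than have it recomputed; the sketch is only meant to show that no quality string is involved.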
Hello
As far as I could tell, Haystac does not accept fasta files as sample inputs, only fastq. I have some fasta files I'd like to analyse, so I used BBMap reformat.sh and seqtk seq to convert the fasta files to fastqs with dummy quality score values of 40. The sequence headers are of the form @NC_XXXX-seqN/1, where N is the read number and XXXX are numbers belonging to an NCBI taxon. I have successfully built samples with these fastqs, but analysing using --mode reads results in the errors in the attached log files. It appears that the ts_tv count files are all empty. Do you have any guidance as to how to troubleshoot?
Thank you
ts_tv_log_2.txt ts_tv_log.txt
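For anyone reproducing the conversion step described above, here is a minimal Python sketch of turning fasta records into fastq records with a constant dummy quality. It is not the BBMap or seqtk implementation; the function names `fasta_to_fastq` and `_write_record` are hypothetical, and the dummy quality of 40 is written as `I` on the assumption that Phred+33 encoding is expected.

```python
import io

# Phred quality 40 under the (assumed) Phred+33 encoding is the character "I".
DUMMY_QUAL = chr(40 + 33)


def _write_record(out, header, seq, qual_char):
    """Write one four-line fastq record with a constant quality string."""
    out.write(f"@{header}\n{seq}\n+\n{qual_char * len(seq)}\n")


def fasta_to_fastq(fasta_handle, fastq_handle, qual_char=DUMMY_QUAL):
    """Convert fasta text to fastq, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                _write_record(fastq_handle, header, "".join(chunks), qual_char)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Write the final record.
    if header is not None:
        _write_record(fastq_handle, header, "".join(chunks), qual_char)


if __name__ == "__main__":
    # Placeholder header in the form described in the post.
    src = io.StringIO(">NC_XXXX-seq1/1\nACGT\nAC\n")
    dst = io.StringIO()
    fasta_to_fastq(src, dst)
    print(dst.getvalue(), end="")
```

The handles can be replaced with files opened via `open(...)`; the in-memory example is used only to keep the sketch self-contained.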