Closed fujch7 closed 2 years ago
Thanks for getting in touch!
`phyloFlash_compare.pl` script). If they are technical replicates, or simply different sequencing runs from the same library, then it should probably be OK to pool them. Pooling the libraries may also allow detection of lower-abundance taxa.

Hi, I'm very excited to have run this program successfully! But I am confused about the read counts summarized in the log file:
_[22:29:53] Total read segments processed: 326098486
[22:29:53] insert size median: 241
[22:29:53] insert size std deviation: 66
[22:29:53] Summarizing taxonomy from mapping hits to SILVA database
[22:30:00] done...
[22:30:01] Forward read segments mapping: 70117
[22:30:01] Reverse read segments mapping: 70306
[22:30:01] Reporting mapping statistics for paired end input
[22:30:01] **Total read pairs with at least one segment mapping: 49149**
[22:30:01] => **both segments mapping to same reference: 51169**
[22:30:01] => **both segments mapping to different references: 9552**
[22:30:01] **Read segments where next segment unmapped: 18981**
[22:30:01] mapping rate: 0.030%_
Why is "Total read pairs with at least one segment mapping" always less than "both segments mapping to same reference"? I don't quite understand the quantitative relationship between the numbers in bold.
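For reference, the relationship between the counts in the log above can be checked with a little arithmetic. This is only a sketch: it assumes that the two "both segments mapping" lines count read pairs, and that "Read segments where next segment unmapped" counts pairs in which only one segment mapped. The variable names are made up for illustration and are not phyloFlash identifiers.

```python
# Values copied from the log excerpt above.
forward_segments = 70117
reverse_segments = 70306
both_same_ref = 51169    # pairs, both segments on the same reference
both_diff_ref = 9552     # pairs, segments on different references
mate_unmapped = 18981    # assumed: pairs with only one segment mapping
reported_pairs = 49149   # "Total read pairs with at least one segment mapping"

# Under the assumption above, each "both" pair contributes two mapped
# segments and each "mate unmapped" pair contributes one.
segments_from_pairs = 2 * (both_same_ref + both_diff_ref) + mate_unmapped
total_mapped_segments = forward_segments + reverse_segments

# Pairs with at least one segment mapping, implied by the sub-counts.
expected_pairs = both_same_ref + both_diff_ref + mate_unmapped

print(segments_from_pairs, total_mapped_segments)  # 140423 140423
print(expected_pairs, reported_pairs)              # 79702 49149
```

Under that reading the three sub-counts agree exactly with the forward and reverse segment totals, so the outlier is the "Total read pairs" figure (49149 reported versus 79702 implied).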
Yes, this doesn't seem right to me. Which version of phyloFlash are you running?
Could you please run phyloFlash on the test files (`test_F.fq.gz` and `test_R.fq.gz`) included with phyloFlash, and attach the log file of the run here? If you installed using Conda, then those two files should be located in the Conda environment folder under `lib/phyloFlash/test_files/`.
The count of "Total read pairs with at least one segment mapping" is based on the read names in the Fastq file, whereas the other metrics are based on a running count while processing the whole file. In theory they should match up, and it also serves as a sanity check. Is it possible that the same read name may occur more than once in the file? Were the reads renamed in some way during initial QC or trimming, for example?
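One way to check whether the same read name occurs more than once is a small script like the one below. It is a rough sketch, not part of phyloFlash: it assumes standard four-line FASTQ records, takes the read name to be the first whitespace-delimited token of the header line, and strips a trailing `/1` or `/2`. The function names `read_names` and `duplicated_names` are hypothetical.

```python
import gzip
import sys
from collections import Counter


def read_names(path):
    """Yield one read name per FASTQ record (assumes 4 lines per record)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 != 0:      # only header lines carry the read name
                continue
            fields = line[1:].split()   # drop the leading '@'
            if not fields:
                continue
            name = fields[0]
            if name.endswith(("/1", "/2")):   # strip mate suffix if present
                name = name[:-2]
            yield name


def duplicated_names(path):
    """Return {read name: count} for names seen more than once in the file."""
    counts = Counter(read_names(path))
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    for fastq_path in sys.argv[1:]:
        dups = duplicated_names(fastq_path)
        print(fastq_path, len(dups), "read names occur more than once")
```

It holds every read name in memory, so for a library of this size it may be more practical to run it on a subset of the reads first. Run it separately on the forward and reverse files; any non-zero count would suggest the reads were renamed or duplicated at some point before mapping.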
Hi, thanks for your amazing tool. I have two questions. First, should the input reads be the raw data or clean data (quality-controlled with Trimmomatic and FastUniq)? Second, I have 6 metagenome samples: should I run this tool on each sample separately, or merge the 6 samples and run it once?